weighted finite-state transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a...

68
University of Tokyo Minematsu Lab Josef R. Novak Weighted Finite-State Transducers Important Algorithms 0 1 a/0 b/1 c/5 2 d/0 e/1 3 e/0 f/1 e/4 f/5 (a)

Upload: others

Post on 05-Jul-2020

12 views

Category:

Documents


3 download

TRANSCRIPT

Page 1: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

University of Tokyo Minematsu Lab Josef R. Novak

Weighted Finite-State Transducers Important Algorithms

30 Mehryar Mohri

0

1a/0

b/1

c/5

2

d/0

e/1

3

e/0f/1

e/4

f/5

0/0

1a/0

b/1

c/5

2

d/4e/5

3/0

e/0f/1

e/0

f/1

0/15

1

a/0

b/(1/15)

c/(5/15)

2

d/0

e/(9/15)

3/1

e/0f/1

e/(4/9)

f/(5/9) 0/0 1a/0b/1c/5

3/0e/0f/1

(a) (b) (c) (d)

Fig. 12. Weight pushing algorithm. (a) Weighted automaton A. (b) Equivalentweighted automaton B obtained by weight pushing in the tropical semiring. (c)Weighted automaton C obtained from A by weight pushing in the probability semir-ing. (d) Minimal weighted automaton over the tropical semiring equivalent to A.

large graphs or automata of several hundred million states or transitions.An approximate version of a generic shortest-distance algorithm can be usedinstead to compute d[q] e!ciently.

Note that if d[q] = 0, then, since S is zero-sum-free, the weight of allpaths from q to F is 0. Let A be a weighted automaton over the semiringS. Assume that S is complete or k-closed and that the shortest-distancesd[q] are all well-defined and in S ! {0}. Note that in both cases we can usethe distributivity over the infinite sums defining shortest distances. Let e!

(!!) denote the transition e (path !) after application of the weight pushingalgorithm. e! (!!) di"ers from e (resp. !) only by its weight. Let "! denote thenew initial weight function, and #! the new final weight function.

The following proposition is proven in [38, 43].

Proposition 7 ([38, 43]). Let B = (A, Q, I, F, E!, "!, #!) be the result of theweight pushing algorithm applied to the weighted automaton A, then

1. the weight of a successful path ! is unchanged after application of weightpushing:

"![p[!!]] " w[!!] " #![n[!!]] = "(p[!]) " w[!] " #(n[!]). (39)

2. the weighted automaton B is stochastic, i.e.

#q $ Q,!

e!"E![q]

w[e!] = 1. (40)

These two properties of weight pushing are illustrated by Figures 12(a)-(c):the total weight of a successful path is unchanged after pushing; at eachstate of the weighted automaton of Figure 12(b), the minimum weight of theoutgoing transitions is 0, and at at each state of the weighted automaton ofFigure 12(c), the weights of outgoing transitions sum to 1.

Weight pushing can also be used to test the equivalence of two subsequen-tial weighted automata [38, 43]. Let A and B be two subsequential weightedautomata to which weight pushing can be applied and let A! and B! be the

Page 2: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

Lecture Outline  Preliminaries  Semirings  Transducers and Acceptors

 Important Algorithms  Composition  Determinization  Epsilon Removal  Other algorithms

2

Page 3: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

Semirings (I)   WFSTs and WFST-based operations are underpinned by

algebraic objects called semirings

 This has implications for optimization, search, and combination algorithms such as determinization, shortest-path, and composition

Definition A semiring is a system (K,⊕,⊗, 0̄, 1̄) such that,(1) (K,⊕, 0̄) is a commutative monoid with 0̄ as the identity

element for ⊕,(2) (K,⊗, 1̄) is a monoid with 1̄ as the idnetity element for ⊗,(3) ⊕ distributes over ⊕: for all a, b, c in K,

(a⊕ b)⊗ c = (a⊗ c)⊕ (b⊗ c),c⊗ (a⊕ b) = (c⊗ a)⊕ (c⊗ b),

(4) 0̄ is an annihilator for ⊗ : ∀a ∈ K, a⊗ 0̄ = 0̄⊗ a = 0̄

1

3 A monoid is an algebraic structure that supports a single associative binary operation and an identity element. (wikipedia)

Speech Recognition with Weighted Finite-State Transducers, Mohri et. al

Page 4: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

Semirings (II)   A variety of semirings exist, but there are two that are of

particular interest for NLP and ASR applications   The log semiring

  Isomorphic to the real semiring, high numerical stability

  The tropical semiring   Convenient for shortest-path algorithms because global path weights are

guaranteed to be monotonic non-decreasing, viterbi approximation is fast

  Semiring frameworks and algorithms for shortest-distance problems Mohri, 2002, for more details

SEMIRING SET ⊕ ⊗ 0̄ 1̄

Boolean {0, 1} ∨ ∧ 0 1Probability R+ ∪ {+∞} + × 0 1Log R ∪ {−∞,+∞} ⊕log + +∞ 0Tropical R+ ∪ {+∞} min + +∞ 0

Table 1. Semiring examples. ⊕log is defined by: x⊕log y = −log(e−x + e−y).

1

4

Speech Recognition with Weighted Finite-State Transducers, Mohri et. al

Page 5: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

Semirings (example) 5

  Tropical Semiring example   Simple, but unintuitive!

5⊕ 3 = 34⊕2⊕ 7 = 2(3⊗ 4)⊕6 = 61⊗5 = 6(1⊗5)⊕ (2⊗ 4) = 63⊕ 1 ⇔ 3⊕0 ⇔ 03⊗ 1 ⇔ 3⊗0 ⇔ 3

a⊗b = a + ba⊕b = min(a,b)1 = 00 = +∞

Operation Definitions Trivial Examples

Page 6: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

Transducers and Acceptors

  A WFSA is simply a WFST where the output labels have been omitted

  Similarly, FSAs and FSTs lack weights on the arcs or states

Definition A WFST is defined as an 8-tuple, T = (Σ,�, Q, I, F,E,λ, ρ).Here Σ represents the finite alphabet of input symbols, � represents

the finite output alphabet, Q represents the finite set of states, I ⊆ Qthe set of initial states, F ⊆ Q the set of final states, E ⊆ Q× (Σ ∪ {�})× (�∪ {�})×K×Q a finite set of state-to-state transitions, λ : I → Kthe initial weight function, and ρ : F → K the final weight function

mapping F to K .

1

6

Speech Recognition with Weighted Finite-State Transducers, Mohri et. al

Page 7: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

Basic examples   Finite-State Acceptor (FSA)

  Weighted Finite-State Acceptor (WFSA)

  Finite-State Transducer (FST)

  Weighted Finite-State Transducer (WFST)

7

Page 8: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

WFST Practice   OpenFST ( www.openfst.org )   $ emacs test.fst.txt!

  $ emacs test.syms!

  $ fstcompile --isymbols=test.syms --osymbols=test.syms test.fst.txt | fstdraw --isymbols=test.syms --osymbols=test.syms --portrait=true | dot -Tpdf > test.pdf!

  $ emacs test2.fst.txt!

  $ emacs test2.syms!

  $ fstcompile --acceptor=true --isymbols=test2.syms test2.fst.txt | fstdraw --portrait=true --acceptor=true --isymbols=test2.syms | dot -Tpdf > test2.pdf!

test.fst.txt!0 1 t test!1 2 eh -!2 3 s -!3 4 t -!4!

test.syms!- 0!test 1!t 2!eh 3!s 4!

test2.fst.txt!0 1 write .3!0 1 right .7!1!

test2.syms!-  0!right 1!write 2!

8

Page 9: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

Important algorithms

  There exists a wide variety of algorithms that operate on Weighted Finite-State Transducers   composition, determinization, minimization, epsilon-removal,

epsilon-normalization, synchronization, weight-pushing, reversal, projection, shortest-path, connection, closure, concatenation, pruning, re-weighting, union, etc.

  Arguably the most important operations are composition, determinization, epsilon-removal, weight-pushing, and minimization

9

Page 10: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

Algorithms 15

0 1a:b/0.1a:b/0.2

2b:b/0.33/0.7b:b/0.4

a:b/0.5

a:a/0.6

0 1b:b/0.1

b:a/0.22a:b/0.3

3/0.6a:b/0.4

b:a/0.5

(a) (b)

(0, 0) (1, 1)a:b/.01

(0, 1)a:a/.04

(2, 1)b:a/.06 (3, 1)

b:a/.08

a:a/.02

a:a/0.1

(3, 2)a:b/.18

(3, 3)/.42

a:b/.24

(c)

Fig. 6. Weighted transducers (a) T1 and (b) T2 over the probability semiring. (c)Illustration of composition of T1 and T2, T1 ! T2. Some states might be constructedduring the execution of the algorithm that are not co-accessible, e.g., (3, 2). Suchstates and the related transitions can be removed by a trimming (or connection)algorithm in linear-time.

T1 = (!!, "!, Q1, I1, F1, E1, #1, $1) and T2 = ("!, %!, Q2, I2, F2, E2, #2, $2)

such that the input alphabet of T2, ", coincides with the output alphabet ofT1, outputs a weighted transducer T = (!!, %!, Q, I, F, E, #, $) realizing thecomposition of T1 and T2.

Let T1 and T2 be two weighted transducers defined over S such that theinput alphabet of T2 coincides with the output alphabet of T1. Assume thatthe infinite sum

!

z"!! T1(x, z)! T2(z, y) is defined and in S for all (x, y) "!! # %!. This condition holds for all transducers defined over a completesemiring such as the Boolean semiring, the tropical semiring, the probabilitysemiring and the log semiring, and for all acyclic transducers defined overan arbitrary semiring. Then, the result of the composition of T1 and T2 is aweighted transducer denoted by T1 $ T2 and defined for all x, y by:

(T1 $ T2)(x, y) ="

z"!!

T1(x, z) ! T2(z, y). (23)

The sum runs over all strings z labeling a path of T1 on the output side anda path of T2 on input label z. The matrix notation we have used emphasizesthe connection of composition with matrix multiplication.9

There exists a general and e!cient algorithm to compute the compositionof two weighted transducers. In the absence of &s on the input side of T1 orthe output side of T2, the states of T1 $ T2 can be identified with pairs of a

9 Our choice of a matrix notation as opposed to a functional notation is motivatedby its convenience in applications.

Composition (I) 10

  Fundamental operation for combining related transducers,

  Used to iteratively combine multiple related knowledge sources to produce a single, integrated result

(T1 ◦ T2)(x, y) =�

z∈∆∗

T1(x, z)⊗ T2(z, y),

where T1 and T2 represent the two transducers to be composed. The composition algorithmitself then, in the epsilon-free case, is defined as follows,

Algorithm 1 Weighted-Composition(T1, T2)

1: Q ← I1 × I2

2: K ← I1 × I2

3: while K �= ∅ do4: q = (q1, q2) ← Head(K)5: Dequeue(K)6: if q ∈ I1 × I2 then7: I ← I ∪ {q}8: λ(q) ← λ1(q1)⊗ λ2(q2)9: if q ∈ F1 × F2 then

10: F ← F ∪ {q}11: p(q) ← p1(q1)⊗ p2(q2)12: for each (e1, e2) ∈ E[q1]× E[q2] such that o[e1] = i[e2] do13: if (q�) = (n[e1], n[e2] /∈ Q) then14: Q ← Q ∪ {q�}15: Enqueue(K, q

�)16: E ← E � {(q, i[e1], o[e2], w[e1]⊗ w[e2], q�)}17: return T

1

(a) (b)

(c)

Page 11: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

Composition (II) 11

(T1 ◦ T2)(x, y) =�

z∈∆∗

T1(x, z)⊗ T2(z, y),

where T1 and T2 represent the two transducers to be composed. The composition algorithmitself then, in the epsilon-free case, is defined as follows,

Algorithm 1 Weighted-Composition(T1, T2)

1: Q ← I1 × I2

2: K ← I1 × I2

3: while K �= ∅ do4: q = (q1, q2) ← Head(K)5: Dequeue(K)6: if q ∈ I1 × I2 then7: I ← I ∪ {q}8: λ(q) ← λ1(q1)⊗ λ2(q2)9: if q ∈ F1 × F2 then

10: F ← F ∪ {q}11: p(q) ← p1(q1)⊗ p2(q2)12: for each (e1, e2) ∈ E[q1]× E[q2] such that o[e1] = i[e2] do13: if (q�) = (n[e1], n[e2] /∈ Q) then14: Q ← Q ∪ {q�}15: Enqueue(K, q

�)16: E ← E � {(q, i[e1], o[e2], w[e1]⊗ w[e2], q�)}17: return T

1

Algorithm complexity

Time O(V1V2D1(logD2+M2))

Space O(V1V2V1M2) Vi=#states, Di=max out-degree, Mi=max multiplicity of ith WFST. www.OpenFST.org

Page 12: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

Composition (II) 12

(T1 ◦ T2)(x, y) =�

z∈∆∗

T1(x, z)⊗ T2(z, y),

where T1 and T2 represent the two transducers to be composed. The composition algorithmitself then, in the epsilon-free case, is defined as follows,

Algorithm 1 Weighted-Composition(T1, T2)

1: Q ← I1 × I2

2: K ← I1 × I2

3: while K �= ∅ do4: q = (q1, q2) ← Head(K)5: Dequeue(K)6: if q ∈ I1 × I2 then7: I ← I ∪ {q}8: λ(q) ← λ1(q1)⊗ λ2(q2)9: if q ∈ F1 × F2 then

10: F ← F ∪ {q}11: p(q) ← p1(q1)⊗ p2(q2)12: for each (e1, e2) ∈ E[q1]× E[q2] such that o[e1] = i[e2] do13: if (q�) = (n[e1], n[e2] /∈ Q) then14: Q ← Q ∪ {q�}15: Enqueue(K, q

�)16: E ← E � {(q, i[e1], o[e2], w[e1]⊗ w[e2], q�)}17: return T

1

Algorithm complexity

Time O(V1V2D1(logD2+M2))

Space O(V1V2V1M2) Vi=#states, Di=max out-degree, Mi=max multiplicity of ith WFST. www.OpenFST.org

Initialize the queue and new WFST

Initial state matches

Final state matches

Create new arcs for any matching labels, and add to

the new WFST

Page 13: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

Composition (example) 13

C (0a,0b)

a:b/0.1 (1a) b:b/0.1 (1b) a:b/0.01 (1a,1b)

Queue (K)

(0a,0b)

(a) (b)

(c)

Page 14: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

Composition (example) 14

C (0a,0b)

a:b/0.1 (1a) b:b/0.1 (1b)

a:b/0.01 (1a,1b)

Queue (K)

(0a,0b)

Queue (K+1)

(1a,1b)

(a) (b)

(c)

Page 15: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

Composition (example) 15

C (1a,1b)

a:b/0.2 (0a) b:a/0.2 (1b)

b:b/0.3 (2a) a:b/0.3 (2b)

b:b/0.4 (3b) a:b/0.4 (3b)

a:a/0.04 (0a,1b)

b:a/0.06 (2a,1b)

b:a/0.08 (3a,1b)

Queue (K)

(1a,1b)

Queue (K+1)

(0a,1b)

(2a,1b)

(3a,1b)

(a) (b)

(c)

Page 16: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

Composition (example) 16

C (0a,1b)

a:b/0.1 (1a) b:a/0.2 (1b)

a:b/0.3 (2b)

a:b/0.4 (3b)

a:a/0.02 (1a,1b)

Queue (K)

(0a,1b)

(2a,1b)

(3a,1b)

Queue (K+1)

(2a,1b)

(3a,1b)

(a) (b)

(c)

Page 17: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

Composition (example) 17

C (2a,1b)

a:b/0.5 (3a) b:a/0.2 (1b)

a:b/0.3 (2b)

a:b/0.4 (3b)

a:a/0.1 (3a,1b)

Queue (K)

(2a,1b)

(3a,1b)

Queue (K+1)

(3a,1b)

(a) (b)

(c)

Page 18: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

Composition (example) 18

C (3a,1b)

a:a/0.6 (3a) b:a/0.2 (1b)

a:b/0.3 (2b)

a:b/0.4 (3b)

a:b/0.18 (3a,2b)

a:b/0.24 (3a,3b)

Queue (K)

(3a,1b)

Queue (K+1)

(3a,2b)

(3a,3b)

(a) (b)

(c)

Page 19: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

Composition (example) 19

C (3a,2b)

a:a/0.6 (3a) b:a/0.5 (3b)

Null

Queue (K)

(3a,2b)

(3a,3b)

Queue (K+1)

(3a,3b)

(a) (b)

(c)

Page 20: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

Composition (example) 20

C (3a,3b)

a:a/0.6 (3a) Null

Null

Queue (K)

(3a,3b)

Queue (K+1)

Null

(a) (b)

(c)

Page 21: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

Composition (example) 21

(C) composition result

(0a,0b) a:b/0.01 (1a,1b) (0a,1b) a:a/0.02 (1a,1b)

(1a,1b) a:a/0.04 (0a,1b) (2a,1b) a:a/0.1 (3a,1b)

(1a,1b) b:a/0.06 (2a,1b) (3a,1b) a:b/0.18 (3a,2b)

(1a,1b) b:a/0.08 (3a,1b) (3a,1b) a:b/0.24 (3a,3b)

(c)

Page 22: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

Composition (III) 22

  -transitions (epsilons) cause problems  Wildcard (*) that matches zero or more occurrences of any

label in the input/output alphabet  Create redundant paths during simple composition  May produce incorrect results for non-idempotent semirings

as path-weights may be counted more than once

  Solution is to utilize an intermediate filter transducer to eliminate the redundant paths

Algorithms 17

! "#$# %&$! '($! )*$*! "

#$%&

!!'(

%$#

T1 T2

!

!!!"!!!"

"#$#

!!!"!!!"

%&$!#

!!!"!!!"

'($!#

!!!"!!!"

)*$*

!!!"!!!"

!

!!"!

"#$%

!!"!

&!#"$$$''''(

!!"!

)%$#

!!"!

T̃1 T̃2

!"! #"# #"$

$"# $"$

%"# %"$

&"%

'() !(*

+(!

,(!

+(!

,(!

!(*

!(*

)('

+(*

-.(./ -!!"!!/

-!!"!!/

-!!"!!/

-!#"!#/-!#"!#/

-!#"!#/ -!#"!#/

-.(./

-!#"!!$

!

"#"

!!"!#$

!#"!#

%

!!"!!

"#"

!#"!#

"#"

!!"!!

(a) (b)

Fig. 7. Redundant !-paths in composition. All transition and final weights are equalto 1. (a) A straightforward generalization of the !-free case would generate all thepaths from (1, 1) to (3, 2) when composing T1 and T2 and produce an incorrectresults in non-idempotent semirings. (b) Filter transducer F [46]. The shorthand xis used to represent an element of ".

In the worst case, all transitions of T1 leaving a state q1 match all thoseof T2 leaving state q!1, thus the space and time complexity of composition isquadratic: O(|T1||T2|). However, an important feature of composition is thatit admits a natural on-demand computation which can be used to constructonly the part of the composed transducer that is needed. Figures 6(a)-(c)illustrate the algorithm when applied to the transducers of Figures 6(a)-(b)defined over the probability semiring.

More care is needed when T1 admits output ! labels or T2 input ! labels.Indeed, as illustrated by Figure 7, a straightforward generalization of the !-free case would generate redundant !-paths and, in the case of non-idempotentsemirings, would lead to an incorrect result. The weight of the matching pathsof the original transducers would be counted p times, where p is the numberof redundant paths in the result of composition.

To cope with this problem, all but one !-path must be filtered out ofthe composite transducer. Figure 7 indicates in boldface one possible choice

Algorithms 17

! "#$# %&$! '($! )*$*! "

#$%&

!!'(

%$#

T1 T2

!

!!!"!!!"

"#$#

!!!"!!!"

%&$!#

!!!"!!!"

'($!#

!!!"!!!"

)*$*

!!!"!!!"

!

!!"!

"#$%

!!"!

&!#"$$$''''(

!!"!

)%$#

!!"!

T̃1 T̃2

!"! #"# #"$

$"# $"$

%"# %"$

&"%

'() !(*

+(!

,(!

+(!

,(!

!(*

!(*

)('

+(*

-.(./ -!!"!!/

-!!"!!/

-!!"!!/

-!#"!#/-!#"!#/

-!#"!#/ -!#"!#/

-.(./

-!#"!!$

!

"#"

!!"!#$

!#"!#

%

!!"!!

"#"

!#"!#

"#"

!!"!!

(a) (b)

Fig. 7. Redundant !-paths in composition. All transition and final weights are equalto 1. (a) A straightforward generalization of the !-free case would generate all thepaths from (1, 1) to (3, 2) when composing T1 and T2 and produce an incorrectresults in non-idempotent semirings. (b) Filter transducer F [46]. The shorthand xis used to represent an element of ".

In the worst case, all transitions of T1 leaving a state q1 match all thoseof T2 leaving state q!1, thus the space and time complexity of composition isquadratic: O(|T1||T2|). However, an important feature of composition is thatit admits a natural on-demand computation which can be used to constructonly the part of the composed transducer that is needed. Figures 6(a)-(c)illustrate the algorithm when applied to the transducers of Figures 6(a)-(b)defined over the probability semiring.

More care is needed when T1 admits output ! labels or T2 input ! labels.Indeed, as illustrated by Figure 7, a straightforward generalization of the !-free case would generate redundant !-paths and, in the case of non-idempotentsemirings, would lead to an incorrect result. The weight of the matching pathsof the original transducers would be counted p times, where p is the numberof redundant paths in the result of composition.

To cope with this problem, all but one !-path must be filtered out ofthe composite transducer. Figure 7 indicates in boldface one possible choice

Algorithms 17

! "#$# %&$! '($! )*$*! "

#$%&

!!'(

%$#

T1 T2

!

!!!"!!!"

"#$#

!!!"!!!"

%&$!#

!!!"!!!"

'($!#

!!!"!!!"

)*$*

!!!"!!!"

!

!!"!

"#$%

!!"!

&!#"$$$''''(

!!"!

)%$#

!!"!

T̃1 T̃2

!"! #"# #"$

$"# $"$

%"# %"$

&"%

'() !(*

+(!

,(!

+(!

,(!

!(*

!(*

)('

+(*

-.(./ -!!"!!/

-!!"!!/

-!!"!!/

-!#"!#/-!#"!#/

-!#"!#/ -!#"!#/

-.(./

-!#"!!$

!

"#"

!!"!#$

!#"!#

%

!!"!!

"#"

!#"!#

"#"

!!"!!

(a) (b)

Fig. 7. Redundant !-paths in composition. All transition and final weights are equalto 1. (a) A straightforward generalization of the !-free case would generate all thepaths from (1, 1) to (3, 2) when composing T1 and T2 and produce an incorrectresults in non-idempotent semirings. (b) Filter transducer F [46]. The shorthand xis used to represent an element of ".

In the worst case, all transitions of T1 leaving a state q1 match all thoseof T2 leaving state q!1, thus the space and time complexity of composition isquadratic: O(|T1||T2|). However, an important feature of composition is thatit admits a natural on-demand computation which can be used to constructonly the part of the composed transducer that is needed. Figures 6(a)-(c)illustrate the algorithm when applied to the transducers of Figures 6(a)-(b)defined over the probability semiring.

More care is needed when T1 admits output ! labels or T2 input ! labels.Indeed, as illustrated by Figure 7, a straightforward generalization of the !-free case would generate redundant !-paths and, in the case of non-idempotentsemirings, would lead to an incorrect result. The weight of the matching pathsof the original transducers would be counted p times, where p is the numberof redundant paths in the result of composition.

To cope with this problem, all but one !-path must be filtered out ofthe composite transducer. Figure 7 indicates in boldface one possible choice

Page 23: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

Composition (IV) 23 Algorithms 17

! "#$# %&$! '($! )*$*! "

#$%&

!!'(

%$#

T1 T2

!

!!!"!!!"

"#$#

!!!"!!!"

%&$!#

!!!"!!!"

'($!#

!!!"!!!"

)*$*

!!!"!!!"

!

!!"!

"#$%

!!"!

&!#"$$$''''(

!!"!

)%$#

!!"!

T̃1 T̃2

!"! #"# #"$

$"# $"$

%"# %"$

&"%

'() !(*

+(!

,(!

+(!

,(!

!(*

!(*

)('

+(*

-.(./ -!!"!!/

-!!"!!/

-!!"!!/

-!#"!#/-!#"!#/

-!#"!#/ -!#"!#/

-.(./

-!#"!!$

!

"#"

!!"!#$

!#"!#

%

!!"!!

"#"

!#"!#

"#"

!!"!!

(a) (b)

Fig. 7. Redundant !-paths in composition. All transition and final weights are equalto 1. (a) A straightforward generalization of the !-free case would generate all thepaths from (1, 1) to (3, 2) when composing T1 and T2 and produce an incorrectresults in non-idempotent semirings. (b) Filter transducer F [46]. The shorthand xis used to represent an element of ".

In the worst case, all transitions of T1 leaving a state q1 match all thoseof T2 leaving state q!1, thus the space and time complexity of composition isquadratic: O(|T1||T2|). However, an important feature of composition is thatit admits a natural on-demand computation which can be used to constructonly the part of the composed transducer that is needed. Figures 6(a)-(c)illustrate the algorithm when applied to the transducers of Figures 6(a)-(b)defined over the probability semiring.

More care is needed when T1 admits output ! labels or T2 input ! labels.Indeed, as illustrated by Figure 7, a straightforward generalization of the !-free case would generate redundant !-paths and, in the case of non-idempotentsemirings, would lead to an incorrect result. The weight of the matching pathsof the original transducers would be counted p times, where p is the numberof redundant paths in the result of composition.

To cope with this problem, all but one !-path must be filtered out ofthe composite transducer. Figure 7 indicates in boldface one possible choice

Page 24: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

Composition (IV) 24 Algorithms 17

! "#$# %&$! '($! )*$*! "

#$%&

!!'(

%$#

T1 T2

!

!!!"!!!"

"#$#

!!!"!!!"

%&$!#

!!!"!!!"

'($!#

!!!"!!!"

)*$*

!!!"!!!"

!

!!"!

"#$%

!!"!

&!#"$$$''''(

!!"!

)%$#

!!"!

T̃1 T̃2

!"! #"# #"$

$"# $"$

%"# %"$

&"%

'() !(*

+(!

,(!

+(!

,(!

!(*

!(*

)('

+(*

-.(./ -!!"!!/

-!!"!!/

-!!"!!/

-!#"!#/-!#"!#/

-!#"!#/ -!#"!#/

-.(./

-!#"!!$

!

"#"

!!"!#$

!#"!#

%

!!"!!

"#"

!#"!#

"#"

!!"!!

(a) (b)

Fig. 7. Redundant !-paths in composition. All transition and final weights are equalto 1. (a) A straightforward generalization of the !-free case would generate all thepaths from (1, 1) to (3, 2) when composing T1 and T2 and produce an incorrectresults in non-idempotent semirings. (b) Filter transducer F [46]. The shorthand xis used to represent an element of ".

In the worst case, all transitions of T1 leaving a state q1 match all thoseof T2 leaving state q!1, thus the space and time complexity of composition isquadratic: O(|T1||T2|). However, an important feature of composition is thatit admits a natural on-demand computation which can be used to constructonly the part of the composed transducer that is needed. Figures 6(a)-(c)illustrate the algorithm when applied to the transducers of Figures 6(a)-(b)defined over the probability semiring.

More care is needed when T1 admits output ! labels or T2 input ! labels.Indeed, as illustrated by Figure 7, a straightforward generalization of the !-free case would generate redundant !-paths and, in the case of non-idempotentsemirings, would lead to an incorrect result. The weight of the matching pathsof the original transducers would be counted p times, where p is the numberof redundant paths in the result of composition.

To cope with this problem, all but one !-path must be filtered out ofthe composite transducer. Figure 7 indicates in boldface one possible choice

Valid because the output epsilon

label doesn’t need to match anything on the 2nd WFST

Valid because the input epsilon label

doesn’t need to match anything on

the 1st WFST

Page 25: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

Composition (example) Composition with epsilons – generic

C (0a,0b)

a:a (1a) a:d (1b)

a:d (1a,1b)

Queue (K)

(0a,0b)

Queue (K+1)

(1a,1b)

Page 26: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

C (0a,0b)

a:a (1a) a:d (1b)

a:d (1a,1b)

Queue (K)

(0a,0b)

Queue (K+1)

(1a,1b)

Composition (example) Composition with epsilons – generic

Page 27: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

C (1a,1b)

b:ε (2a) ε:e (2b)

ε (1a) ε:e (2b)

b:ε (2a) ε (1b)

ε:e (1a,2b)

b:ε (2a,1b)

b:e (2a,2b)

Queue (K)

(1a,1b)

Queue (K+1)

(1a,2b)

(2a,1b)

(2a,2b)

Composition (example) Composition with epsilons – generic

Page 28: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

C (1a,1b)

b:ε (2a) ε:e (2b)

ε (1a) ε:e (2b)

b:ε (2a) ε (1b)

ε:e (1a,2b)

b:ε (2a,1b)

b:e (2a,2b)

Queue (K)

(1a,1b)

Queue (K+1)

(1a,2b)

(2a,1b)

(2a,2b)

Composition (example) Composition with epsilons – generic

Epsilon labels can either generate a wildcard match, or act as a null transition. This causes the corresponding WFST to either stay in the current state or skip to the next.

Page 29: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

C (1a,2b)

b:ε (1a) d:a (2b)

b:ε (2a,2b)

Queue (K)

(1a,2b)

(2a,1b)

(2a,2b)

Queue (K+1)

(2a,1b)

(2a,2b)

Composition (example) Composition with epsilons – generic

Page 30: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

C (2a,1b)

ε (2a) ε:e (2b)

c:ε (3a) ε (1b)

ε:e (2a,2b)

c:ε (3a,1b)

Queue (K)

(2a,1b)

(2a,2b)

Queue (K+1)

(2a,2b)

(3a,1b)

Composition (example) Composition with epsilons – generic

Page 31: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

C (2a,2b)

c:ε (3a) ε (2b)

c:ε (3a,2b)

Queue (K)

(2a,2b)

(3a,1b)

Queue (K+1)

(3a,1b)

(3a,2b)

Composition (example) Composition with epsilons – generic

Page 32: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

C (3a,1b)

ε (3a) ε:e (3b)

ε:e (3a,2b)

Queue (K)

(3a,1b)

(3a,2b)

Queue (K+1)

(3a,2b)

Composition (example) Composition with epsilons – generic

Page 33: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

C (3a,2b)

d:d (4a) d:a (3b)

d:a (4a,3b)

Queue (K)

(3a,2b)

Queue (K+1)

(4a,3b)

Composition (example) Composition with epsilons – generic

Page 34: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

C (4a,3b)

Null Null

Null

Queue (K)

(4a,3b)

Queue (K+1)

Null

Composition (example) Composition with epsilons – generic

Page 35: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

Composition (example) Composition with epsilons – generic

  Naive application of the algorithm generates many superfluous transitions as well as states  Necessary to filter the composition operation in order to prevent the

unnecessary generation of these elements  The final result should only include unique, valid paths

Page 36: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

Determinization (I) 36

  Eliminates ambiguity in the input paths   Improves efficiency of downstream operations such as shortest-path   Each state has at most one outgoing transition containing any

particular input label

 Determinization of weighted automata. (a) Weighted automaton over the tropical semiring A. (a’) Equivalent weighted automaton a’ obtained by determinization of a. (Mohri, 2002)

a a’

Page 37: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

Determinization (II) 37

26 Mehryar Mohri

Weighted-Determinization(A)

1 i! ! {(i, !(i)) : i " I}2 !!(i!) ! 13 Q ! {i!}4 while Q #= $ do

5 p! ! Head(Q)6 Dequeue(Q)7 for each x " i[E[Q[p!]]] do

8 w! !L

{v % w : (p, v) " p!, (p, x, w, q) " E}9 q! ! {(q,

L

{w!"1 % (v % w) : (p, v) " p!, (p, x, w, q) " E}) :q = n[e], i[e] = x, e " E[Q[p!]]}

10 E! ! E! & {(p!, x, w!, q!)}11 if q! #" Q! then

12 Q! ! Q! & {q!}13 if Q[q!] ' F #= $ then

14 F ! ! F ! & {q!}15 "!(q!) !

L

{v % "(q) : (q, v) " q!, q " F}16 Enqueue(Q, q!)17 return T !

set of transitions leaving these states, and i[E[Q[p!]]] the set of input labelsof these transitions.

The states of the output automaton can be identified with (weighted) sub-sets of the states of the original automaton. A state r of the output automatonthat can be reached from the start state by a path ! is identified with the set ofpairs (q, x) ! Q"S such that q can be reached from an initial state of the origi-nal machine by a path " with i["] = i[!] and #(p["])#w["] = #(p[!])#w[!]#x.Thus, x can be viewed as the residual weight at state q.

Determinization does not terminate for all weighted automata. As weshall see, not all weighted automata are determinizable by the algorithmjust described. When it terminates, the algorithm returns a subsequentialweighted automaton A! = ($, Q!, I !, F !, E!, #!, %!), equivalent to the inputA = ($, Q, I, F, E, #, %).

The algorithm uses a queue Q containing the set of states of the resultingautomaton A!, yet to be examined. The queue discipline of Q can be arbitrarilychosen and does not a!ect the termination of the algorithm. A! admits aunique initial state, i!, defined as the set of initial states of A augmented withtheir respective initial weights. Its input weight is 1 (lines 1-2). Q originallycontains only the subset i! (line 3). At each execution of the loop of lines 4-16,a new subset p! is extracted from Q (lines 5-6). For each x labeling at least oneof the transitions leaving a state p of the subset p!, a new transition with inputlabel x is constructed. The weight w! associated to that transition is the sumof the weights of all transitions in E[Q[p!]] labeled with x pre-#-multiplied

Algorithm complexity

Time exponential

Space exponential

Page 38: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

26 Mehryar Mohri

Weighted-Determinization(A)

1 i! ! {(i, !(i)) : i " I}2 !!(i!) ! 13 Q ! {i!}4 while Q #= $ do

5 p! ! Head(Q)6 Dequeue(Q)7 for each x " i[E[Q[p!]]] do

8 w! !L

{v % w : (p, v) " p!, (p, x, w, q) " E}9 q! ! {(q,

L

{w!"1 % (v % w) : (p, v) " p!, (p, x, w, q) " E}) :q = n[e], i[e] = x, e " E[Q[p!]]}

10 E! ! E! & {(p!, x, w!, q!)}11 if q! #" Q! then

12 Q! ! Q! & {q!}13 if Q[q!] ' F #= $ then

14 F ! ! F ! & {q!}15 "!(q!) !

L

{v % "(q) : (q, v) " q!, q " F}16 Enqueue(Q, q!)17 return T !

set of transitions leaving these states, and i[E[Q[p!]]] the set of input labelsof these transitions.

The states of the output automaton can be identified with (weighted) sub-sets of the states of the original automaton. A state r of the output automatonthat can be reached from the start state by a path ! is identified with the set ofpairs (q, x) ! Q"S such that q can be reached from an initial state of the origi-nal machine by a path " with i["] = i[!] and #(p["])#w["] = #(p[!])#w[!]#x.Thus, x can be viewed as the residual weight at state q.

Determinization does not terminate for all weighted automata. As weshall see, not all weighted automata are determinizable by the algorithmjust described. When it terminates, the algorithm returns a subsequentialweighted automaton A! = ($, Q!, I !, F !, E!, #!, %!), equivalent to the inputA = ($, Q, I, F, E, #, %).

The algorithm uses a queue Q containing the set of states of the resultingautomaton A!, yet to be examined. The queue discipline of Q can be arbitrarilychosen and does not a!ect the termination of the algorithm. A! admits aunique initial state, i!, defined as the set of initial states of A augmented withtheir respective initial weights. Its input weight is 1 (lines 1-2). Q originallycontains only the subset i! (line 3). At each execution of the loop of lines 4-16,a new subset p! is extracted from Q (lines 5-6). For each x labeling at least oneof the transitions leaving a state p of the subset p!, a new transition with inputlabel x is constructed. The weight w! associated to that transition is the sumof the weights of all transitions in E[Q[p!]] labeled with x pre-#-multiplied

Determinization (II) 38

Compute new arc weight

Compute new states and residual weights

Initialize the queue and new WFST

Compute new final weight, if necessary

Algorithm complexity

Time exponential

Space exponential

Page 39: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

26 Mehryar Mohri

Weighted-Determinization(A)

1 i! ! {(i, !(i)) : i " I}2 !!(i!) ! 13 Q ! {i!}4 while Q #= $ do

5 p! ! Head(Q)6 Dequeue(Q)7 for each x " i[E[Q[p!]]] do

8 w! !L

{v % w : (p, v) " p!, (p, x, w, q) " E}9 q! ! {(q,

L

{w!"1 % (v % w) : (p, v) " p!, (p, x, w, q) " E}) :q = n[e], i[e] = x, e " E[Q[p!]]}

10 E! ! E! & {(p!, x, w!, q!)}11 if q! #" Q! then

12 Q! ! Q! & {q!}13 if Q[q!] ' F #= $ then

14 F ! ! F ! & {q!}15 "!(q!) !

L

{v % "(q) : (q, v) " q!, q " F}16 Enqueue(Q, q!)17 return T !

set of transitions leaving these states, and i[E[Q[p!]]] the set of input labelsof these transitions.

The states of the output automaton can be identified with (weighted) sub-sets of the states of the original automaton. A state r of the output automatonthat can be reached from the start state by a path ! is identified with the set ofpairs (q, x) ! Q"S such that q can be reached from an initial state of the origi-nal machine by a path " with i["] = i[!] and #(p["])#w["] = #(p[!])#w[!]#x.Thus, x can be viewed as the residual weight at state q.

Determinization does not terminate for all weighted automata. As weshall see, not all weighted automata are determinizable by the algorithmjust described. When it terminates, the algorithm returns a subsequentialweighted automaton A! = ($, Q!, I !, F !, E!, #!, %!), equivalent to the inputA = ($, Q, I, F, E, #, %).

The algorithm uses a queue Q containing the set of states of the resultingautomaton A!, yet to be examined. The queue discipline of Q can be arbitrarilychosen and does not a!ect the termination of the algorithm. A! admits aunique initial state, i!, defined as the set of initial states of A augmented withtheir respective initial weights. Its input weight is 1 (lines 1-2). Q originallycontains only the subset i! (line 3). At each execution of the loop of lines 4-16,a new subset p! is extracted from Q (lines 5-6). For each x labeling at least oneof the transitions leaving a state p of the subset p!, a new transition with inputlabel x is constructed. The weight w! associated to that transition is the sumof the weights of all transitions in E[Q[p!]] labeled with x pre-#-multiplied

Determinization (II) 39

Compute new arc weight

Compute new states and residual weights

Initialize the queue and new WFST

Compute new final weight, if necessary

Algorithm complexity

Time exponential

Space exponential

Not the same!!

Not the same!!

Page 40: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

Determinization (example) 40

SEMIRING SET ⊕ ⊗ 0̄ 1̄

Boolean {0, 1} ∨ ∧ 0 1Probability R+ ∪ {+∞} + × 0 1Log R ∪ {−∞,+∞} ⊕log + +∞ 0Tropical R+ ∪ {+∞} min + +∞ 0

Table 1. Semiring examples. ⊕log is defined by: x⊕log y = −log(e−x + e−y).

1

SEMIRING SET ⊕ ⊗ 0̄ 1̄

Boolean {0, 1} ∨ ∧ 0 1Probability R+ ∪ {+∞} + × 0 1Log R ∪ {−∞,+∞} ⊕log + +∞ 0Tropical R+ ∪ {+∞} min + +∞ 0

Table 1. Semiring examples. ⊕log is defined by: x⊕log y = −log(e−x + e−y).

1

Recall the properties of the tropical semiring

Queue (K)

(0, )

a

a’

1

Page 41: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

Determinization (example) 41

SEMIRING SET ⊕ ⊗ 0̄ 1̄

Boolean {0, 1} ∨ ∧ 0 1Probability R+ ∪ {+∞} + × 0 1Log R ∪ {−∞,+∞} ⊕log + +∞ 0Tropical R+ ∪ {+∞} min + +∞ 0

Table 1. Semiring examples. ⊕log is defined by: x⊕log y = −log(e−x + e−y).

1

SEMIRING SET ⊕ ⊗ 0̄ 1̄

Boolean {0, 1} ∨ ∧ 0 1Probability R+ ∪ {+∞} + × 0 1Log R ∪ {−∞,+∞} ⊕log + +∞ 0Tropical R+ ∪ {+∞} min + +∞ 0

Table 1. Semiring examples. ⊕log is defined by: x⊕log y = −log(e−x + e−y).

1

Recall the properties of the tropical semiring

Queue (K)

(0, )

a

a’

1

Input state ID

Residual weight from original path

Page 42: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

Determinization (example) 42

SEMIRING SET ⊕ ⊗ 0̄ 1̄

Boolean {0, 1} ∨ ∧ 0 1Probability R+ ∪ {+∞} + × 0 1Log R ∪ {−∞,+∞} ⊕log + +∞ 0Tropical R+ ∪ {+∞} min + +∞ 0

Table 1. Semiring examples. ⊕log is defined by: x⊕log y = −log(e−x + e−y).

1

SEMIRING SET ⊕ ⊗ 0̄ 1̄

Boolean {0, 1} ∨ ∧ 0 1Probability R+ ∪ {+∞} + × 0 1Log R ∪ {−∞,+∞} ⊕log + +∞ 0Tropical R+ ∪ {+∞} min + +∞ 0

Table 1. Semiring examples. ⊕log is defined by: x⊕log y = −log(e−x + e−y).

1

Recall the properties of the tropical semiring

Queue (K)

(0,0)

ʹ′ w =⊕ 1 ⊗1( ), 1 ⊗2( ){ }ʹ′ w = min 0 +1( ), 0 + 2( ){ }ʹ′ w =1

ʹ′ q = 1,⊕ 1−1 ⊗ 1 ⊗1( ){ }( ), 2,⊕ 1−1 ⊗ 1 ⊗2( ){ }( )⎧ ⎨ ⎩

⎫ ⎬ ⎭

ʹ′ q = 1,min −1+ 0 +1( ){ }( ), 2,min −1+ 0 + 2( ){ }( ){ }ʹ′ q = 1,0( ), 2,1( ){ }

{ (0,0), a, 1, ((1,0),(2,1)) } Queue (K)

((1,0),(2,1))

a

a’

Page 43: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

Determinization (example) 43

SEMIRING SET ⊕ ⊗ 0̄ 1̄

Boolean {0, 1} ∨ ∧ 0 1Probability R+ ∪ {+∞} + × 0 1Log R ∪ {−∞,+∞} ⊕log + +∞ 0Tropical R+ ∪ {+∞} min + +∞ 0

Table 1. Semiring examples. ⊕log is defined by: x⊕log y = −log(e−x + e−y).

1

SEMIRING SET ⊕ ⊗ 0̄ 1̄

Boolean {0, 1} ∨ ∧ 0 1Probability R+ ∪ {+∞} + × 0 1Log R ∪ {−∞,+∞} ⊕log + +∞ 0Tropical R+ ∪ {+∞} min + +∞ 0

Table 1. Semiring examples. ⊕log is defined by: x⊕log y = −log(e−x + e−y).

1

Recall the properties of the tropical semiring

Queue (K)

(0,0)

ʹ′ w =⊕ 1 ⊗1( ), 1 ⊗2( ){ }ʹ′ w = min 0 +1( ), 0 + 2( ){ }ʹ′ w =1

ʹ′ q = 1,⊕ 1−1 ⊗ 1 ⊗1( ){ }( ), 2,⊕ 1−1 ⊗ 1 ⊗2( ){ }( )⎧ ⎨ ⎩

⎫ ⎬ ⎭

ʹ′ q = 1,min −1+ 0 +1( ){ }( ), 2,min −1+ 0 + 2( ){ }( ){ }ʹ′ q = 1,0( ), 2,1( ){ }

{ (0,0), a, 1, ((1,0),(2,1)) } Queue (K)

((1,0),(2,1))

a

a’

注意!

Page 44: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

Determinization (example) 44

SEMIRING SET ⊕ ⊗ 0̄ 1̄

Boolean {0, 1} ∨ ∧ 0 1Probability R+ ∪ {+∞} + × 0 1Log R ∪ {−∞,+∞} ⊕log + +∞ 0Tropical R+ ∪ {+∞} min + +∞ 0

Table 1. Semiring examples. ⊕log is defined by: x⊕log y = −log(e−x + e−y).

1

SEMIRING SET ⊕ ⊗ 0̄ 1̄

Boolean {0, 1} ∨ ∧ 0 1Probability R+ ∪ {+∞} + × 0 1Log R ∪ {−∞,+∞} ⊕log + +∞ 0Tropical R+ ∪ {+∞} min + +∞ 0

Table 1. Semiring examples. ⊕log is defined by: x⊕log y = −log(e−x + e−y).

1

Recall the properties of the tropical semiring

Queue (K)

((1,0),(2,1))

ʹ′ w =⊕ 0⊗ 3( ), 1⊗ 3( ){ }ʹ′ w =min 0 + 3( ), 1+ 3( ){ }ʹ′ w = 3

ʹ′ q = 1,⊕ 3−1⊗ 0⊗ 3( ){ }( ), 2,⊕ 3−1⊗ 1⊗ 3( ){ }( )⎧ ⎨ ⎩

⎫ ⎬ ⎭

ʹ′ q = 1,min −3+ 0 + 3( ){ }( ), 2,min −3+ 1+ 3( ){ }( ){ }ʹ′ q = 1,0( ), 2,1( ){ }

{ ((1,0),(2,1)), b, 3, ((1,0),(2,1)) } Queue (K)

a

a’

Page 45: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

Determinization (example) 45

SEMIRING SET ⊕ ⊗ 0̄ 1̄

Boolean {0, 1} ∨ ∧ 0 1Probability R+ ∪ {+∞} + × 0 1Log R ∪ {−∞,+∞} ⊕log + +∞ 0Tropical R+ ∪ {+∞} min + +∞ 0

Table 1. Semiring examples. ⊕log is defined by: x⊕log y = −log(e−x + e−y).

1

SEMIRING SET ⊕ ⊗ 0̄ 1̄

Boolean {0, 1} ∨ ∧ 0 1Probability R+ ∪ {+∞} + × 0 1Log R ∪ {−∞,+∞} ⊕log + +∞ 0Tropical R+ ∪ {+∞} min + +∞ 0

Table 1. Semiring examples. ⊕log is defined by: x⊕log y = −log(e−x + e−y).

1

Recall the properties of the tropical semiring

Queue (K)

((1,0),(2,1))

ʹ′ w =⊕ 0⊗5( ){ }ʹ′ w =min 0 + 5( ){ }ʹ′ w = 5

ʹ′ q = 3,⊕ 5−1⊗ 0⊗5( ){ }( )⎧ ⎨ ⎩

⎫ ⎬ ⎭

ʹ′ q = 3,min −5 + 0 + 5( ){ }( ){ }ʹ′ q = 3,0( ){ }

{ ((1,0),(2,1)), c, 5, (3,0) } Queue (K)

(3,0)

a

a’

Page 46: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

Determinization (example) 46

a

a’

SEMIRING SET ⊕ ⊗ 0̄ 1̄

Boolean {0, 1} ∨ ∧ 0 1Probability R+ ∪ {+∞} + × 0 1Log R ∪ {−∞,+∞} ⊕log + +∞ 0Tropical R+ ∪ {+∞} min + +∞ 0

Table 1. Semiring examples. ⊕log is defined by: x⊕log y = −log(e−x + e−y).

1

SEMIRING SET ⊕ ⊗ 0̄ 1̄

Boolean {0, 1} ∨ ∧ 0 1Probability R+ ∪ {+∞} + × 0 1Log R ∪ {−∞,+∞} ⊕log + +∞ 0Tropical R+ ∪ {+∞} min + +∞ 0

Table 1. Semiring examples. ⊕log is defined by: x⊕log y = −log(e−x + e−y).

1

Recall the properties of the tropical semiring

Queue (K)

((1,0),(2,1))

ʹ′ w =⊕ 1⊗6( ){ }ʹ′ w =min 1+ 6( ){ }ʹ′ w = 7

ʹ′ q = 3,⊕ 7−1⊗ 1⊗6( ){ }( )⎧ ⎨ ⎩

⎫ ⎬ ⎭

ʹ′ q = 3,min −7 + 1+ 6( ){ }( ){ }ʹ′ q = 3,0( ){ }

{ ((1,0),(2,1)), d, 7, (3,0) } Queue (K)

(3,0)

Page 47: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

Determinization (example) 47

a

a’

Null

SEMIRING SET ⊕ ⊗ 0̄ 1̄

Boolean {0, 1} ∨ ∧ 0 1Probability R+ ∪ {+∞} + × 0 1Log R ∪ {−∞,+∞} ⊕log + +∞ 0Tropical R+ ∪ {+∞} min + +∞ 0

Table 1. Semiring examples. ⊕log is defined by: x⊕log y = −log(e−x + e−y).

1

SEMIRING SET ⊕ ⊗ 0̄ 1̄

Boolean {0, 1} ∨ ∧ 0 1Probability R+ ∪ {+∞} + × 0 1Log R ∪ {−∞,+∞} ⊕log + +∞ 0Tropical R+ ∪ {+∞} min + +∞ 0

Table 1. Semiring examples. ⊕log is defined by: x⊕log y = −log(e−x + e−y).

1

Recall the properties of the tropical semiring

Queue (K)

(3,0)

Queue (K)

Null

Page 48: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

Determinization (example) 48

a’

a’ determinization result

(0,0) a/1 ((1,0),(2,1))

((1,0),(2,1)) b/3 ((1,0),(2,1))

((1,0),(2,1)) c/5 (2a,1b)

((1,0),(2,1)) d/7 (3a,1b)

Page 49: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

Determinization (III) 49

  Some issues with weighted determinization  May not terminate for some inputs!  May not finish for some inputs, even if they are valid!  May blow up in memory

  The Twins property can be utilized to test for determinizability

This WFST cannot be determinized, but why?

w P q,y,q( )( ) = w P ʹ′ q ,y, ʹ′ q ( )( )

Page 50: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

Determinization (III) 50

  Some issues with weighted determinization  May not terminate for some inputs!  May not finish for some inputs, even if they are valid!  May blow up in memory

  The Twins property can be utilized to test for determinizability

This WFST cannot be determinized, but why?

The residual weights generated by these

loops cannot be normalized

w P q,y,q( )( ) = w P ʹ′ q ,y, ʹ′ q ( )( ) “Weighted Automata Algorithms”, Mehryar Mohri, 2002 for details

Page 51: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

Generic Epsilon-Removal (I) 51

  Removes -transitions from the input transducer and produces a new, epsilon-free transducer equivalent to the original

  Preferable to remove -transitions prior to performing downstream processing or optimization   Introduce delays  Cause problems or complications

Algorithms 17

! "#$# %&$! '($! )*$*! "

#$%&

!!'(

%$#

T1 T2

!

!!!"!!!"

"#$#

!!!"!!!"

%&$!#

!!!"!!!"

'($!#

!!!"!!!"

)*$*

!!!"!!!"

!

!!"!

"#$%

!!"!

&!#"$$$''''(

!!"!

)%$#

!!"!

T̃1 T̃2

!"! #"# #"$

$"# $"$

%"# %"$

&"%

'() !(*

+(!

,(!

+(!

,(!

!(*

!(*

)('

+(*

-.(./ -!!"!!/

-!!"!!/

-!!"!!/

-!#"!#/-!#"!#/

-!#"!#/ -!#"!#/

-.(./

-!#"!!$

!

"#"

!!"!#$

!#"!#

%

!!"!!

"#"

!#"!#

"#"

!!"!!

(a) (b)

Fig. 7. Redundant !-paths in composition. All transition and final weights are equalto 1. (a) A straightforward generalization of the !-free case would generate all thepaths from (1, 1) to (3, 2) when composing T1 and T2 and produce an incorrectresults in non-idempotent semirings. (b) Filter transducer F [46]. The shorthand xis used to represent an element of ".

In the worst case, all transitions of T1 leaving a state q1 match all thoseof T2 leaving state q!1, thus the space and time complexity of composition isquadratic: O(|T1||T2|). However, an important feature of composition is thatit admits a natural on-demand computation which can be used to constructonly the part of the composed transducer that is needed. Figures 6(a)-(c)illustrate the algorithm when applied to the transducers of Figures 6(a)-(b)defined over the probability semiring.

More care is needed when T1 admits output ! labels or T2 input ! labels.Indeed, as illustrated by Figure 7, a straightforward generalization of the !-free case would generate redundant !-paths and, in the case of non-idempotentsemirings, would lead to an incorrect result. The weight of the matching pathsof the original transducers would be counted p times, where p is the numberof redundant paths in the result of composition.

To cope with this problem, all but one !-path must be filtered out ofthe composite transducer. Figure 7 indicates in boldface one possible choice

Algorithms 17

! "#$# %&$! '($! )*$*! "

#$%&

!!'(

%$#

T1 T2

!

!!!"!!!"

"#$#

!!!"!!!"

%&$!#

!!!"!!!"

'($!#

!!!"!!!"

)*$*

!!!"!!!"

!

!!"!

"#$%

!!"!

&!#"$$$''''(

!!"!

)%$#

!!"!

T̃1 T̃2

!"! #"# #"$

$"# $"$

%"# %"$

&"%

'() !(*

+(!

,(!

+(!

,(!

!(*

!(*

)('

+(*

-.(./ -!!"!!/

-!!"!!/

-!!"!!/

-!#"!#/-!#"!#/

-!#"!#/ -!#"!#/

-.(./

-!#"!!$

!

"#"

!!"!#$

!#"!#

%

!!"!!

"#"

!#"!#

"#"

!!"!!

(a) (b)

Fig. 7. Redundant !-paths in composition. All transition and final weights are equalto 1. (a) A straightforward generalization of the !-free case would generate all thepaths from (1, 1) to (3, 2) when composing T1 and T2 and produce an incorrectresults in non-idempotent semirings. (b) Filter transducer F [46]. The shorthand xis used to represent an element of ".

In the worst case, all transitions of T1 leaving a state q1 match all thoseof T2 leaving state q!1, thus the space and time complexity of composition isquadratic: O(|T1||T2|). However, an important feature of composition is thatit admits a natural on-demand computation which can be used to constructonly the part of the composed transducer that is needed. Figures 6(a)-(c)illustrate the algorithm when applied to the transducers of Figures 6(a)-(b)defined over the probability semiring.

More care is needed when T1 admits output ! labels or T2 input ! labels.Indeed, as illustrated by Figure 7, a straightforward generalization of the !-free case would generate redundant !-paths and, in the case of non-idempotentsemirings, would lead to an incorrect result. The weight of the matching pathsof the original transducers would be counted p times, where p is the numberof redundant paths in the result of composition.

To cope with this problem, all but one !-path must be filtered out ofthe composite transducer. Figure 7 indicates in boldface one possible choice

A

Page 52: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

Generic Epsilon-Removal (II) 52

  -closure computation of state p:

22 Mehryar Mohri

Epsilon-Removal(T )

1 for each p ! Q do

2 Compute-Closure(C(p))3 E! " E! " {(p, a, b, w, q) ! E : (a, b) #= (!, !)}4 F ! " F5 "! " "6 for each p ! Q do

7 for each (q, w!) ! C[p] do

8 E![p] " E![p] $ {(p, a, b, w! % w, r) : (q, a, b, w, r) ! E!}9 if q ! F then

10 if p #! F then

11 F ! " F & {p}12 "![p] " 013 "![p] " "![p] ' (w! % "(q))14 return T ! = (#, $, Q, I, F !, E!, %, "!)

The proof is simple and is given in [40].The !-closures C(p) can be computed using an all-pair shortest-distance

algorithm over T! when the semiring S is complete, or by applying the single-shortest distance algorithm from each source p when the semiring is k-closed,as described in Section 2. The complexity of the second stage of the algorithm(lines 6-13) is in O(|Q|2 + |Q||E|) since in the worst case each C(p) containsall states of the transducer. Thus, the overall complexity of the algorithm is

O(|Compute-Closure| + |Q|2 + |Q||E|(T! + T")), (30)

where O(|Compute-Closure|) denotes the total cost of the closure compu-tation. We are now examining several special cases of practical interest:

• T! is acyclic, that is T admits no !-cycle, a rather frequent case in practice.In that case, the single-source shortest distance algorithm can be used forany semiring S and has linear time complexity. The total complexity ofapplying the algorithm at each state is O(|Q|2 + |Q||E|(T! + T")) andmatches that of the second stage. Thus, the overall complexity of epsilon-removal is then

O(|Q|2 + |Q||E|(T! + T")). (31)

The algorithm can in fact be improved in that special case. The complexityof the computation of the all-pairs shortest distances can be substantiallyimproved if the states of T! are visited in reverse topological order and ifthe single-source shortest-distance algorithm is interleaved with the actualremoval of !s as follows: for each state p of T! visited in reverse topologicalorder,

– run a single-source shortest-distance algorithm with source p to com-pute the distance from p to each state q in T!;

Algorithms 21

6 Optimization algorithms

6.1 Epsilon-Removal

The use of various automata or transducer operations such as rational opera-tions generate !-transitions. These transitions cause some delay in the use ofthe resulting transducers since the search for an alphabet symbol to matchin composition or other similar operations requires reading some sequences of!s first. To make these weighted transducers more e!cient to use, it may bepreferable to remove all !-transitions, that is to create an equivalent weightedtransducer with no !-transition. This section describes a general !-removalalgorithm that precisely achieves this task.

Simply removing !-transitions from the input transducer clearly does notresult in an equivalent one. Instead, for a given state p, the non-!-transitionsof all states q reachable from p via !-transitions should be added to those ofp. In the weighted case, this does not result in an equivalent transducer sincethe weights of the !-transitions from p to q would be ignored. Thus, beforeadding an outgoing transition of state q to p, the weight it carries must bepre-!-multiplied by the sum of the weights of the !-paths from p to q. Toensure that this weight is an element of the semiring, we will assume that Sis complete.11

This leads to a two-step algorithm [40]. Given a transducer T , let T! denotethe transducer derived from T by keeping only !-transitions and let d![p, q] =""!P (p,!,q)w["] denote the distance from state p to state q in T!. Then, thefollowing are the two main steps of the algorithms:

• !-closure computation: at each state p, the weighted !-closure defined by

C(p) = {(q, w) : P (p, !, q) #= $, w = d![p, q]} (28)

is computed.• actual removal of !s: all !-transitions are removed and for each p and each

(q, w) % C(p), the transition set of p is augmented with the followingtransitions

{(p, a, b, d![p, q] ! w, r) : (q, a, b, w, r) % E, (a, b) #= (!, !)}. (29)

If there exists (q, w) % C(p) with q % F , then d![p, q] ! #(q) must be"-added to the final weight of p.

The following is the pseudocode of the algorithm which follows the main stepsjust discussed.

Theorem 4 ([40]). Let T be a weighted transducer over a complete semir-ing S. Assume that the closures C(p) can be computed for any state p of T .Then, the weighted transducer T " returned by the epsilon-removal algorithmjust described is equivalent to T .

11 The results presented also hold in the case of closed semirings.

Algorithms 17

! "#$# %&$! '($! )*$*! "

#$%&

!!'(

%$#

T1 T2

!

!!!"!!!"

"#$#

!!!"!!!"

%&$!#

!!!"!!!"

'($!#

!!!"!!!"

)*$*

!!!"!!!"

!

!!"!

"#$%

!!"!

&!#"$$$''''(

!!"!

)%$#

!!"!

T̃1 T̃2

!"! #"# #"$

$"# $"$

%"# %"$

&"%

'() !(*

+(!

,(!

+(!

,(!

!(*

!(*

)('

+(*

-.(./ -!!"!!/

-!!"!!/

-!!"!!/

-!#"!#/-!#"!#/

-!#"!#/ -!#"!#/

-.(./

-!#"!!$

!

"#"

!!"!#$

!#"!#

%

!!"!!

"#"

!#"!#

"#"

!!"!!

(a) (b)

Fig. 7. Redundant !-paths in composition. All transition and final weights are equalto 1. (a) A straightforward generalization of the !-free case would generate all thepaths from (1, 1) to (3, 2) when composing T1 and T2 and produce an incorrectresults in non-idempotent semirings. (b) Filter transducer F [46]. The shorthand xis used to represent an element of ".

In the worst case, all transitions of T1 leaving a state q1 match all thoseof T2 leaving state q!1, thus the space and time complexity of composition isquadratic: O(|T1||T2|). However, an important feature of composition is thatit admits a natural on-demand computation which can be used to constructonly the part of the composed transducer that is needed. Figures 6(a)-(c)illustrate the algorithm when applied to the transducers of Figures 6(a)-(b)defined over the probability semiring.

More care is needed when T1 admits output ! labels or T2 input ! labels.Indeed, as illustrated by Figure 7, a straightforward generalization of the !-free case would generate redundant !-paths and, in the case of non-idempotentsemirings, would lead to an incorrect result. The weight of the matching pathsof the original transducers would be counted p times, where p is the numberof redundant paths in the result of composition.

To cope with this problem, all but one !-path must be filtered out ofthe composite transducer. Figure 7 indicates in boldface one possible choice

Algorithm complexity*

Time O(V2logV+VE)

Space O(VE) V=#states, E=#arcs. *Tropical semiring www.OpenFST.org

Page 53: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

Generic Epsilon-Removal (II) 53

  -closure computation of state p:

22 Mehryar Mohri

Epsilon-Removal(T )

1 for each p ! Q do

2 Compute-Closure(C(p))3 E! " E! " {(p, a, b, w, q) ! E : (a, b) #= (!, !)}4 F ! " F5 "! " "6 for each p ! Q do

7 for each (q, w!) ! C[p] do

8 E![p] " E![p] $ {(p, a, b, w! % w, r) : (q, a, b, w, r) ! E!}9 if q ! F then

10 if p #! F then

11 F ! " F & {p}12 "![p] " 013 "![p] " "![p] ' (w! % "(q))14 return T ! = (#, $, Q, I, F !, E!, %, "!)

The proof is simple and is given in [40].The !-closures C(p) can be computed using an all-pair shortest-distance

algorithm over T! when the semiring S is complete, or by applying the single-shortest distance algorithm from each source p when the semiring is k-closed,as described in Section 2. The complexity of the second stage of the algorithm(lines 6-13) is in O(|Q|2 + |Q||E|) since in the worst case each C(p) containsall states of the transducer. Thus, the overall complexity of the algorithm is

O(|Compute-Closure| + |Q|2 + |Q||E|(T! + T")), (30)

where O(|Compute-Closure|) denotes the total cost of the closure compu-tation. We are now examining several special cases of practical interest:

• T! is acyclic, that is T admits no !-cycle, a rather frequent case in practice.In that case, the single-source shortest distance algorithm can be used forany semiring S and has linear time complexity. The total complexity ofapplying the algorithm at each state is O(|Q|2 + |Q||E|(T! + T")) andmatches that of the second stage. Thus, the overall complexity of epsilon-removal is then

O(|Q|2 + |Q||E|(T! + T")). (31)

The algorithm can in fact be improved in that special case. The complexityof the computation of the all-pairs shortest distances can be substantiallyimproved if the states of T! are visited in reverse topological order and ifthe single-source shortest-distance algorithm is interleaved with the actualremoval of !s as follows: for each state p of T! visited in reverse topologicalorder,

– run a single-source shortest-distance algorithm with source p to com-pute the distance from p to each state q in T!;

Algorithms 21

6 Optimization algorithms

6.1 Epsilon-Removal

The use of various automata or transducer operations such as rational opera-tions generate !-transitions. These transitions cause some delay in the use ofthe resulting transducers since the search for an alphabet symbol to matchin composition or other similar operations requires reading some sequences of!s first. To make these weighted transducers more e!cient to use, it may bepreferable to remove all !-transitions, that is to create an equivalent weightedtransducer with no !-transition. This section describes a general !-removalalgorithm that precisely achieves this task.

Simply removing !-transitions from the input transducer clearly does notresult in an equivalent one. Instead, for a given state p, the non-!-transitionsof all states q reachable from p via !-transitions should be added to those ofp. In the weighted case, this does not result in an equivalent transducer sincethe weights of the !-transitions from p to q would be ignored. Thus, beforeadding an outgoing transition of state q to p, the weight it carries must bepre-!-multiplied by the sum of the weights of the !-paths from p to q. Toensure that this weight is an element of the semiring, we will assume that Sis complete.11

This leads to a two-step algorithm [40]. Given a transducer T , let T! denotethe transducer derived from T by keeping only !-transitions and let d![p, q] =""!P (p,!,q)w["] denote the distance from state p to state q in T!. Then, thefollowing are the two main steps of the algorithms:

• !-closure computation: at each state p, the weighted !-closure defined by

C(p) = {(q, w) : P (p, !, q) #= $, w = d![p, q]} (28)

is computed.• actual removal of !s: all !-transitions are removed and for each p and each

(q, w) % C(p), the transition set of p is augmented with the followingtransitions

{(p, a, b, d![p, q] ! w, r) : (q, a, b, w, r) % E, (a, b) #= (!, !)}. (29)

If there exists (q, w) % C(p) with q % F , then d![p, q] ! #(q) must be"-added to the final weight of p.

The following is the pseudocode of the algorithm which follows the main stepsjust discussed.

Theorem 4 ([40]). Let T be a weighted transducer over a complete semir-ing S. Assume that the closures C(p) can be computed for any state p of T .Then, the weighted transducer T " returned by the epsilon-removal algorithmjust described is equivalent to T .

11 The results presented also hold in the case of closed semirings.

Algorithms 17

! "#$# %&$! '($! )*$*! "

#$%&

!!'(

%$#

T1 T2

!

!!!"!!!"

"#$#

!!!"!!!"

%&$!#

!!!"!!!"

'($!#

!!!"!!!"

)*$*

!!!"!!!"

!

!!"!

"#$%

!!"!

&!#"$$$''''(

!!"!

)%$#

!!"!

T̃1 T̃2

!"! #"# #"$

$"# $"$

%"# %"$

&"%

'() !(*

+(!

,(!

+(!

,(!

!(*

!(*

)('

+(*

-.(./ -!!"!!/

-!!"!!/

-!!"!!/

-!#"!#/-!#"!#/

-!#"!#/ -!#"!#/

-.(./

-!#"!!$

!

"#"

!!"!#$

!#"!#

%

!!"!!

"#"

!#"!#

"#"

!!"!!

(a) (b)

Fig. 7. Redundant !-paths in composition. All transition and final weights are equalto 1. (a) A straightforward generalization of the !-free case would generate all thepaths from (1, 1) to (3, 2) when composing T1 and T2 and produce an incorrectresults in non-idempotent semirings. (b) Filter transducer F [46]. The shorthand xis used to represent an element of ".

In the worst case, all transitions of T1 leaving a state q1 match all thoseof T2 leaving state q!1, thus the space and time complexity of composition isquadratic: O(|T1||T2|). However, an important feature of composition is thatit admits a natural on-demand computation which can be used to constructonly the part of the composed transducer that is needed. Figures 6(a)-(c)illustrate the algorithm when applied to the transducers of Figures 6(a)-(b)defined over the probability semiring.

More care is needed when T1 admits output ! labels or T2 input ! labels.Indeed, as illustrated by Figure 7, a straightforward generalization of the !-free case would generate redundant !-paths and, in the case of non-idempotentsemirings, would lead to an incorrect result. The weight of the matching pathsof the original transducers would be counted p times, where p is the numberof redundant paths in the result of composition.

To cope with this problem, all but one !-path must be filtered out ofthe composite transducer. Figure 7 indicates in boldface one possible choice

Algorithm complexity*

Time O(V2logV+VE)

Space O(VE) V=#states, E=#arcs. *Tropical semiring www.OpenFST.org Initialize the new WFST with

all non-epsilon arcs and states

Compute the e-closure for each state, q

Compute new, non-epsilon transitions for each state p, from its closure, C[p]

Handle final states and weights

Epsilon-closure of p is the set of states reachable from

p via epsilon transitions

Page 54: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

Generic Epsilon-Removal (II) 54

  -closure computation of state p:

22 Mehryar Mohri

Epsilon-Removal(T )

1 for each p ! Q do

2 Compute-Closure(C(p))3 E! " E! " {(p, a, b, w, q) ! E : (a, b) #= (!, !)}4 F ! " F5 "! " "6 for each p ! Q do

7 for each (q, w!) ! C[p] do

8 E![p] " E![p] $ {(p, a, b, w! % w, r) : (q, a, b, w, r) ! E!}9 if q ! F then

10 if p #! F then

11 F ! " F & {p}12 "![p] " 013 "![p] " "![p] ' (w! % "(q))14 return T ! = (#, $, Q, I, F !, E!, %, "!)

The proof is simple and is given in [40].The !-closures C(p) can be computed using an all-pair shortest-distance

algorithm over T! when the semiring S is complete, or by applying the single-shortest distance algorithm from each source p when the semiring is k-closed,as described in Section 2. The complexity of the second stage of the algorithm(lines 6-13) is in O(|Q|2 + |Q||E|) since in the worst case each C(p) containsall states of the transducer. Thus, the overall complexity of the algorithm is

O(|Compute-Closure| + |Q|2 + |Q||E|(T! + T")), (30)

where O(|Compute-Closure|) denotes the total cost of the closure compu-tation. We are now examining several special cases of practical interest:

• T! is acyclic, that is T admits no !-cycle, a rather frequent case in practice.In that case, the single-source shortest distance algorithm can be used forany semiring S and has linear time complexity. The total complexity ofapplying the algorithm at each state is O(|Q|2 + |Q||E|(T! + T")) andmatches that of the second stage. Thus, the overall complexity of epsilon-removal is then

O(|Q|2 + |Q||E|(T! + T")). (31)

The algorithm can in fact be improved in that special case. The complexityof the computation of the all-pairs shortest distances can be substantiallyimproved if the states of T! are visited in reverse topological order and ifthe single-source shortest-distance algorithm is interleaved with the actualremoval of !s as follows: for each state p of T! visited in reverse topologicalorder,

– run a single-source shortest-distance algorithm with source p to com-pute the distance from p to each state q in T!;

Algorithms 21

6 Optimization algorithms

6.1 Epsilon-Removal

The use of various automata or transducer operations such as rational opera-tions generate !-transitions. These transitions cause some delay in the use ofthe resulting transducers since the search for an alphabet symbol to matchin composition or other similar operations requires reading some sequences of!s first. To make these weighted transducers more e!cient to use, it may bepreferable to remove all !-transitions, that is to create an equivalent weightedtransducer with no !-transition. This section describes a general !-removalalgorithm that precisely achieves this task.

Simply removing !-transitions from the input transducer clearly does notresult in an equivalent one. Instead, for a given state p, the non-!-transitionsof all states q reachable from p via !-transitions should be added to those ofp. In the weighted case, this does not result in an equivalent transducer sincethe weights of the !-transitions from p to q would be ignored. Thus, beforeadding an outgoing transition of state q to p, the weight it carries must bepre-!-multiplied by the sum of the weights of the !-paths from p to q. Toensure that this weight is an element of the semiring, we will assume that Sis complete.11

This leads to a two-step algorithm [40]. Given a transducer T , let T! denotethe transducer derived from T by keeping only !-transitions and let d![p, q] =""!P (p,!,q)w["] denote the distance from state p to state q in T!. Then, thefollowing are the two main steps of the algorithms:

• !-closure computation: at each state p, the weighted !-closure defined by

C(p) = {(q, w) : P (p, !, q) #= $, w = d![p, q]} (28)

is computed.• actual removal of !s: all !-transitions are removed and for each p and each

(q, w) % C(p), the transition set of p is augmented with the followingtransitions

{(p, a, b, d![p, q] ! w, r) : (q, a, b, w, r) % E, (a, b) #= (!, !)}. (29)

If there exists (q, w) % C(p) with q % F , then d![p, q] ! #(q) must be"-added to the final weight of p.

The following is the pseudocode of the algorithm which follows the main stepsjust discussed.

Theorem 4 ([40]). Let T be a weighted transducer over a complete semir-ing S. Assume that the closures C(p) can be computed for any state p of T .Then, the weighted transducer T " returned by the epsilon-removal algorithmjust described is equivalent to T .

11 The results presented also hold in the case of closed semirings.

Algorithms 17

! "#$# %&$! '($! )*$*! "

#$%&

!!'(

%$#

T1 T2

!

!!!"!!!"

"#$#

!!!"!!!"

%&$!#

!!!"!!!"

'($!#

!!!"!!!"

)*$*

!!!"!!!"

!

!!"!

"#$%

!!"!

&!#"$$$''''(

!!"!

)%$#

!!"!

T̃1 T̃2

!"! #"# #"$

$"# $"$

%"# %"$

&"%

'() !(*

+(!

,(!

+(!

,(!

!(*

!(*

)('

+(*

-.(./ -!!"!!/

-!!"!!/

-!!"!!/

-!#"!#/-!#"!#/

-!#"!#/ -!#"!#/

-.(./

-!#"!!$

!

"#"

!!"!#$

!#"!#

%

!!"!!

"#"

!#"!#

"#"

!!"!!

(a) (b)

Fig. 7. Redundant !-paths in composition. All transition and final weights are equalto 1. (a) A straightforward generalization of the !-free case would generate all thepaths from (1, 1) to (3, 2) when composing T1 and T2 and produce an incorrectresults in non-idempotent semirings. (b) Filter transducer F [46]. The shorthand xis used to represent an element of ".

In the worst case, all transitions of T1 leaving a state q1 match all thoseof T2 leaving state q!1, thus the space and time complexity of composition isquadratic: O(|T1||T2|). However, an important feature of composition is thatit admits a natural on-demand computation which can be used to constructonly the part of the composed transducer that is needed. Figures 6(a)-(c)illustrate the algorithm when applied to the transducers of Figures 6(a)-(b)defined over the probability semiring.

More care is needed when T1 admits output ! labels or T2 input ! labels.Indeed, as illustrated by Figure 7, a straightforward generalization of the !-free case would generate redundant !-paths and, in the case of non-idempotentsemirings, would lead to an incorrect result. The weight of the matching pathsof the original transducers would be counted p times, where p is the numberof redundant paths in the result of composition.

To cope with this problem, all but one !-path must be filtered out ofthe composite transducer. Figure 7 indicates in boldface one possible choice

Initialize the new WFST with all non-epsilon arcs and states

Compute the e-closure for each state, q

Compute new, non-epsilon transitions for each state p, from its closure, C[p]

Handle final states and weights

Epsilon-closure of p is the set of states reachable from

p via epsilon transitions

Disjoint union

A∪ B and A∩ B =∅

Algorithm complexity*

Time O(V2logV+VE)

Space O(VE) V=#states, E=#arcs. *Tropical semiring www.OpenFST.org

Page 55: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

Generic Epsilon-Removal (example) 55

A

Algorithm initialization

Page 56: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

Generic Epsilon-Removal (example) 56

A

Copy all states, and non-epsilon arcs from the

input transducer

Page 57: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

Generic Epsilon-Removal (example) 57

A

For each state in the original input transducer, compute the epsilon-closure – the states that are accessible via epsilon-arcs.

Page 58: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

Generic Epsilon-Removal (example) 58

A

Generate new, non-epsilon arcs based on the epsilon closure. Similarly, generate new final states where necessary.

Page 59: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

Generic Epsilon-Removal (example) 59

A

Page 60: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

Generic Epsilon-Removal (example) 60

A

Page 61: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

Generic Epsilon-Removal (example) 61

A

Page 62: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

Generic Epsilon-Removal (example) 62

A

Page 63: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

Generic Epsilon-Removal (example) 63

A

Page 64: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

Generic Epsilon-Removal (example) 64

States 2 and 3 are not co-accessible. These may be

deleted along with associated transitions.

Page 65: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

Generic Epsilon-Removal (example) 65

Page 66: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

Generic Epsilon-Removal (III) 66

  Epsilon removal may be employed to remove arbitrary symbols other than ,  Relabel the to an alternative, map other desired symbols to

epsilon, perform generic epsilon removal, and restore the original labels

  Choice of semiring affects the time complexity of the algorithm.  Tropical semiring is faster because it takes advantage of the

viterbi approximation for the shortest path computation  Other semirings exhibit exponential time complexity  Un-weighted epsilon-removal exhibits time complexity

O(V2+VE)

Algorithms 17

! "#$# %&$! '($! )*$*! "

#$%&

!!'(

%$#

T1 T2

!

!!!"!!!"

"#$#

!!!"!!!"

%&$!#

!!!"!!!"

'($!#

!!!"!!!"

)*$*

!!!"!!!"

!

!!"!

"#$%

!!"!

&!#"$$$''''(

!!"!

)%$#

!!"!

T̃1 T̃2

!"! #"# #"$

$"# $"$

%"# %"$

&"%

'() !(*

+(!

,(!

+(!

,(!

!(*

!(*

)('

+(*

-.(./ -!!"!!/

-!!"!!/

-!!"!!/

-!#"!#/-!#"!#/

-!#"!#/ -!#"!#/

-.(./

-!#"!!$

!

"#"

!!"!#$

!#"!#

%

!!"!!

"#"

!#"!#

"#"

!!"!!

(a) (b)

Fig. 7. Redundant !-paths in composition. All transition and final weights are equalto 1. (a) A straightforward generalization of the !-free case would generate all thepaths from (1, 1) to (3, 2) when composing T1 and T2 and produce an incorrectresults in non-idempotent semirings. (b) Filter transducer F [46]. The shorthand xis used to represent an element of ".

In the worst case, all transitions of T1 leaving a state q1 match all thoseof T2 leaving state q!1, thus the space and time complexity of composition isquadratic: O(|T1||T2|). However, an important feature of composition is thatit admits a natural on-demand computation which can be used to constructonly the part of the composed transducer that is needed. Figures 6(a)-(c)illustrate the algorithm when applied to the transducers of Figures 6(a)-(b)defined over the probability semiring.

More care is needed when T1 admits output ! labels or T2 input ! labels.Indeed, as illustrated by Figure 7, a straightforward generalization of the !-free case would generate redundant !-paths and, in the case of non-idempotentsemirings, would lead to an incorrect result. The weight of the matching pathsof the original transducers would be counted p times, where p is the numberof redundant paths in the result of composition.

To cope with this problem, all but one !-path must be filtered out ofthe composite transducer. Figure 7 indicates in boldface one possible choice

Algorithms 17

! "#$# %&$! '($! )*$*! "

#$%&

!!'(

%$#

T1 T2

!

!!!"!!!"

"#$#

!!!"!!!"

%&$!#

!!!"!!!"

'($!#

!!!"!!!"

)*$*

!!!"!!!"

!

!!"!

"#$%

!!"!

&!#"$$$''''(

!!"!

)%$#

!!"!

T̃1 T̃2

!"! #"# #"$

$"# $"$

%"# %"$

&"%

'() !(*

+(!

,(!

+(!

,(!

!(*

!(*

)('

+(*

-.(./ -!!"!!/

-!!"!!/

-!!"!!/

-!#"!#/-!#"!#/

-!#"!#/ -!#"!#/

-.(./

-!#"!!$

!

"#"

!!"!#$

!#"!#

%

!!"!!

"#"

!#"!#

"#"

!!"!!

(a) (b)

Fig. 7. Redundant !-paths in composition. All transition and final weights are equalto 1. (a) A straightforward generalization of the !-free case would generate all thepaths from (1, 1) to (3, 2) when composing T1 and T2 and produce an incorrectresults in non-idempotent semirings. (b) Filter transducer F [46]. The shorthand xis used to represent an element of ".

In the worst case, all transitions of T1 leaving a state q1 match all thoseof T2 leaving state q!1, thus the space and time complexity of composition isquadratic: O(|T1||T2|). However, an important feature of composition is thatit admits a natural on-demand computation which can be used to constructonly the part of the composed transducer that is needed. Figures 6(a)-(c)illustrate the algorithm when applied to the transducers of Figures 6(a)-(b)defined over the probability semiring.

More care is needed when T1 admits output ! labels or T2 input ! labels.Indeed, as illustrated by Figure 7, a straightforward generalization of the !-free case would generate redundant !-paths and, in the case of non-idempotentsemirings, would lead to an incorrect result. The weight of the matching pathsof the original transducers would be counted p times, where p is the numberof redundant paths in the result of composition.

To cope with this problem, all but one !-path must be filtered out ofthe composite transducer. Figure 7 indicates in boldface one possible choice

Algorithms 17

! "#$# %&$! '($! )*$*! "

#$%&

!!'(

%$#

T1 T2

!

!!!"!!!"

"#$#

!!!"!!!"

%&$!#

!!!"!!!"

'($!#

!!!"!!!"

)*$*

!!!"!!!"

!

!!"!

"#$%

!!"!

&!#"$$$''''(

!!"!

)%$#

!!"!

T̃1 T̃2

!"! #"# #"$

$"# $"$

%"# %"$

&"%

'() !(*

+(!

,(!

+(!

,(!

!(*

!(*

)('

+(*

-.(./ -!!"!!/

-!!"!!/

-!!"!!/

-!#"!#/-!#"!#/

-!#"!#/ -!#"!#/

-.(./

-!#"!!$

!

"#"

!!"!#$

!#"!#

%

!!"!!

"#"

!#"!#

"#"

!!"!!

(a) (b)

Fig. 7. Redundant !-paths in composition. All transition and final weights are equalto 1. (a) A straightforward generalization of the !-free case would generate all thepaths from (1, 1) to (3, 2) when composing T1 and T2 and produce an incorrectresults in non-idempotent semirings. (b) Filter transducer F [46]. The shorthand xis used to represent an element of ".

In the worst case, all transitions of T1 leaving a state q1 match all thoseof T2 leaving state q!1, thus the space and time complexity of composition isquadratic: O(|T1||T2|). However, an important feature of composition is thatit admits a natural on-demand computation which can be used to constructonly the part of the composed transducer that is needed. Figures 6(a)-(c)illustrate the algorithm when applied to the transducers of Figures 6(a)-(b)defined over the probability semiring.

More care is needed when T1 admits output ! labels or T2 input ! labels.Indeed, as illustrated by Figure 7, a straightforward generalization of the !-free case would generate redundant !-paths and, in the case of non-idempotentsemirings, would lead to an incorrect result. The weight of the matching pathsof the original transducers would be counted p times, where p is the numberof redundant paths in the result of composition.

To cope with this problem, all but one !-path must be filtered out ofthe composite transducer. Figure 7 indicates in boldface one possible choice

Page 67: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

Other algorithms 67

  Minimization   Given an input WFST, produces a

‘minimal’ version which is guaranteed to have the smallest possible number of states while preserving the input language and and weight/path properties of the original

  Projection   Turns a transducer into an acceptor

by projecting either just the input language or just the output language

  Weight-pushing   Pushes arc weights forward or

backward, accumulating and/or distributing them where appropriate according to the semiring

A

Project(A)

A Minimize(A)

A Push(A)

Page 68: Weighted Finite-State Transducersnovakj/wfst-algorithms.pdf(4) ¯0 is an annihilator for ⊗ : ∀a ∈ K,a⊗ ¯0= ¯0 ⊗ a = ¯0 1 3 A monoid is an algebraic structure that supports

Limitations   Weighted Finite-State Transducers provide a powerful and flexible

framework for performing many operations however they do have some limitations   Expressive power

WFSTs are equivalent in terms of representational power to regular languages thus they suffer from the same limitations. In particular there are certain grammars which cannot be represented by WFSTs, e.g., BNAN where N is not specified at compile time. Although this may be approximated for various values of N it cannot be explicitly represented by a WFST or a regular expression

  Computing resources Large static graphs such as those often employed for LVCSR tasks may require large amounts of memory, space, and time to construct, compute, and store.

68