
  • Speech Recognition
    Lecture 2: Weighted Finite-State Transducer Software Library

    Mehryar Mohri
    Courant Institute of Mathematical Sciences

    [email protected]

  • Software Libraries

    FSM Library: Finite-State Machine Library. General software utilities for building, combining, optimizing, and searching weighted automata and transducers (MM, Pereira, and Riley, 2000).

    http://www.research.att.com/projects/mohri/fsm

    OpenFst Library: Open-source Finite-state transducer Library (Allauzen et al., 2007).

    http://www.openfst.org


  • Software Libraries

    GRM Library: Grammar Library. General software collection for constructing and modifying weighted automata and transducers representing grammars and statistical language models (Allauzen, MM, and Roark, 2005).

    http://www.research.att.com/projects/mohri/grm

    DCD Library: Decoder Library. General software collection for speech recognition decoding and related functions (MM and Riley, 2003).

    http://www.research.att.com/~fsmtools/dcd


  • FSM Library

    The FSM utilities construct, combine, minimize, and search weighted finite-state transducers.

    • User Program Level: Programs that read from and write to files or pipelines, fsm(1): fsmintersect in1.fsm in2.fsm >out.fsm

    • C(++) Library Level: Library archive of C(++) functions that implement the user program level, fsm(3): Fsm in1 = FSMLoad("in1.fsm"); Fsm in2 = FSMLoad("in2.fsm"); Fsm out = FSMIntersect(in1, in2); FSMDump("out.fsm", out);
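    For comparison, the OpenFst library cited on the Software Libraries slide exposes the same "library level" through a C++ API. A minimal sketch, assuming two binary FSTs on disk; the file names are illustrative and exact headers may vary with the OpenFst version:

    // Sketch of the analogous library level in OpenFst (not the AT&T FSM API).
    #include <fst/fstlib.h>
    #include <memory>

    int main() {
      // Read two binary FSTs from disk (illustrative paths).
      std::unique_ptr<fst::StdVectorFst> in1(fst::StdVectorFst::Read("in1.fst"));
      std::unique_ptr<fst::StdVectorFst> in2(fst::StdVectorFst::Read("in2.fst"));
      if (!in1 || !in2) return 1;

      // Intersection requires one operand to be sorted on the matching labels.
      fst::ArcSort(in2.get(), fst::ILabelCompare<fst::StdArc>());

      // Mirrors "fsmintersect in1.fsm in2.fsm >out.fsm".
      fst::StdVectorFst out;
      fst::Intersect(*in1, *in2, &out);
      out.Write("out.fst");
      return 0;
    }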


    • Definition Level: Specification of labels, of costs, and of types of FSM representations.

  • This Lecture

    Weighted automata and transducers

    Rational operations

    Elementary unary operations

    Fundamental binary operations

    Optimization algorithms

    Search algorithms

  • FSM File Types

    Textual format:

    • automata/acceptor files,
    • transducer files,
    • symbols files.

    Binary format: compiled representation used by all FSM utilities.


  • Compiling, Printing, and Drawing

    Compiling

    • fsmcompile -s tropical -iA.syms A.fsm
    • fsmcompile -s log -iA.syms -oA.syms -t T.fsm

    Printing

    • fsmprint -iA.syms A.txt
    • fsmprint -iA.syms -oA.syms T.txt

    Drawing

    • fsmdraw -iA.syms A.ps
    • fsmdraw -iA.syms -oA.syms T.ps


  • Weight Sets: Semirings

    A semiring (K, ⊕, ⊗, 0, 1) is a ring that may lack negation.

    • sum ⊕: to compute the weight of a sequence (sum of the weights of the paths labeled with that sequence).

    • product ⊗: to compute the weight of a path (product of the weights of its constituent transitions).


  • Semirings - Examples

    Semiring      Set               ⊕      ⊗    0     1
    ------------  ----------------  -----  ---  ----  ---
    Boolean       {0, 1}            ∨      ∧    0     1
    Probability   R+                +      ×    0     1
    Log           R ∪ {−∞, +∞}      ⊕log   +    +∞    0
    Tropical      R ∪ {−∞, +∞}      min    +    +∞    0

    with ⊕log defined by: x ⊕log y = − log(e−x + e−y).
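    For example, 1 ⊕log 1 = −log(e⁻¹ + e⁻¹) = 1 − log 2 ≈ 0.31, whereas in the tropical semiring 1 ⊕ 1 = min(1, 1) = 1; in both semirings 1 ⊗ 2 = 1 + 2 = 3.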


  • Automata/Acceptors

    Graphical representation (A.ps): [Figure: acceptor with a red/0.5 self-loop at state 0, arcs 0 →green/0.3→ 1, 1 →blue/0→ 2, 1 →yellow/0.6→ 2, and final state 2/0.8.]

    Acceptor file (A.txt):

    0 0 red .5
    0 1 green .3
    1 2 blue
    1 2 yellow .6
    2 .8

    Symbols file (A.syms):

    red 1
    green 2
    blue 3
    yellow 4
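    The same acceptor can also be built programmatically. A minimal sketch with the OpenFst C++ API mentioned earlier (default tropical semiring; the function name and the hard-coded symbol ids from A.syms are illustrative):

    #include <fst/fstlib.h>

    // Build the acceptor A above (OpenFst, tropical semiring by default).
    fst::StdVectorFst BuildA() {
      fst::StdVectorFst a;
      const int kRed = 1, kGreen = 2, kBlue = 3, kYellow = 4;  // ids from A.syms
      const int q0 = a.AddState();
      const int q1 = a.AddState();
      const int q2 = a.AddState();
      a.SetStart(q0);
      // StdArc is (input label, output label, weight, next state); an acceptor
      // repeats the same label on input and output.
      a.AddArc(q0, fst::StdArc(kRed, kRed, 0.5, q0));        // 0 0 red .5
      a.AddArc(q0, fst::StdArc(kGreen, kGreen, 0.3, q1));    // 0 1 green .3
      a.AddArc(q1, fst::StdArc(kBlue, kBlue, 0.0, q2));      // 1 2 blue
      a.AddArc(q1, fst::StdArc(kYellow, kYellow, 0.6, q2));  // 1 2 yellow .6
      a.SetFinal(q2, 0.8);                                   // 2 .8
      return a;
    }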


  • Transducers

    Graphical representation (T.ps): [Figure: transducer with a red:yellow/0.5 self-loop at state 0, arcs 0 →green:blue/0.3→ 1, 1 →blue:green/0→ 2, 1 →yellow:red/0.6→ 2, and final state 2/0.8.]

    Transducer file (T.txt):

    0 0 red yellow .5
    0 1 green blue .3
    1 2 blue green
    1 2 yellow red .6
    2 .8

    Symbols file (T.syms):

    red 1
    green 2
    blue 3
    yellow 4


  • Paths - Definitions and Notation

    Path π: p[π] denotes its previous (source) state, n[π] its next (destination) state, and i[π]:o[π] its input and output labels.

    Sets of paths

    • P(R1, R2): paths from R1 ⊆ Q to R2 ⊆ Q.
    • P(R1, x, R2): paths in P(R1, R2) with input label x.
    • P(R1, x, y, R2): paths in P(R1, x, R2) with output label y.

  • General Definitions

    Alphabets: input Σ, output ∆.

    States: Q, initial states I, final states F.

    Transitions: E ⊆ Q × (Σ ∪ {ε}) × (∆ ∪ {ε}) × K × Q.

    Weight functions:

    • initial: λ : I → K.
    • final: ρ : F → K.

  • Automata and Transducers - Definitions

    Automaton: A = (Σ, Q, I, F, E, λ, ρ), with, for all x ∈ Σ*,

      [[A]](x) = ⊕_{π ∈ P(I, x, F)} λ(p[π]) ⊗ w[π] ⊗ ρ(n[π]).

    Transducer: T = (Σ, ∆, Q, I, F, E, λ, ρ), with, for all x ∈ Σ*, y ∈ ∆*,

      [[T]](x, y) = ⊕_{π ∈ P(I, x, y, F)} λ(p[π]) ⊗ w[π] ⊗ ρ(n[π]).

  • Weighted Automata

    [[A]](x) = sum of the weights of all successful paths labeled with x.

    [Figure: a weighted automaton over the probability semiring with two successful paths labeled abb and final state 3/0.1.]

    [[A]](abb) = .1 × .2 × .3 × .1 + .5 × .3 × .6 × .1
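    In the probability semiring this evaluates to 0.0006 + 0.009 = 0.0096.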


  • Weighted Transducers

    [[T]](x, y) = sum of the weights of all successful paths with input x and output y.

    [Figure: a weighted transducer with the same topology as the previous automaton (arcs a:b/0.1, a:b/0.5, b:a/0.2, a:a/0.4, b:a/0.3, b:a/0.6; final state 3/0.1).]

    [[T ]](abb, baa) = .1 × .2 × .3 × .1 + .5 × .3 × .6 × .1


  • This Lecture

    Weighted automata and transducers

    Rational operations

    Elementary unary operations

    Fundamental binary operations

    Optimization algorithms

    Search algorithms

  • Rational Operations

    Sum: [[T1 ⊕ T2]](x, y) = [[T1]](x, y) ⊕ [[T2]](x, y).

    Product: [[T1 ⊗ T2]](x, y) = ⊕_{x = x1 x2, y = y1 y2} [[T1]](x1, y1) ⊗ [[T2]](x2, y2).

    Closure: [[T*]](x, y) = ⊕_{n=0}^{∞} [[T]]^n(x, y).
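    As a concrete counterpart, a minimal OpenFst C++ sketch of the three rational operations; t1 and t2 are assumed to be machines built elsewhere, and the destructive OpenFst forms modify their first argument:

    #include <fst/fstlib.h>

    // Rational operations on weighted transducers (OpenFst).
    void RationalOps(const fst::StdVectorFst& t1, const fst::StdVectorFst& t2) {
      fst::StdVectorFst sum = t1;
      fst::Union(&sum, t2);                    // [[sum]] = [[t1]] (+) [[t2]]

      fst::StdVectorFst product = t1;
      fst::Concat(&product, t2);               // [[product]] = [[t1]] (x) [[t2]]

      fst::StdVectorFst star = t1;
      fst::Closure(&star, fst::CLOSURE_STAR);  // [[star]] = [[t1]]*
    }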

    • Conditions (on the closure operation): condition on T, e.g., weight of ε-cycles = 0 (regulated transducers), or semiring condition, e.g., 1 ⊕ x = 1 as with the tropical semiring (more generally, locally closed semirings).

    • Complexity and implementation:

      • linear-time complexity: O((|E1| + |Q1|) + (|E2| + |Q2|)) or O(|Q| + |E|);
      • lazy implementation.

  • Sum - Illustration

    Program: fsmunion A.fsm B.fsm >C.fsm

    Graphical representation: [Figure: acceptors A.fsa and B.fsa and their sum C.fsa, which adds a new initial state with ε/0 transitions to the initial states of copies of A and B.]

  • Product - Illustration

    Program: fsmconcat A.fsm B.fsm >C.fsm

    Graphical representation: [Figure: acceptors A.fsa and B.fsa and their product (concatenation) C.fsa, in which A's final state is linked to B's initial state by an ε-transition carrying A's final weight (eps/0.8).]

  • Closure - Illustration

    Program: fsmclosure B.fsm >C.fsm

    Graphical representation: [Figure: acceptor B.fsa and its closure C.fsa, which adds a new initial and final state and ε-transitions (ε/0 into B's old initial state, ε/0.3 from its final state) implementing the Kleene closure.]

  • This Lecture

    Weighted automata and transducers

    Rational operations

    Elementary unary operations

    Fundamental binary operations

    Optimization algorithms

    Search algorithms

  • Elementary Unary Operations

    Reversal: [[T̃]](x, y) = [[T]](x̃, ỹ).

    Inversion: [[T⁻¹]](x, y) = [[T]](y, x).

    Projection: [[A]](x) = ⊕_y [[T]](x, y).

    Linear-time complexity, lazy implementation (not for reversal).
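    A minimal OpenFst C++ sketch of the three operations; the Project enum name is the one used in older OpenFst releases and may differ in newer versions:

    #include <fst/fstlib.h>

    // Elementary unary operations on a weighted transducer t (OpenFst).
    void UnaryOps(const fst::StdVectorFst& t) {
      fst::StdVectorFst reversed;
      fst::Reverse(t, &reversed);              // reversal (constructive)

      fst::StdVectorFst inverted = t;
      fst::Invert(&inverted);                  // inversion: swap input/output labels

      fst::StdVectorFst projected = t;
      fst::Project(&projected, fst::PROJECT_INPUT);  // projection onto the input side
    }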

  • Reversal - Illustration

    Program: fsmreverse A.fsm >C.fsm

    Graphical representation: [Figure: acceptor A.fsa and its reversal C.fsa; all transitions are reversed, a new initial state has ε-transitions (weighted with the old final weights 0 and 0.3) to the old final states, and the old initial state becomes final.]

  • Inversion - Illustration

    Program: fsminvert A.fsm >C.fsm

    Graphical representation: [Figure: transducer A.fst (arcs red:bird/0.5, green:pig/0.3, blue:cat/0, yellow:dog/0.6) and its inversion C.fst, in which input and output labels are swapped (bird:red/0.5, pig:green/0.3, cat:blue/0, dog:yellow/0.6).]

  • Projection - Illustration

    Program: fsmproject -1 T.fsm >A.fsm

    Graphical representation: [Figure: transducer T.fst (arcs red:bird/0.5, green:pig/0.3, blue:cat/0, yellow:dog/0.6) and its projection onto the input side, the acceptor A.fsa (arcs red/0.5, green/0.3, blue/0, yellow/0.6).]

  • This Lecture

    Weighted automata and transducers

    Rational operations

    Elementary unary operations

    Fundamental binary operations

    Optimization algorithms

    Search algorithms

  • Some Fundamental Binary Operations

    Composition ((K, ⊕, ⊗, 0, 1) commutative):

      [[T1 ◦ T2]](x, y) = ⊕_z [[T1]](x, z) ⊗ [[T2]](z, y).

    Intersection ((K, ⊕, ⊗, 0, 1) commutative):

      [[A1 ∩ A2]](x) = [[A1]](x) ⊗ [[A2]](x).

    Difference (A2 unweighted and deterministic):

      [[A1 − A2]](x) = [[A1 ∩ Ā2]](x), where Ā2 is the complement of A2.

    (Pereira and Riley, 1997; MM et al., 1996)
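    A minimal OpenFst C++ sketch; the argument names are illustrative, composition and intersection require one operand to be arc-sorted, and difference requires A2 to be unweighted and deterministic, as stated above:

    #include <fst/fstlib.h>

    // Fundamental binary operations (OpenFst).
    void BinaryOps(const fst::StdVectorFst& t1, fst::StdVectorFst t2,
                   const fst::StdVectorFst& a1, fst::StdVectorFst a2) {
      fst::ArcSort(&t2, fst::ILabelCompare<fst::StdArc>());
      fst::StdVectorFst composed;
      fst::Compose(t1, t2, &composed);         // [[T1 o T2]]

      fst::ArcSort(&a2, fst::ILabelCompare<fst::StdArc>());
      fst::StdVectorFst intersected;
      fst::Intersect(a1, a2, &intersected);    // [[A1 ∩ A2]] (acceptors)

      fst::StdVectorFst diff;
      fst::Difference(a1, a2, &diff);          // [[A1 − A2]] (A2 unweighted, deterministic)
    }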

    • Complexity and implementation:

      • quadratic complexity: O((|E1| + |Q1|) (|E2| + |Q2|));
      • path multiplicity in the presence of ε-transitions: ε-filter;
      • lazy implementation.

  • Composition - Illustration

    Program: fsmcompose A.fsm B.fsm >C.fsm

    Graphical representation: [Figure: two transducers and their composition; the result's states are pairs of input states such as (0,0), (1,1), (1,2), and (3,2)/1.3, with arcs a:c/0.4, c:b/0.7, a:b/0.8, c:b/0.9, and a:b/1 whose weights are the ⊗-products of the matched arc weights.]

  • Multiplicity and ε-Transitions - Problem

    [Figure 8 of the Springer Handbook on Speech Processing and Speech Communication: redundant ε-paths. A straightforward generalization of the ε-free case could generate all the paths from (1,1) to (3,2) when composing two simple transducers with ε-transitions (a:a b:ε c:ε d:d and a:d ε:e d:a); the grid of composition states is labeled with moves such as (x:x), (ε1:ε1), (ε2:ε2), and (ε2:ε1).]

  • Solution - Filter F for Composition

    [Figure: three-state filter F with transitions labeled x:x, ε1:ε1, ε2:ε2, and ε2:ε1 that control how the ε-moves of the two transducers may be interleaved.]

    Replace T1 ◦ T2 with T̃1 ◦ F ◦ T̃2.

  • Intersection - Illustration

    Program: fsmintersect A.fsm B.fsm >C.fsm

    Graphical representation: [Figure: acceptors A.fsa and B.fsa and their intersection C.fsa; matching transitions are paired and their weights ⊗-combined (e.g. red/0.5 and red/0.2 give red/0.7), as are the final weights (0.8 ⊗ 0.5 = 1.3).]

  • Difference - Illustration

    Program: fsmdifference A.fsm B.fsm >C.fsm

    Graphical representation: [Figure: weighted acceptor A.fsa, unweighted deterministic acceptor B.fsa, and their difference C.fsa, which accepts the strings of A not accepted by B while keeping A's weights.]

  • This Lecture

    Weighted automata and transducers

    Rational operations

    Elementary unary operations

    Fundamental binary operations

    Optimization algorithms

    Search algorithms

  • Optimization Algorithms

    Connection: removes non-accessible/non-coaccessible states.

    ε-Removal: removes ε-transitions.

    Determinization: creates equivalent deterministic machine.

    Pushing: creates equivalent pushed/stochastic machine.

    Minimization: creates equivalent minimal deterministic machine.
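    A minimal OpenFst C++ sketch chaining these steps on a weighted acceptor; it assumes the input satisfies the semiring and determinizability conditions discussed on the following slides:

    #include <fst/fstlib.h>

    // Typical optimization pipeline on a weighted acceptor (OpenFst).
    void Optimize(fst::StdVectorFst* a) {
      fst::Connect(a);                            // drop non-accessible/non-coaccessible states
      fst::RmEpsilon(a);                          // remove epsilon-transitions

      fst::StdVectorFst det;
      fst::Determinize(*a, &det);                 // requires a determinizable input

      fst::Push(&det, fst::REWEIGHT_TO_INITIAL);  // weight pushing toward the initial state
      fst::Minimize(&det);                        // minimal equivalent deterministic machine
      *a = det;
    }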


    • Conditions: there are specific semiring conditions for the use of these algorithms; e.g., not all weighted automata or transducers can be determinized using the determinization algorithm.

  • Connection - Illustration

    Program: fsmconnect A.fsm >C.fsm

    Graphical representation: [Figure: acceptor A.fsa with useless states 3, 4, and 5, and the connected result C.fsa, which keeps only the states that are both accessible and coaccessible (0, 1, 2) and their transitions.]

  • Connection - Algorithm

    Definition:

    • Input: weighted transducer T1.
    • Output: equivalent weighted transducer T2 with all states connected.

    Description:

    1. Depth-first search of T1 from I1.
    2. Mark accessible and coaccessible states.
    3. Keep marked states and corresponding transitions.

    Complexity: linear, O(|Q1| + |E1|).

  • ε-Removal - Illustration

    Program: fsmrmepsilon T.fsm >TP.fsm

    Graphical representation: [Figure: a weighted transducer with ε:ε-transitions, its ε-closure distances, and the equivalent ε-free transducer produced by ε-removal.]

  • ε-Removal - Algorithm (MM, 2001)

    Definition:

    • Input: weighted transducer T1.
    • Output: equivalent WFST T2 with no ε-transition.

    Description:

    • Computation of ε-closures.
    • Removal of εs.

    Complexity:

    • acyclic Tε: O(|Q|² + |Q||E|(T⊕ + T⊗));
    • general case (tropical semiring): O(|Q||E| + |Q|² log |Q|).

  • Computation of ε-closures

    Definition: for p in Q,

      C[p] = {(q, w) : q ∈ ε[p], d[p, q] = w ≠ 0},   where d[p, q] = ⊕_{π ∈ P(p, ε, q)} w[π].

    Problem formulation: all-pairs shortest-distance problem in Tε (T reduced to its ε-transitions).

    • Closed semirings: generalization of the Floyd-Warshall algorithm.
    • k-closed semirings: generic sparse shortest-distance algorithm.

  • Determinization - Algorithm (MM, 1997)

    Definition:

    • Input: weighted automaton or transducer T1.
    • Output: equivalent subsequential or deterministic machine T2: it has a unique initial state, and no two transitions leaving the same state share the same input label.

    Description:

    1. Generalization of the subset construction: weighted subsets {(q1, w1), ..., (qn, wn)}, where the wi are remainder weights.
    2. Computation of the weight of the resulting transitions.

  • Determinization - Conditions

    Semiring: weakly left-divisible semirings.

    Definition: T is determinizable when the determinization algorithm applies to T.

    • All unweighted automata are determinizable.
    • All acyclic machines are determinizable.
    • Not all weighted automata or transducers are determinizable.
    • Characterization based on the twins property.

    Complexity: exponential, but lazy implementation.

  • Determinization of Weighted Automata - Illustration

    Program: fsmdeterminize A.fsm >D.fsm

    Graphical representation: [Figure: a non-deterministic weighted automaton over the tropical semiring and its determinization; the states of the result are weighted subsets such as {(0,0)}, {(1,2),(2,0)}, {(1,0),(2,3)}, and {(3,0)}, carrying the remainder weights.]

  • Determinization of Weighted Transducers - Illustration

    Program: fsmdeterminize T.fsm >D.fsm

    Graphical representation: [Figure: a non-deterministic weighted transducer and its determinization; the states of the result are weighted subsets of triples (state, residual output string, residual weight) such as {(0, ε, 0)}, {(1, c, 0.4), (2, a, 0)}, and {(4, ε, 0)}/(ε, 0.7).]

  • Pushing - Algorithm (MM, 1997; 2004)

    Definition:

    • Input: weighted automaton or transducer T1.
    • Output: equivalent automaton or transducer T2 such that the longest common prefix of all outgoing paths is ε, or such that the sum of the weights of all outgoing transitions is 1, modulo the string or weight at the initial state.

    • Description:

      1. Single-source shortest-distance computation: for each state q,

           d[q] = ⊕_{π ∈ P(q, F)} w[π].

      2. Reweighting: for each transition e such that d[p[e]] ≠ 0,

           w[e] ← (d[p[e]])⁻¹ (w[e] ⊗ d[n[e]]).

    • Conditions (automata case): weakly divisible semiring, zero-sum-free semiring or automaton.

    • Complexity:

      • automata case:
        • acyclic case: linear, O(|Q| + |E|(T⊕ + T⊗));
        • general case (tropical semiring): O(|Q| log |Q| + |E|);
      • transducer case: O((|Pmax| + 1) |E|).

  • Weight Pushing - Illustration

    Program: fsmpush -ic A.fsm >P.fsm

    Graphical representation:

    • Tropical semiring: [Figure: an automaton with arcs a/0, b/1, c/4 and d/0, e/1 leaving state 0, arcs e/0, f/1 leaving state 1, arcs e/10, f/11 leaving state 2, and final state 3/0, and its pushed version, in which the weight 10 is moved toward the initial state (d/10, e/11) so that the arcs leaving state 2 become e/0, f/1.]
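    Worked example for the tropical-semiring figure above: the shortest distances to the final state are d[1] = 0, d[2] = 10, and d[3] = 0, so reweighting the transition 0 → 2 labeled d gives −d[0] + (0 + d[2]) = 10, and the transition 2 → 3 labeled e gives −d[2] + (10 + d[3]) = 0, matching the pushed machine.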

    • Log semiring: [Figure: the same automaton pushed in the log semiring; the pushed weights are fractional, e.g. a/0.3266, b/1.3266, c/4.3266, d/10.326, e/11.326 at the initial state, e/0.3132, f/1.3132 elsewhere, and final weight 3/-0.639.]

  • Label Pushing - Illustration

    Program: fsmpush -il T.fsm >P.fsm

    Graphical representation: [Figure: a transducer and its label-pushed version, in which output labels such as d are moved toward the initial state so that the longest common prefix of the output labels of the paths leaving each state is ε.]

  • Minimization - Algorithm (MM, 1997)

    Definition:

    • Input: deterministic weighted automaton or transducer T1.
    • Output: equivalent deterministic automaton or transducer T2 with the minimal number of states and transitions.

    Description:

    • Canonical representation: use pushing or another algorithm to standardize the input.
    • Automata minimization: encode the pairs (label, weight) as labels and use a classical unweighted minimization algorithm.

    • Complexity:

      • automata case:
        • acyclic case: linear, O(|Q| + |E|(T⊕ + T⊗));
        • general case (tropical semiring): O(|E| log |Q|);
      • transducer case:
        • acyclic case: O(S + |Q| + |E|(|Pmax| + 1));
        • general case (tropical semiring): O(S + |Q| + |E|(log |Q| + |Pmax|)).

  • Minimization - Illustration

    Program: fsmminimize D.fsm >M.fsm

    Graphical representation: [Figure: a deterministic weighted automaton with eight states, its weight-pushed version, and the minimal equivalent automaton obtained by merging the states made equivalent by pushing.]

  • Equivalence - Algorithm (MM, 1997)

    Definition:

    • Input: deterministic weighted automata A and B.
    • Output: TRUE iff A and B are equivalent.

    Description:

    • Canonical representation: use pushing or another algorithm to standardize the input automata.
    • Automata minimization: encode the pairs (label, weight) as labels and use a classical algorithm for testing the equivalence of unweighted automata.

    Complexity (the second stage is quasi-linear): O(|E1| + |E2| + |Q1| log |Q1| + |Q2| log |Q2|).
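    A corresponding OpenFst C++ sketch; the file names are illustrative, and Equivalent expects deterministic, ε-free machines:

    #include <fst/fstlib.h>
    #include <iostream>
    #include <memory>

    // Test the equivalence of two deterministic weighted acceptors (OpenFst).
    int main() {
      std::unique_ptr<fst::StdVectorFst> d(fst::StdVectorFst::Read("D.fst"));
      std::unique_ptr<fst::StdVectorFst> m(fst::StdVectorFst::Read("M.fst"));
      if (!d || !m) return 1;
      std::cout << (fst::Equivalent(*d, *m) ? "equivalent" : "not equivalent") << std::endl;
      return 0;
    }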

  • Equivalence - Illustration

    Program: fsmequiv [-v] D.fsm M.fsm

    Graphical representation: [Figure: two deterministic weighted acceptors D.fsa and M.fsa that distribute their weights differently along paths but assign the same total weight to each string, and are therefore equivalent.]

  • This Lecture

    Weighted automata and transducers

    Rational operations

    Elementary unary operations

    Fundamental binary operations

    Optimization algorithms

    Search algorithms

  • Single-Source Shortest-Distance Algorithms - Illustration

    Program: fsmbestpath [-n N] A.fsm >C.fsm

    Graphical representation: [Figure: acceptor A.fsa with several successful paths and C.fsa, its best (lowest-cost) path 0 →green/0.3→ 1 →blue/0→ 2/0.8.]
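    A minimal OpenFst C++ counterpart of the n-best-paths utility (function and argument names are illustrative):

    #include <fst/fstlib.h>

    // N best (lowest-weight) paths over the tropical semiring (OpenFst),
    // analogous to "fsmbestpath [-n N] A.fsm >C.fsm".
    void BestPaths(const fst::StdVectorFst& a, int n, fst::StdVectorFst* c) {
      fst::ShortestPath(a, c, n);
    }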

  • Pruning - Illustration

    Program: fsmprune -c1.0 A.fsm >C.fsm

    Graphical representation: [Figure: acceptor A.fsa and the pruned result C.fsa, which keeps only the states and transitions lying on paths whose weight is within the threshold 1.0 of the best path weight.]
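    A minimal OpenFst C++ counterpart, assuming the -c option of fsmprune gives the cost threshold (function name illustrative):

    #include <fst/fstlib.h>

    // Keep only states and arcs on paths within 1.0 of the best path weight
    // (OpenFst, tropical semiring), analogous to "fsmprune -c1.0 A.fsm >C.fsm".
    void PruneWithinOne(fst::StdVectorFst* a) {
      fst::Prune(a, fst::StdArc::Weight(1.0));
    }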

  • Summary

    FSM Library:

    • weighted finite-state transducers (semirings);
    • elementary unary operations (e.g., reversal);
    • rational operations (sum, product, closure);
    • fundamental binary operations (e.g., composition);
    • optimization algorithms (e.g., ε-removal, determinization, minimization);
    • search algorithms (e.g., shortest-distance algorithms, n-best paths algorithms, pruning).

  • References

    • Cyril Allauzen and Mehryar Mohri. Efficient Algorithms for Testing the Twins Property. Journal of Automata, Languages and Combinatorics, 8(2):117-144, 2003.

    • Cyril Allauzen, Mehryar Mohri, and Brian Roark. The Design Principles and Algorithms of a Weighted Grammar Library. International Journal of Foundations of Computer Science, 16(3):403-421, 2005.

    • Cyril Allauzen, Michael Riley, Johan Schalkwyk, Wojciech Skut, and Mehryar Mohri. OpenFst: a general and efficient weighted finite-state transducer library. In Proceedings of the Twelfth International Conference on Implementation and Application of Automata (CIAA 2007), volume 4783 of Lecture Notes in Computer Science, pages 11-23. Springer, Berlin, 2007.

    • Mehryar Mohri. Finite-State Transducers in Language and Speech Processing. Computational Linguistics, 23:2, 1997.

    • Mehryar Mohri. Weighted Grammar Tools: the GRM Library. In Robustness in Language and Speech Technology. pages 165-186. Kluwer Academic Publishers, The Netherlands, 2001.

    • Mehryar Mohri. Statistical Natural Language Processing. In M. Lothaire, editor, Applied Combinatorics on Words. Cambridge University Press, 2005.


  • References

    • Mehryar Mohri. Generic Epsilon-Removal and Input Epsilon-Normalization Algorithms for Weighted Transducers. International Journal of Foundations of Computer Science, 13(1):129-143, 2002.

    • Mehryar Mohri, Fernando C. N. Pereira, and Michael Riley. The Design Principles of a Weighted Finite-State Transducer Library. Theoretical Computer Science, 231:17-32, January 2000.

    • Mehryar Mohri, Fernando C. N. Pereira, and Michael Riley. Weighted Automata in Text and Speech Processing. In Proceedings of the 12th biennial European Conference on Artificial Intelligence (ECAI-96), Workshop on Extended finite state models of language. Budapest, Hungary, 1996. John Wiley and Sons, Chichester.

    • Mehryar Mohri and Michael Riley. A Weight Pushing Algorithm for Large Vocabulary Speech Recognition. In Proceedings of the 7th European Conference on Speech Communication and Technology (Eurospeech'01). Aalborg, Denmark, September 2001.

    • Fernando Pereira and Michael Riley. Finite State Language Processing, chapter Speech Recognition by Composition of Weighted Finite Automata. The MIT Press, 1997.
