
  • Speech Recognition
    Lecture 2: Weighted Finite-State Transducer Software Library

    Mehryar Mohri
    Courant Institute of Mathematical Sciences

    [email protected]

  • Software Libraries

    FSM Library: Finite-State Machine Library. General software utilities for building, combining, optimizing, and searching weighted automata and transducers (MM, Pereira, and Riley, 2000).

    http://www.research.att.com/projects/mohri/fsm

    OpenFst Library: Open-source Finite-state transducer Library (Allauzen et al., 2007).

    http://www.openfst.org


  • Software Libraries

    GRM Library: Grammar Library. General software collection for constructing and modifying weighted automata and transducers representing grammars and statistical language models (Allauzen, MM, and Roark, 2005).

    http://www.research.att.com/projects/mohri/grm

    DCD Library: Decoder Library. General software collection for speech recognition decoding and related functions (MM and Riley, 2003).

    http://www.research.att.com/~fsmtools/dcd


  • FSM Library

    The FSM utilities construct, combine, minimize, and search weighted finite-state transducers.

    • User Program Level: Programs that read from and write to files or pipelines, fsm(1): fsmintersect in1.fsm in2.fsm >out.fsm

    • C(++) Library Level: Library archive of C(++) functions that implement the user program level, fsm(3): Fsm in1 = FSMLoad("in1.fsm"); Fsm in2 = FSMLoad("in2.fsm"); Fsm out = FSMIntersect(in1, in2); FSMDump("out.fsm", out);
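    For comparison, the OpenFst library cited on the Software Libraries slide exposes the same "library level" through a C++ API. A minimal sketch, assuming two binary FSTs on disk; the file names are illustrative and exact headers may vary with the OpenFst version:

    // Sketch of the analogous library level in OpenFst (not the AT&T FSM API).
    #include <fst/fstlib.h>
    #include <memory>

    int main() {
      // Read two binary FSTs from disk (illustrative paths).
      std::unique_ptr<fst::StdVectorFst> in1(fst::StdVectorFst::Read("in1.fst"));
      std::unique_ptr<fst::StdVectorFst> in2(fst::StdVectorFst::Read("in2.fst"));
      if (!in1 || !in2) return 1;

      // Intersection requires one operand to be sorted on the matching labels.
      fst::ArcSort(in2.get(), fst::ILabelCompare<fst::StdArc>());

      // Mirrors "fsmintersect in1.fsm in2.fsm >out.fsm".
      fst::StdVectorFst out;
      fst::Intersect(*in1, *in2, &out);
      out.Write("out.fst");
      return 0;
    }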


    • Definition Level: Specification of labels, of costs, and of types of FSM representations.

  • This Lecture

    Weighted automata and transducers

    Rational operations

    Elementary unary operations

    Fundamental binary operations

    Optimization algorithms

    Search algorithms

  • FSM File Types

    Textual format:

    • automata/acceptor files,
    • transducer files,
    • symbols files.

    Binary format: compiled representation used by all FSM utilities.


  • Compiling, Printing, and Drawing

    Compiling

    • fsmcompile -s tropical -iA.syms A.fsm
    • fsmcompile -s log -iA.syms -oA.syms -t T.fsm

    Printing

    • fsmprint -iA.syms A.txt
    • fsmprint -iA.syms -oA.syms T.txt

    Drawing

    • fsmdraw -iA.syms A.ps
    • fsmdraw -iA.syms -oA.syms T.ps


  • Weight Sets: Semirings

    A semiring (K, ⊕, ⊗, 0, 1) is a ring that may lack negation.

    • sum ⊕: to compute the weight of a sequence (sum of the weights of the paths labeled with that sequence).

    • product ⊗: to compute the weight of a path (product of the weights of its constituent transitions).


  • Semirings - Examples

    Semiring      Set               ⊕      ⊗    0     1
    ------------  ----------------  -----  ---  ----  ---
    Boolean       {0, 1}            ∨      ∧    0     1
    Probability   R+                +      ×    0     1
    Log           R ∪ {−∞, +∞}      ⊕log   +    +∞    0
    Tropical      R ∪ {−∞, +∞}      min    +    +∞    0

    with ⊕log defined by: x ⊕log y = − log(e−x + e−y).
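    For example, 1 ⊕log 1 = −log(e⁻¹ + e⁻¹) = 1 − log 2 ≈ 0.31, whereas in the tropical semiring 1 ⊕ 1 = min(1, 1) = 1; in both semirings 1 ⊗ 2 = 1 + 2 = 3.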


  • Automata/Acceptors

    Graphical representation (A.ps): [Figure: acceptor with a red/0.5 self-loop at state 0, arcs 0 →green/0.3→ 1, 1 →blue/0→ 2, 1 →yellow/0.6→ 2, and final state 2/0.8.]

    Acceptor file (A.txt):

    0 0 red .5
    0 1 green .3
    1 2 blue
    1 2 yellow .6
    2 .8

    Symbols file (A.syms):

    red 1
    green 2
    blue 3
    yellow 4
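    The same acceptor can also be built programmatically. A minimal sketch with the OpenFst C++ API mentioned earlier (default tropical semiring; the function name and the hard-coded symbol ids from A.syms are illustrative):

    #include <fst/fstlib.h>

    // Build the acceptor A above (OpenFst, tropical semiring by default).
    fst::StdVectorFst BuildA() {
      fst::StdVectorFst a;
      const int kRed = 1, kGreen = 2, kBlue = 3, kYellow = 4;  // ids from A.syms
      const int q0 = a.AddState();
      const int q1 = a.AddState();
      const int q2 = a.AddState();
      a.SetStart(q0);
      // StdArc is (input label, output label, weight, next state); an acceptor
      // repeats the same label on input and output.
      a.AddArc(q0, fst::StdArc(kRed, kRed, 0.5, q0));        // 0 0 red .5
      a.AddArc(q0, fst::StdArc(kGreen, kGreen, 0.3, q1));    // 0 1 green .3
      a.AddArc(q1, fst::StdArc(kBlue, kBlue, 0.0, q2));      // 1 2 blue
      a.AddArc(q1, fst::StdArc(kYellow, kYellow, 0.6, q2));  // 1 2 yellow .6
      a.SetFinal(q2, 0.8);                                   // 2 .8
      return a;
    }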


  • Transducers

    Graphical representation (T.ps): [Figure: transducer with a red:yellow/0.5 self-loop at state 0, arcs 0 →green:blue/0.3→ 1, 1 →blue:green/0→ 2, 1 →yellow:red/0.6→ 2, and final state 2/0.8.]

    Transducer file (T.txt):

    0 0 red yellow .5
    0 1 green blue .3
    1 2 blue green
    1 2 yellow red .6
    2 .8

    Symbols file (T.syms):

    red 1
    green 2
    blue 3
    yellow 4


  • Paths - Definitions and Notation

    Path π: p[π] denotes its previous (source) state, n[π] its next (destination) state, and i[π]:o[π] its input and output labels.

    Sets of paths

    • P(R1, R2): paths from R1 ⊆ Q to R2 ⊆ Q.
    • P(R1, x, R2): paths in P(R1, R2) with input label x.
    • P(R1, x, y, R2): paths in P(R1, x, R2) with output label y.

  • General Definitions

    Alphabets: input Σ, output ∆.

    States: Q, initial states I, final states F.

    Transitions: E ⊆ Q × (Σ ∪ {ε}) × (∆ ∪ {ε}) × K × Q.

    Weight functions:

    • initial: λ : I → K.
    • final: ρ : F → K.

  • Automata and Transducers - Definitions

    Automaton: A = (Σ, Q, I, F, E, λ, ρ), with, for all x ∈ Σ*,

      [[A]](x) = ⊕_{π ∈ P(I, x, F)} λ(p[π]) ⊗ w[π] ⊗ ρ(n[π]).

    Transducer: T = (Σ, ∆, Q, I, F, E, λ, ρ), with, for all x ∈ Σ*, y ∈ ∆*,

      [[T]](x, y) = ⊕_{π ∈ P(I, x, y, F)} λ(p[π]) ⊗ w[π] ⊗ ρ(n[π]).

  • Weighted Automata

    [[A]](x) = sum of the weights of all successful paths labeled with x.

    [Figure: a weighted automaton over the probability semiring with two successful paths labeled abb and final state 3/0.1.]

    [[A]](abb) = .1 × .2 × .3 × .1 + .5 × .3 × .6 × .1
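    In the probability semiring this evaluates to 0.0006 + 0.009 = 0.0096.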


  • Weighted Transducers

    [[T]](x, y) = sum of the weights of all successful paths with input x and output y.

    [Figure: a weighted transducer with the same topology as the previous automaton (arcs a:b/0.1, a:b/0.5, b:a/0.2, a:a/0.4, b:a/0.3, b:a/0.6; final state 3/0.1).]

    [[T ]](abb, baa) = .1 × .2 × .3 × .1 + .5 × .3 × .6 × .1


  • This Lecture

    Weighted automata and transducers

    Rational operations

    Elementary unary operations

    Fundamental binary operations

    Optimization algorithms

    Search algorithms

  • Rational Operations

    Sum: [[T1 ⊕ T2]](x, y) = [[T1]](x, y) ⊕ [[T2]](x, y).

    Product: [[T1 ⊗ T2]](x, y) = ⊕_{x = x1 x2, y = y1 y2} [[T1]](x1, y1) ⊗ [[T2]](x2, y2).

    Closure: [[T*]](x, y) = ⊕_{n=0}^{∞} [[T]]^n(x, y).
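    As a concrete counterpart, a minimal OpenFst C++ sketch of the three rational operations; t1 and t2 are assumed to be machines built elsewhere, and the destructive OpenFst forms modify their first argument:

    #include <fst/fstlib.h>

    // Rational operations on weighted transducers (OpenFst).
    void RationalOps(const fst::StdVectorFst& t1, const fst::StdVectorFst& t2) {
      fst::StdVectorFst sum = t1;
      fst::Union(&sum, t2);                    // [[sum]] = [[t1]] (+) [[t2]]

      fst::StdVectorFst product = t1;
      fst::Concat(&product, t2);               // [[product]] = [[t1]] (x) [[t2]]

      fst::StdVectorFst star = t1;
      fst::Closure(&star, fst::CLOSURE_STAR);  // [[star]] = [[t1]]*
    }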

    • Conditions (on the closure operation): condition on T, e.g., weight of ε-cycles = 0 (regulated transducers), or semiring condition, e.g., 1 ⊕ x = 1 as with the tropical semiring (more generally, locally closed semirings).

    • Complexity and implementation:

      • linear-time complexity: O((|E1| + |Q1|) + (|E2| + |Q2|)) or O(|Q| + |E|);
      • lazy implementation.

  • Sum - Illustration

    Program: fsmunion A.fsm B.fsm >C.fsm

    Graphical representation: [Figure: acceptors A.fsa and B.fsa and their sum C.fsa, which adds a new initial state with ε/0 transitions to the initial states of copies of A and B.]

  • Product - Illustration

    Program: fsmconcat A.fsm B.fsm >C.fsm

    Graphical representation: [Figure: acceptors A.fsa and B.fsa and their product (concatenation) C.fsa, in which A's final state is linked to B's initial state by an ε-transition carrying A's final weight (eps/0.8).]

  • Closure - Illustration

    Program: fsmclosure B.fsm >C.fsm

    Graphical representation: [Figure: acceptor B.fsa and its closure C.fsa, which adds a new initial and final state and ε-transitions (ε/0 into B's old initial state, ε/0.3 from its final state) implementing the Kleene closure.]

  • This Lecture

    Weighted automata and transducers

    Rational operations

    Elementary unary operations

    Fundamental binary operations

    Optimization algorithms

    Search algorithms

  • Elementary Unary Operations

    Reversal: [[T̃]](x, y) = [[T]](x̃, ỹ).

    Inversion: [[T⁻¹]](x, y) = [[T]](y, x).

    Projection: [[A]](x) = ⊕_y [[T]](x, y).

    Linear-time complexity, lazy implementation (not for reversal).
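    A minimal OpenFst C++ sketch of the three operations; the Project enum name is the one used in older OpenFst releases and may differ in newer versions:

    #include <fst/fstlib.h>

    // Elementary unary operations on a weighted transducer t (OpenFst).
    void UnaryOps(const fst::StdVectorFst& t) {
      fst::StdVectorFst reversed;
      fst::Reverse(t, &reversed);              // reversal (constructive)

      fst::StdVectorFst inverted = t;
      fst::Invert(&inverted);                  // inversion: swap input/output labels

      fst::StdVectorFst projected = t;
      fst::Project(&projected, fst::PROJECT_INPUT);  // projection onto the input side
    }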

  • Reversal - Illustration

    Program: fsmreverse A.fsm >C.fsm

    Graphical representation: [Figure: acceptor A.fsa and its reversal C.fsa; all transitions are reversed, a new initial state has ε-transitions (weighted with the old final weights 0 and 0.3) to the old final states, and the old initial state becomes final.]

  • Inversion - Illustration

    Program: fsminvert A.fsm >C.fsm

    Graphical representation: [Figure: transducer A.fst (arcs red:bird/0.5, green:pig/0.3, blue:cat/0, yellow:dog/0.6) and its inversion C.fst, in which input and output labels are swapped (bird:red/0.5, pig:green/0.3, cat:blue/0, dog:yellow/0.6).]

  • Projection - Illustration

    Program: fsmproject -1 T.fsm >A.fsm

    Graphical representation: [Figure: transducer T.fst (arcs red:bird/0.5, green:pig/0.3, blue:cat/0, yellow:dog/0.6) and its projection onto the input side, the acceptor A.fsa (arcs red/0.5, green/0.3, blue/0, yellow/0.6).]

  • This Lecture

    Weighted automata and transducers

    Rational operations

    Elementary unary operations

    Fundamental binary operations

    Optimization algorithms

    Search algorithms

  • Some Fundamental Binary Operations

    Composition ((K, ⊕, ⊗, 0, 1) commutative):

      [[T1 ◦ T2]](x, y) = ⊕_z [[T1]](x, z) ⊗ [[T2]](z, y).

    Intersection ((K, ⊕, ⊗, 0, 1) commutative):

      [[A1 ∩ A2]](x) = [[A1]](x) ⊗ [[A2]](x).

    Difference (A2 unweighted and deterministic):

      [[A1 − A2]](x) = [[A1 ∩ Ā2]](x), where Ā2 is the complement of A2.

    (Pereira and Riley, 1997; MM et al., 1996)
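    A minimal OpenFst C++ sketch; the argument names are illustrative, composition and intersection require one operand to be arc-sorted, and difference requires A2 to be unweighted and deterministic, as stated above:

    #include <fst/fstlib.h>

    // Fundamental binary operations (OpenFst).
    void BinaryOps(const fst::StdVectorFst& t1, fst::StdVectorFst t2,
                   const fst::StdVectorFst& a1, fst::StdVectorFst a2) {
      fst::ArcSort(&t2, fst::ILabelCompare<fst::StdArc>());
      fst::StdVectorFst composed;
      fst::Compose(t1, t2, &composed);         // [[T1 o T2]]

      fst::ArcSort(&a2, fst::ILabelCompare<fst::StdArc>());
      fst::StdVectorFst intersected;
      fst::Intersect(a1, a2, &intersected);    // [[A1 ∩ A2]] (acceptors)

      fst::StdVectorFst diff;
      fst::Difference(a1, a2, &diff);          // [[A1 − A2]] (A2 unweighted, deterministic)
    }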

    • Complexity and implementation:

      • quadratic complexity: O((|E1| + |Q1|) (|E2| + |Q2|));
      • path multiplicity in the presence of ε-transitions: ε-filter;
      • lazy implementation.

  • Composition - Illustration

    Program: fsmcompose A.fsm B.fsm >C.fsm

    Graphical representation: [Figure: two transducers and their composition; the result's states are pairs of input states such as (0,0), (1,1), (1,2), and (3,2)/1.3, with arcs a:c/0.4, c:b/0.7, a:b/0.8, c:b/0.9, and a:b/1 whose weights are the ⊗-products of the matched arc weights.]

  • Multiplicity and ε-Transitions - Problem

    [Figure 8 of the Springer Handbook on Speech Processing and Speech Communication: redundant ε-paths. A straightforward generalization of the ε-free case could generate all the paths from (1,1) to (3,2) when composing two simple transducers with ε-transitions (a:a b:ε c:ε d:d and a:d ε:e d:a); the grid of composition states is labeled with moves such as (x:x), (ε1:ε1), (ε2:ε2), and (ε2:ε1).]

  • Solution - Filter F for Composition

    [Figure: three-state filter F with transitions labeled x:x, ε1:ε1, ε2:ε2, and ε2:ε1 that control how the ε-moves of the two transducers may be interleaved.]

    Replace T1 ◦ T2 with T̃1 ◦ F ◦ T̃2.

  • Intersection - Illustration

    Program: fsmintersect A.fsm B.fsm >C.fsm

    Graphical representation: [Figure: acceptors A.fsa and B.fsa and their intersection C.fsa; matching transitions are paired and their weights ⊗-combined (e.g. red/0.5 and red/0.2 give red/0.7), as are the final weights (0.8 ⊗ 0.5 = 1.3).]

  • Difference - Illustration

    Program: fsmdifference A.fsm B.fsm >C.fsm

    Graphical representation: [Figure: weighted acceptor A.fsa, unweighted deterministic acceptor B.fsa, and their difference C.fsa, which accepts the strings of A not accepted by B while keeping A's weights.]

  • This Lecture

    Weighted automata and transducers

    Rational operations

    Elementary unary operations

    Fundamental binary operations

    Optimization algorithms

    Search algorithms

  • Optimization Algorithms

    Connection: removes non-accessible/non-coaccessible states.

    ε-Removal: removes ε-transitions.

    Determinization: creates equivalent deterministic machine.

    Pushing: creates equivalent pushed/stochastic machine.

    Minimization: creates equivalent minimal deterministic machine.
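    A minimal OpenFst C++ sketch chaining these steps on a weighted acceptor; it assumes the input satisfies the semiring and determinizability conditions discussed on the following slides:

    #include <fst/fstlib.h>

    // Typical optimization pipeline on a weighted acceptor (OpenFst).
    void Optimize(fst::StdVectorFst* a) {
      fst::Connect(a);                            // drop non-accessible/non-coaccessible states
      fst::RmEpsilon(a);                          // remove epsilon-transitions

      fst::StdVectorFst det;
      fst::Determinize(*a, &det);                 // requires a determinizable input

      fst::Push(&det, fst::REWEIGHT_TO_INITIAL);  // weight pushing toward the initial state
      fst::Minimize(&det);                        // minimal equivalent deterministic machine
      *a = det;
    }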


    • Conditions: there are specific semiring conditions for the use of these algorithms; e.g., not all weighted automata or transducers can be determinized using the determinization algorithm.

  • Connection - Illustration

    Program: fsmconnect A.fsm >C.fsm

    Graphical representation: [Figure: acceptor A.fsa with useless states 3, 4, and 5, and the connected result C.fsa, which keeps only the states that are both accessible and coaccessible (0, 1, 2) and their transitions.]

  • Connection - Algorithm

    Definition:

    • Input: weighted transducer T1.
    • Output: equivalent weighted transducer T2 with all states connected.

    Description:

    1. Depth-first search of T1 from I1.
    2. Mark accessible and coaccessible states.
    3. Keep marked states and corresponding transitions.

    Complexity: linear, O(|Q1| + |E1|).

  • ε-Removal - Illustration

    Program: fsmrmepsilon T.fsm >TP.fsm

    Graphical representation: [Figure: a weighted transducer with ε:ε-transitions, its ε-closure distances, and the equivalent ε-free transducer produced by ε-removal.]

  • ε-Removal - Algorithm (MM, 2001)

    Definition:

    • Input: weighted transducer T1.
    • Output: equivalent WFST T2 with no ε-transition.

    Description:

    • Computation of ε-closures.
    • Removal of εs.

    Complexity:

    • acyclic Tε: O(|Q|² + |Q||E|(T⊕ + T⊗));
    • general case (tropical semiring): O(|Q||E| + |Q|² log |Q|).

  • Computation of ε-closures

    Definition: for p in Q,

      C[p] = {(q, w) : q ∈ ε[p], d[p, q] = w ≠ 0},   where d[p, q] = ⊕_{π ∈ P(p, ε, q)} w[π].

    Problem formulation: all-pairs shortest-distance problem in Tε (T reduced to its ε-transitions).

    • Closed semirings: generalization of the Floyd-Warshall algorithm.
    • k-closed semirings: generic sparse shortest-distance algorithm.

  • Determinization - Algorithm (MM, 1997)

    Definition:

    • Input: weighted automaton or transducer T1.
    • Output: equivalent subsequential or deterministic machine T2: it has a unique initial state, and no two transitions leaving the same state share the same input label.

    Description:

    1. Generalization of the subset construction: weighted subsets {(q1, w1), ..., (qn, wn)}, where the wi are remainder weights.
    2. Computation of the weight of the resulting transitions.

  • Determinization - Conditions

    Semiring: weakly left-divisible semirings.

    Definition: T is determinizable when the determinization algorithm applies to T.

    • All unweighted automata are determinizable.
    • All acyclic machines are determinizable.
    • Not all weighted automata or transducers are determinizable.
    • Characterization based on the twins property.

    Complexity: exponential, but lazy implementation.

  • Determinization of Weighted Automata - Illustration

    Program: fsmdeterminize A.fsm >D.fsm

    Graphical representation: [Figure: a non-deterministic weighted automaton over the tropical semiring and its determinization; the states of the result are weighted subsets such as {(0,0)}, {(1,2),(2,0)}, {(1,0),(2,3)}, and {(3,0)}, carrying the remainder weights.]

  • Determinization of Weighted Transducers - Illustration

    Program: fsmdeterminize T.fsm >D.fsm

    Graphical representation: [Figure: a non-deterministic weighted transducer and its determinization; the states of the result are weighted subsets of triples (state, residual output string, residual weight) such as {(0, ε, 0)}, {(1, c, 0.4), (2, a, 0)}, and {(4, ε, 0)}/(ε, 0.7).]

  • Pushing - Algorithm (MM, 1997; 2004)

    Definition:

    • Input: weighted automaton or transducer T1.
    • Output: equivalent automaton or transducer T2 such that the longest common prefix of all outgoing paths is ε, or such that the sum of the weights of all outgoing transitions is 1, modulo the string or weight at the initial state.

    • Description:

      1. Single-source shortest-distance computation: for each state q,

           d[q] = ⊕_{π ∈ P(q, F)} w[π].

      2. Reweighting: for each transition e such that d[p[e]] ≠ 0,

           w[e] ← (d[p[e]])⁻¹ (w[e] ⊗ d[n[e]]).

    • Conditions (automata case): weakly divisible semiring, zero-sum-free semiring or automaton.

    • Complexity:

      • automata case:
        • acyclic case: linear, O(|Q| + |E|(T⊕ + T⊗));
        • general case (tropical semiring): O(|Q| log |Q| + |E|);
      • transducer case: O((|Pmax| + 1) |E|).

  • Weight Pushing - Illustration

    Program: fsmpush -ic A.fsm >P.fsm

    Graphical representation:

    • Tropical semiring: [Figure: an automaton with arcs a/0, b/1, c/4 and d/0, e/1 leaving state 0, arcs e/0, f/1 leaving state 1, arcs e/10, f/11 leaving state 2, and final state 3/0, and its pushed version, in which the weight 10 is moved toward the initial state (d/10, e/11) so that the arcs leaving state 2 become e/0, f/1.]
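    Worked example for the tropical-semiring figure above: the shortest distances to the final state are d[1] = 0, d[2] = 10, and d[3] = 0, so reweighting the transition 0 → 2 labeled d gives −d[0] + (0 + d[2]) = 10, and the transition 2 → 3 labeled e gives −d[2] + (10 + d[3]) = 0, matching the pushed machine.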

    • Log semiring: [Figure: the same automaton pushed in the log semiring; the pushed weights are fractional, e.g. a/0.3266, b/1.3266, c/4.3266, d/10.326, e/11.326 at the initial state, e/0.3132, f/1.3132 elsewhere, and final weight 3/-0.639.]

  • Label Pushing - Illustration

    Program: fsmpush -il T.fsm >P.fsm

    Graphical representation: [Figure: a transducer and its label-pushed version, in which output labels such as d are moved toward the initial state so that the longest common prefix of the output labels of the paths leaving each state is ε.]

  • Minimization - Algorithm (MM, 1997)

    Definition:

    • Input: deterministic weighted automaton or transducer T1.
    • Output: equivalent deterministic automaton or transducer T2 with the minimal number of states and transitions.

    Description:

    • Canonical representation: use pushing or another algorithm to standardize the input.
    • Automata minimization: encode the pairs (label, weight) as labels and use a classical unweighted minimization algorithm.

    • Complexity:

      • automata case:
        • acyclic case: linear, O(|Q| + |E|(T⊕ + T⊗));
        • general case (tropical semiring): O(|E| log |Q|);
      • transducer case:
        • acyclic case: O(S + |Q| + |E|(|Pmax| + 1));
        • general case (tropical semiring): O(S + |Q| + |E|(log |Q| + |Pmax|)).

  • Minimization - Illustration

    Program: fsmminimize D.fsm >M.fsm

    Graphical representation: [Figure: a deterministic weighted automaton with eight states, its weight-pushed version, and the minimal equivalent automaton obtained by merging the states made equivalent by pushing.]

  • Equivalence - Algorithm (MM, 1997)

    Definition:

    • Input: deterministic weighted automata A and B.
    • Output: TRUE iff A and B are equivalent.

    Description:

    • Canonical representation: use pushing or another algorithm to standardize the input automata.
    • Automata minimization: encode the pairs (label, weight) as labels and use a classical algorithm for testing the equivalence of unweighted automata.

    Complexity (the second stage is quasi-linear): O(|E1| + |E2| + |Q1| log |Q1| + |Q2| log |Q2|).
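    A corresponding OpenFst C++ sketch; the file names are illustrative, and Equivalent expects deterministic, ε-free machines:

    #include <fst/fstlib.h>
    #include <iostream>
    #include <memory>

    // Test the equivalence of two deterministic weighted acceptors (OpenFst).
    int main() {
      std::unique_ptr<fst::StdVectorFst> d(fst::StdVectorFst::Read("D.fst"));
      std::unique_ptr<fst::StdVectorFst> m(fst::StdVectorFst::Read("M.fst"));
      if (!d || !m) return 1;
      std::cout << (fst::Equivalent(*d, *m) ? "equivalent" : "not equivalent") << std::endl;
      return 0;
    }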

  • Equivalence - Illustration

    Program: fsmequiv [-v] D.fsm M.fsm

    Graphical representation: [Figure: two deterministic weighted acceptors D.fsa and M.fsa that distribute their weights differently along paths but assign the same total weight to each string, and are therefore equivalent.]

  • This Lecture

    Weighted automata and transducers

    Rational operations

    Elementary unary operations

    Fundamental binary operations

    Optimization algorithms

    Search algorithms

  • Single-Source Shortest-Distance Algorithms - Illustration

    Program: fsmbestpath [-n N] A.fsm >C.fsm

    Graphical representation: [Figure: acceptor A.fsa with several successful paths and C.fsa, its best (lowest-cost) path 0 →green/0.3→ 1 →blue/0→ 2/0.8.]
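    A minimal OpenFst C++ counterpart of the n-best-paths utility (function and argument names are illustrative):

    #include <fst/fstlib.h>

    // N best (lowest-weight) paths over the tropical semiring (OpenFst),
    // analogous to "fsmbestpath [-n N] A.fsm >C.fsm".
    void BestPaths(const fst::StdVectorFst& a, int n, fst::StdVectorFst* c) {
      fst::ShortestPath(a, c, n);
    }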

  • Pruning - Illustration

    Program: fsmprune -c1.0 A.fsm >C.fsm

    Graphical representation: [Figure: acceptor A.fsa and the pruned result C.fsa, which keeps only the states and transitions lying on paths whose weight is within the threshold 1.0 of the best path weight.]
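    A minimal OpenFst C++ counterpart, assuming the -c option of fsmprune gives the cost threshold (function name illustrative):

    #include <fst/fstlib.h>

    // Keep only states and arcs on paths within 1.0 of the best path weight
    // (OpenFst, tropical semiring), analogous to "fsmprune -c1.0 A.fsm >C.fsm".
    void PruneWithinOne(fst::StdVectorFst* a) {
      fst::Prune(a, fst::StdArc::Weight(1.0));
    }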

  • Summary

    FSM Library:

    • weighted finite-state transducers (semirings);
    • elementary unary operations (e.g., reversal);
    • rational operations (sum, product, closure);
    • fundamental binary operations (e.g., composition);
    • optimization algorithms (e.g., ε-removal, determinization, minimization);
    • search algorithms (e.g., shortest-distance algorithms, n-best paths algorithms, pruning).

  • References

    • Cyril Allauzen and Mehryar Mohri. Efficient Algorithms for Testing the Twins Property. Journal of Automata, Languages and Combinatorics, 8(2):117-144, 2003.

    • Cyril Allauzen, Mehryar Mohri, and Brian Roark. The Design Principles and Algorithms of a Weighted Grammar Library. International Journal of Foundations of Computer Science, 16(3):403-421, 2005.

    • Cyril Allauzen, Michael Riley, Johan Schalkwyk, Wojciech Skut, and Mehryar Mohri. OpenFst: a general and efficient weighted finite-state transducer library. In Proceedings of the Twelfth International Conference on Implementation and Application of Automata (CIAA 2007), volume 4783 of Lecture Notes in Computer Science, pages 11-23. Springer, Berlin, 2007.

    • Mehryar Mohri. Finite-State Transducers in Language and Speech Processing. Computational Linguistics, 23:2, 1997.

    • Mehryar Mohri. Weighted Grammar Tools: the GRM Library. In Robustness in Language and Speech Technology. pages 165-186. Kluwer Academic Publishers, The Netherlands, 2001.

    • Mehryar Mohri. Statistical Natural Language Processing. In M. Lothaire, editor, Applied Combinatorics on Words. Cambridge University Press, 2005.


  • References

    • Mehryar Mohri. Generic Epsilon-Removal and Input Epsilon-Normalization Algorithms for Weighted Transducers. International Journal of Foundations of Computer Science, 13(1):129-143, 2002.

    • Mehryar Mohri, Fernando C. N. Pereira, and Michael Riley. The Design Principles of a Weighted Finite-State Transducer Library. Theoretical Computer Science, 231:17-32, January 2000.

    • Mehryar Mohri, Fernando C. N. Pereira, and Michael Riley. Weighted Automata in Text and Speech Processing. In Proceedings of the 12th biennial European Conference on Artificial Intelligence (ECAI-96), Workshop on Extended finite state models of language. Budapest, Hungary, 1996. John Wiley and Sons, Chichester.

    • Mehryar Mohri and Michael Riley. A Weight Pushing Algorithm for Large Vocabulary Speech Recognition. In Proceedings of the 7th European Conference on Speech Communication and Technology (Eurospeech'01). Aalborg, Denmark, September 2001.

    • Fernando Pereira and Michael Riley. Finite State Language Processing, chapter Speech Recognition by Composition of Weighted Finite Automata. The MIT Press, 1997.
