dynamic time warping & search

33
Dynamic Time Warping & Search Dynamic time warping Search Graph search algorithms Dynamic programming algorithms 6.345 Automatic Speech Recognition Dynamic Time Warping & Search 1

Upload: duongtuong

Post on 10-Feb-2017

243 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Dynamic Time Warping & Search

Dynamic Time Warping & Search

• Dynamic time warping

• Search

– Graph search algorithms

– Dynamic programming algorithms

6.345 Automatic Speech Recognition Dynamic Time Warping & Search 1

rahkuma
Lecture # 9 Session 2003
Page 2: Dynamic Time Warping & Search

Word-Based Template Matching

� Decision Rule

Spoken �

Word Reference Templates

Pattern Similarity

Output Word Word

• Whole word representation:

– No explicit concept of sub-word units (e.g., phones)

– No across-word sharing

• Used for both isolated- and connected-word recognition

• Popular in late 1970s to mid 1980s6.345 Automatic Speech Recognition Dynamic Time Warping & Search 2

� Feature Measurement

Page 3: Dynamic Time Warping & Search

Template Matching Mechanism

• Test pattern, T , and reference patterns, {R1, . . . , RV }, are represented by sequences of feature measurements

• Pattern similarity is determined by aligning test pattern, T , with reference pattern, Rv , with distortion D(T , Rv )

• Decision rule chooses reference pattern, R∗ , with smallest alignment distortion D(T , R∗)

R∗ = arg min D(T , Rv ) v

• Dynamic time warping (DTW) is used to compute the best possible alignment warp, φv , between T and Rv , and the associated distortion D(T , Rv )

6.345 Automatic Speech Recognition Dynamic Time Warping & Search 3

Page 4: Dynamic Time Warping & Search

Alignment Example

m m

M

Reference

M

1 1

0

Warp

n1 N

Test

n0 1 N

6.345 Automatic Speech Recognition Dynamic Time Warping & Search 4

Page 5: Dynamic Time Warping & Search

Digit Alignment Examples

Match Mismatch

6.345 Automatic Speech Recognition Dynamic Time Warping & Search 5

Page 6: Dynamic Time Warping & Search

Dynamic Time Warping (DTW)

• Objective: an optimal alignment between variable length sequences T = {t1, . . . , tN } and R = {r1, . . . , rM }

• The overall distortion D(T , R) is based on a sum of local distances between elements d(ti , rj )

• A particular alignment warp, φ, aligns T and R via a point-to-point mapping, φ = (φt, φr ), of length Kφ

tφt (k) ⇐⇒ rφr (k) 1 ≤ k ≤ Kφ

• The optimal alignment minimizes overall distortion

D(T , R) = min Dφ(T , R) φ

Kφ1 � Dφ(T , R) = d(tφt (k), rφr (k))mk

Mφ k=1

6.345 Automatic Speech Recognition Dynamic Time Warping & Search 6

Page 7: Dynamic Time Warping & Search

DTW Issues

• Endpoint constraints:

φt (1) = φr (1) = 1 φt (K) = N φr (K) = M

• Monotonicity:

φt (k + 1) ≥ φt (k) φr (k + 1) ≥ φr (k)

• Path weights, mk, can influence shape of optimal path

• Path normalization factor, Mφ, allows comparison between different warps (e.g., with different lengths)

Mφ = mk k=1

6.345 Automatic Speech Recognition Dynamic Time Warping & Search 7

Page 8: Dynamic Time Warping & Search

DTW Issues: Local Continuity

m [n,m] m [n,m]

TYPE I m - 1 TYPE III m - 1

m - 2 m - 2

n - 2 n - 1 n n - 2 n - 1 n

m [n,m] m [n,m]

TYPE II m - 1 TYPE IV m - 1

m - 2 m - 2

n - 2 n - 1 n n - 2 n - 1 n

Local constraints determine alignment flexibility

6.345 Automatic Speech Recognition Dynamic Time Warping & Search 8

Page 9: Dynamic Time Warping & Search

DTW Issues: Global Constraints

slope = 2 (1,M)

1slope =

2

Legal range

(1,1) (N,1)

(N,M)

1slope =

2

slope = 2

Local constraints exclude portions of search space 6.345 Automatic Speech Recognition Dynamic Time Warping & Search 9

Page 10: Dynamic Time Warping & Search

Computing DTW Alignment

5 3 4 4 1

1 3 4 4 5

4 3 3 2 2

1 1 3

3 3 2

1 3

2 4

C B

D A G

F E

H

m

2

1

1

[m,n]

m-1

m-2n-2 n-1 n

S

6.345 Automatic Speech Recognition Dynamic Time Warping & Search 10

Page 11: Dynamic Time Warping & Search

Graph Representations of Search Space

• Search spaces can be represented as directed graphs

S

� A

� B

� C

� D

� �

E

� �

� F

� � G

� �

� H

• Paths through a graph can be represented with a tree

6.345 Automatic Speech Recognition Dynamic Time Warping & Search 11

Page 12: Dynamic Time Warping & Search

Search Space Tree

S

� A

� �

� B

� � �

� C

� � E F E D G F G

� � � � � � �H H F H H HH

H

6.345 Automatic Speech Recognition Dynamic Time Warping & Search 12

Page 13: Dynamic Time Warping & Search

Graph Search Algorithms

• Iterative methods using a queue to store partial paths

– On each iteration the top partial path is removed from the queue and is extended one level

– New extensions are put back into the queue

– Search is complete when goal is reached

• Depth of queue is potentially unbounded

• Weighted graphs can be searched to find the best path

• Admissible algorithms guarantee finding the best path

• Many speech-based search problems can be configured to proceed time-synchronously

6.345 Automatic Speech Recognition Dynamic Time Warping & Search 13

Page 14: Dynamic Time Warping & Search

Depth First Search

• Searches space by pursuing one path at a time

• Path extensions are put on top of queue

• Queue is not reordered or pruned

• Not well suited for spaces with long dead-end paths

• Not generally used to find the best path

6.345 Automatic Speech Recognition Dynamic Time Warping & Search 14

Page 15: Dynamic Time Warping & Search

� � � �

� � �

Depth First Search Example

S

� A

� B

� C

� E

� F

S

� A

� E

� F

� E

� D

� G

� B

� E

� D

� G

� F

� G

� F

� C

� F

� G � G

H H � � � � � � �

H H H H F F H H H H H H

H H

6.345 Automatic Speech Recognition Dynamic Time Warping & Search 15

Page 16: Dynamic Time Warping & Search

Breadth First Search

• Searches space by pursuing all paths in parallel

• Path extensions are put on bottom of queue

• Queue is not reordered or pruned

• Queue can grow rapidly in spaces with many paths

• Not generally used to find the best path

• Can be made much more effective with pruning

6.345 Automatic Speech Recognition Dynamic Time Warping & Search 16

Page 17: Dynamic Time Warping & Search

� � � � � � �

� �

Breadth First Search Example

S

� A

� B

� C

� E

� F

� E

� D

� G

� F

� G

S

� A

� E

� F

� B

� E

� G

� C

� F

� G

� D

H H � � � � � � �

H H H H F F H H H H H H

H H

6.345 Automatic Speech Recognition Dynamic Time Warping & Search 17

Page 18: Dynamic Time Warping & Search

Best First Search

• Used to search a weighted graph

• Uses greedy or step-wise optimal criterion, whereby each iteration expands the current best path

• On each iteration, the queue is resorted according to the cumulative score of each partial path

• If path scores exhibit monotonic behavior, (e.g., d(ti, rj ) ≥ 0), search can terminate when a complete path has a better score than all active partial paths

6.345 Automatic Speech Recognition Dynamic Time Warping & Search 18

Page 19: Dynamic Time Warping & Search

Tree Representation (with node scores)

S 1

� A 1

� �

� B 2

� � �

� C 6

� � E F E D G F G3 6 3 1 2 3 1

� � � � � � � 2 H 1 H 2 F 3 H 1 H 1 H 1

1

H

H

6.345 Automatic Speech Recognition Dynamic Time Warping & Search 19

Page 20: Dynamic Time Warping & Search

Tree Representation (with cumulative scores)

S 1

� A 2

� �

� B 3

� � �

� C 7

� � E F E D G F G5 8 6 4 5 10 8

� � � � � � � 7 H 9 H 8 F 7 H 6 H 11 H 9

8

H

H

6.345 Automatic Speech Recognition Dynamic Time Warping & Search 20

Page 21: Dynamic Time Warping & Search

5

� 6

� � � �

Best First Search Example

7

5 F 8 E 6 D 4 G G 5

7 8 7 6

S 1

� A 2

� B 3

� C

� � � � �

S 1

� B 3

� E

H H F H H

6.345 Automatic Speech Recognition Dynamic Time Warping & Search 21

Page 22: Dynamic Time Warping & Search

Pruning Partial Paths

• Both greedy and dynamic programming algorithms can take advantage of optimal substructure:

– Let φ(i, j) be the best path between nodes i and j

– If k is a node in φ(i, j):

φ(i, j) = {φ(i, k), φ(k, j)}

– Let ϕ(i, j) be the cost of φ(i, j)

ϕ(i, j) = min(ϕ(i, k) + ϕ(k, j)) k

• Solutions to subproblems need only be computed once

• Sub-optimal partial paths can be discarded while maintaining admissibility of search

6.345 Automatic Speech Recognition Dynamic Time Warping & Search 22

Page 23: Dynamic Time Warping & Search

8 6 5

� 6

� � �

Best First Search with Pruning

7

5 F F 8 E E 6 D 4 G G 5

7 7 6

S 1

� A 2

� B 3

� C

� � � � � ��

S 1

� B 3

� E

H F H H

6.345 Automatic Speech Recognition Dynamic Time Warping & Search 23

Page 24: Dynamic Time Warping & Search

ˆ

ˆ

ˆ

Estimating Future Scores

• Partial path scores, ϕ(1, i), can be augmented with future estimates, ϕ(i), of the remaining cost

ϕφ = ϕ(1, i) + ϕ(i)

• If ϕ(i) is an underestimate of the remaining cost, additional paths can be pruned while maintaining admissibility of search

• A∗search uses

– Best-first search strategy

– Pruning

– Future estimates

6.345 Automatic Speech Recognition Dynamic Time Warping & Search 24

Page 25: Dynamic Time Warping & Search

Tree Representation (with future estimates)

S 5

� A 5

� �

� B 6

� � �

� C 9

� � E F E D G F G7 9 8 6 6 10 9

� � � � � � � 7 H 9 H 8 F 8 H 6 H 11 H 9

8

H

H

6.345 Automatic Speech Recognition Dynamic Time Warping & Search 25

Page 26: Dynamic Time Warping & Search

9 8 6

� 6

� �

A∗Search Example

9

7 F F 9 E E 8 D 6 G G 6

8 6

S 5

� A 5

� B 6

� C

� � � � � ��

S 5

� B 6

� E

F H H

6.345 Automatic Speech Recognition Dynamic Time Warping & Search 26

Page 27: Dynamic Time Warping & Search

N -Best Search

• Used to compute top N paths

– Can be re-scored by more sophisticated techniques

– Typically used at the sentence level

• Can use modified A∗ search to rank paths

– No pruning of partial paths

– Completed paths are removed from queue

– Can use a threshold to prune paths, and still identify admissibility violations

– Can also be used to produce a graph

• Alternative methods can be used to compute N -best outputs (e.g., asynchronous DP)

6.345 Automatic Speech Recognition Dynamic Time Warping & Search 27

Page 28: Dynamic Time Warping & Search

7 8 6 6

� 8

� 6

� 7

� 8

� � � �

N -Best Search Example

S 5

� A 5

� B 6

� C

� � � � �

S 5

� B 6

� A 5

� � � E E

H H

9

7 F 9 E E 8 D D 6 G G 6

7 H H 8 F F 8 H H 6

8H 8 H

6.345 Automatic Speech Recognition Dynamic Time Warping & Search 28

Page 29: Dynamic Time Warping & Search

Dynamic Programming (DP)

• DP algorithms do not employ a greedy strategy

• DP algorithms typically take advantage of optimal substructure and overlapping subproblems by arranging search to solve each subproblem only once

• Can be implemented efficiently:

– Node j retains only best path cost of all ϕ(i, j)

– Previous best node id needed to recover best path

• Can be time-synchronous or asynchronous

• DTW and Viterbi are time-synchronous searches and look like breadth-first with pruning

6.345 Automatic Speech Recognition Dynamic Time Warping & Search 29

Page 30: Dynamic Time Warping & Search

Time-Synchronous DP Example

S

� A

� B

� E

� D

� C

��

� F

�� G�

��

� H

��S � B

� G

� H

6.345 Automatic Speech Recognition Dynamic Time Warping & Search 30

Page 31: Dynamic Time Warping & Search

Inadmissible Search Variations

• Can use a beam width to prune current hypotheses

– Beam width can be static or dynamic based on relative score

• Can use an approximation to a lower bound on A∗lookahead for N -best computation

• Search is inadmissible, but may be useful in practice

6.345 Automatic Speech Recognition Dynamic Time Warping & Search 31

Page 32: Dynamic Time Warping & Search

Beam Search Example

S

� A

� B

� E

� D

� C

� C

�� F �

� G

��

� H

��S � B

� G

� H

6.345 Automatic Speech Recognition Dynamic Time Warping & Search 32

Page 33: Dynamic Time Warping & Search

References

• Cormen, Leiserson, Rivest, Introduction to Algorithms, 2nd

Edition, MIT Press, 2001.

• Huang, Acero, and Hon, Spoken Language Processing, Prentice-Hall, 2001.

• Jelinek, Statistical Methods for Speech Recognition. MIT Press, 1997.

• Winston, Artificial Intelligence, 3rd Edition, Addison-Wesley, 1993.

6.345 Automatic Speech Recognition Dynamic Time Warping & Search 33