Dynamic Time Warping & Search
• Dynamic time warping
• Search
– Graph search algorithms
– Dynamic programming algorithms
6.345 Automatic Speech Recognition Dynamic Time Warping & Search 1
Word-Based Template Matching
[Block diagram: Spoken Word → Feature Measurement → Pattern Similarity (against Word Reference Templates) → Decision Rule → Output Word]
• Whole word representation:
– No explicit concept of sub-word units (e.g., phones)
– No across-word sharing
• Used for both isolated- and connected-word recognition
• Popular from the late 1970s to the mid-1980s
Template Matching Mechanism
• Test pattern, T , and reference patterns, {R1, . . . , RV }, are represented by sequences of feature measurements
• Pattern similarity is determined by aligning test pattern, T , with reference pattern, Rv , with distortion D(T , Rv )
• Decision rule chooses reference pattern, R∗ , with smallest alignment distortion D(T , R∗)
R∗ = arg min_v D(T, R_v)
• Dynamic time warping (DTW) is used to compute the best possible alignment warp, φv , between T and Rv , and the associated distortion D(T , Rv )
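The decision rule above can be sketched directly. Everything here is illustrative: the labels, sequences, and the stand-in L1 distance (a real system would plug DTW in as the distortion function):

```python
def classify(test, references, distortion):
    """R* = arg min_v D(T, R_v): pick the reference template
    whose alignment distortion to the test pattern is smallest."""
    return min(references, key=lambda label: distortion(test, references[label]))

# Toy stand-in for D(T, R): sum of absolute differences on
# equal-length sequences (a real system would use DTW here).
def l1(t, r):
    return sum(abs(a - b) for a, b in zip(t, r))

best = classify([1, 2, 3], {"one": [1, 2, 4], "two": [5, 5, 5]}, l1)
```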
Alignment Example
[Figure: warp path in the (n, m) plane aligning the test axis (n = 1 … N) with the reference axis (m = 1 … M)]
Digit Alignment Examples
[Figure: two digit alignments, a matching pair (left) and a mismatched pair (right)]
Dynamic Time Warping (DTW)
• Objective: an optimal alignment between variable length sequences T = {t1, . . . , tN } and R = {r1, . . . , rM }
• The overall distortion D(T , R) is based on a sum of local distances between elements d(ti , rj )
• A particular alignment warp, φ, aligns T and R via a point-to-point mapping, φ = (φt, φr ), of length Kφ
t_{φt(k)} ⇐⇒ r_{φr(k)},   1 ≤ k ≤ K_φ
• The optimal alignment minimizes overall distortion
D(T, R) = min_φ D_φ(T, R)

D_φ(T, R) = (1 / M_φ) Σ_{k=1}^{K_φ} m_k d(t_{φt(k)}, r_{φr(k)})
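The optimal alignment is usually computed by dynamic programming over the N × M grid. A minimal sketch, assuming unit path weights (m_k = 1), the three simplest local moves, and the crude normalization M_φ ≈ N + M so sequences of different lengths remain comparable:

```python
def dtw(T, R, d=lambda a, b: abs(a - b)):
    """Minimal DTW: best cumulative distortion aligning T with R."""
    N, M = len(T), len(R)
    INF = float("inf")
    # D[i][j] = best cumulative distortion aligning T[:i] with R[:j]
    D = [[INF] * (M + 1) for _ in range(N + 1)]
    D[0][0] = 0.0
    for i in range(1, N + 1):
        for j in range(1, M + 1):
            D[i][j] = d(T[i - 1], R[j - 1]) + min(
                D[i - 1][j],      # advance in T only
                D[i][j - 1],      # advance in R only
                D[i - 1][j - 1],  # advance in both
            )
    return D[N][M] / (N + M)      # simplified path normalization
```

A sequence that is a time-warped copy of another (e.g., one element held twice as long) aligns with zero distortion, which is exactly what the warp is for.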
DTW Issues
• Endpoint constraints:
φt(1) = φr(1) = 1,   φt(K_φ) = N,   φr(K_φ) = M
• Monotonicity:
φt(k + 1) ≥ φt(k),   φr(k + 1) ≥ φr(k)
• Path weights, mk, can influence shape of optimal path
• Path normalization factor, Mφ, allows comparison between different warps (e.g., with different lengths)
M_φ = Σ_{k=1}^{K_φ} m_k
DTW Issues: Local Continuity
[Figure: four local continuity constraints (Types I–IV), each showing which moves among the neighboring grid points (n−2 … n, m−2 … m) may enter point (n, m)]
Local constraints determine alignment flexibility
DTW Issues: Global Constraints
[Figure: global constraints restrict the warp to a parallelogram-shaped legal range between (1, 1) and (N, M), bounded by slopes 2 and 1/2]
Local constraints exclude portions of the search space
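One common way to impose such a global constraint is a fixed-width band around the diagonal (a Sakoe–Chiba-style band rather than the slope-constrained parallelogram shown). A sketch assuming roughly equal-length sequences; grid cells outside the band are simply never evaluated:

```python
def dtw_banded(T, R, band, d=lambda a, b: abs(a - b)):
    """DTW restricted to |i - j| <= band (simplified global constraint)."""
    N, M = len(T), len(R)
    INF = float("inf")
    D = [[INF] * (M + 1) for _ in range(N + 1)]
    D[0][0] = 0.0
    for i in range(1, N + 1):
        for j in range(1, M + 1):
            if abs(i - j) > band:
                continue          # outside the legal range: skip entirely
            D[i][j] = d(T[i - 1], R[j - 1]) + min(
                D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[N][M]
```

With band = 0 only the diagonal survives, so the search collapses to a frame-by-frame comparison; widening the band trades computation for alignment flexibility.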
Computing DTW Alignment
[Figure: DTW alignment computed over a grid of local distances, with labeled grid points S and A–H on candidate paths and a local constraint diagram showing the allowed moves into point (n, m)]
Graph Representations of Search Space
• Search spaces can be represented as directed graphs
[Figure: directed graph with start node S, intermediate nodes A–G, and goal node H]
• Paths through a graph can be represented with a tree
Search Space Tree
[Figure: tree of all paths through the graph, rooted at S and ending at H]
Graph Search Algorithms
• Iterative methods using a queue to store partial paths
– On each iteration the top partial path is removed from the queue and is extended one level
– New extensions are put back into the queue
– Search is complete when goal is reached
• Depth of queue is potentially unbounded
• Weighted graphs can be searched to find the best path
• Admissible algorithms guarantee finding the best path
• Many speech-based search problems can be configured to proceed time-synchronously
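The iterative queue-based loop above can be sketched as follows; the queue discipline is the only thing that changes between strategies (a LIFO pop gives depth-first, a FIFO pop gives breadth-first). The example graph is hypothetical:

```python
from collections import deque

def search(graph, start, goal, depth_first=True):
    """Generic partial-path search: pop a path, extend it one level,
    push the extensions back; done when the goal is reached."""
    queue = deque([[start]])
    while queue:
        path = queue.pop() if depth_first else queue.popleft()
        node = path[-1]
        if node == goal:
            return path                     # search is complete
        for nxt in graph.get(node, []):
            queue.append(path + [nxt])      # extensions go back in the queue
    return None

# Hypothetical unweighted graph in the spirit of the slides' S ... H example
GRAPH = {"S": ["A", "B"], "A": ["E", "F"], "B": ["G"],
         "E": ["H"], "F": ["H"], "G": ["H"]}
dfs_path = search(GRAPH, "S", "H")
bfs_path = search(GRAPH, "S", "H", depth_first=False)
```

Note that neither discipline consults path costs, which is why neither is generally used to find the best path in a weighted graph.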
Depth First Search
• Searches space by pursuing one path at a time
• Path extensions are put on top of queue
• Queue is not reordered or pruned
• Not well suited for spaces with long dead-end paths
• Not generally used to find the best path
Depth First Search Example
[Figure: depth-first expansion order of the search tree from S to H]
Breadth First Search
• Searches space by pursuing all paths in parallel
• Path extensions are put on bottom of queue
• Queue is not reordered or pruned
• Queue can grow rapidly in spaces with many paths
• Not generally used to find the best path
• Can be made much more effective with pruning
Breadth First Search Example
[Figure: breadth-first expansion order of the search tree from S to H]
Best First Search
• Used to search a weighted graph
• Uses greedy or step-wise optimal criterion, whereby each iteration expands the current best path
• On each iteration, the queue is resorted according to the cumulative score of each partial path
• If path scores exhibit monotonic behavior (e.g., d(ti, rj) ≥ 0), the search can terminate when a complete path has a better score than all active partial paths
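A best-first sketch using a heap, so the queue is effectively resorted by cumulative score on every iteration; with nonnegative local distances, the first complete path popped already beats every remaining partial path. The weighted graph is made up for illustration:

```python
import heapq

def best_first(graph, start, goal):
    """Greedy best-first search over a weighted graph (edge costs >= 0)."""
    heap = [(0, [start])]                  # (cumulative score, partial path)
    while heap:
        cost, path = heapq.heappop(heap)   # current best partial path
        node = path[-1]
        if node == goal:
            # monotonic scores: this complete path beats all partial paths
            return cost, path
        for nxt, w in graph.get(node, []):
            heapq.heappush(heap, (cost + w, path + [nxt]))
    return None

# Hypothetical weighted graph: two routes from S to H with costs 6 and 4
GRAPH = {"S": [("A", 1), ("B", 2)], "A": [("H", 5)],
         "B": [("G", 1)], "G": [("H", 1)]}
result = best_first(GRAPH, "S", "H")
```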
Tree Representation (with node scores)
[Figure: search tree annotated with per-node scores]
Tree Representation (with cumulative scores)
[Figure: search tree annotated with cumulative path scores]
Best First Search Example
[Figure: best-first expansion order driven by cumulative scores]
Pruning Partial Paths
• Both greedy and dynamic programming algorithms can take advantage of optimal substructure:
– Let φ(i, j) be the best path between nodes i and j
– If k is a node in φ(i, j):
φ(i, j) = {φ(i, k), φ(k, j)}
– Let ϕ(i, j) be the cost of φ(i, j)
ϕ(i, j) = min_k [ϕ(i, k) + ϕ(k, j)]
• Solutions to subproblems need only be computed once
• Sub-optimal partial paths can be discarded while maintaining admissibility of search
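Pruning by optimal substructure can be sketched as a closed set: once a node has been expanded at its best cost, any later partial path reaching that node is discarded without losing the optimum. This is essentially Dijkstra's algorithm; the graph is hypothetical:

```python
import heapq

def best_first_pruned(graph, start, goal):
    """Best-first search that discards sub-optimal partial paths:
    a node already closed was reached more cheaply before."""
    heap, closed = [(0, [start])], set()
    while heap:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node in closed:
            continue                  # sub-optimal partial path: discard
        closed.add(node)
        if node == goal:
            return cost, path
        for nxt, w in graph.get(node, []):
            if nxt not in closed:
                heapq.heappush(heap, (cost + w, path + [nxt]))
    return None

# Hypothetical graph where G is reachable two ways (cost 5 via A, 3 via B);
# the costlier partial path into G is pruned, preserving the optimum.
GRAPH = {"S": [("A", 1), ("B", 2)], "A": [("G", 4)],
         "B": [("G", 1)], "G": [("H", 1)]}
result = best_first_pruned(GRAPH, "S", "H")
```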
Best First Search with Pruning
[Figure: best-first expansion with sub-optimal partial paths pruned]
Estimating Future Scores
• Partial path scores, ϕ(1, i), can be augmented with future estimates, ϕ̂(i), of the remaining cost
ϕ_φ = ϕ(1, i) + ϕ̂(i)
• If ϕ̂(i) is an underestimate of the remaining cost, additional paths can be pruned while maintaining admissibility of search
• A∗ search uses
– Best-first search strategy
– Pruning
– Future estimates
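An A∗ sketch combining the three ingredients: partial paths are ordered by ϕ(1, i) + ϕ̂(i), sub-optimal paths are pruned via a closed set, and ϕ̂ is an underestimate of the remaining cost (here it is also consistent, so closed-set pruning stays admissible). The graph and the estimates are made up:

```python
import heapq

def a_star(graph, start, goal, h):
    """A* search: best-first ordering on g + h with closed-set pruning.
    h(node) must underestimate the remaining cost for admissibility."""
    heap, closed = [(h(start), 0, [start])], set()
    while heap:
        f, g, path = heapq.heappop(heap)   # f = g + future estimate
        node = path[-1]
        if node == goal:
            return g, path
        if node in closed:
            continue                       # pruned: node reached cheaper before
        closed.add(node)
        for nxt, w in graph.get(node, []):
            heapq.heappush(heap, (g + w + h(nxt), g + w, path + [nxt]))
    return None

# Hypothetical weighted graph and a consistent underestimate table
GRAPH = {"S": [("A", 1), ("B", 2)], "A": [("H", 5)],
         "B": [("G", 1)], "G": [("H", 1)]}
H_EST = {"S": 2, "A": 5, "B": 2, "G": 1, "H": 0}
result = a_star(GRAPH, "S", "H", H_EST.get)
```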
Tree Representation (with future estimates)
[Figure: search tree annotated with future-estimate-augmented scores]
A∗ Search Example
[Figure: A∗ expansion order with pruning and future estimates]
N-Best Search
• Used to compute top N paths
– Can be re-scored by more sophisticated techniques
– Typically used at the sentence level
• Can use modified A∗ search to rank paths
– No pruning of partial paths
– Completed paths are removed from queue
– Can use a threshold to prune paths, and still identify admissibility violations
– Can also be used to produce a graph
• Alternative methods can be used to compute N-best outputs (e.g., asynchronous DP)
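A modified best-first sketch for N-best: partial paths are not pruned, and completed paths are removed from the queue until N have been collected, so the results come out in rank order. The graph is hypothetical:

```python
import heapq

def n_best(graph, start, goal, n):
    """Collect the top-n complete paths in increasing cost order.
    No pruning of partial paths; completed paths leave the queue."""
    heap, results = [(0, [start])], []
    while heap and len(results) < n:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node == goal:
            results.append((cost, path))   # completed path removed from queue
            continue
        for nxt, w in graph.get(node, []):
            heapq.heappush(heap, (cost + w, path + [nxt]))
    return results

# Hypothetical graph with two complete S-to-H paths (costs 4 and 6)
GRAPH = {"S": [("A", 1), ("B", 2)], "A": [("H", 5)],
         "B": [("G", 1)], "G": [("H", 1)]}
top_two = n_best(GRAPH, "S", "H", 2)
```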
N-Best Search Example
[Figure: N-best expansion collecting multiple completed paths]
Dynamic Programming (DP)
• DP algorithms do not employ a greedy strategy
• DP algorithms typically take advantage of optimal substructure and overlapping subproblems by arranging search to solve each subproblem only once
• Can be implemented efficiently:
– Node j retains only best path cost of all ϕ(i, j)
– Previous best node id needed to recover best path
• Can be time-synchronous or asynchronous
• DTW and Viterbi are time-synchronous searches and look like breadth-first with pruning
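A time-synchronous DP (Viterbi-style) sketch: at each frame, every node retains only the best path cost reaching it, plus a backpointer so the best path can be recovered afterward. The local-cost trellis and transition set below are made up:

```python
def viterbi(local_cost, trans):
    """local_cost[t][s]: cost of state s at frame t;
    trans: allowed (prev_state, state) transitions.
    Returns (best total cost, best state sequence)."""
    T, S = len(local_cost), len(local_cost[0])
    INF = float("inf")
    best = [local_cost[0][:]]          # frame 0: entry cost of each state
    back = [[None] * S]
    for t in range(1, T):              # advance one frame at a time
        row, bp = [INF] * S, [None] * S
        for prev, s in trans:
            c = best[t - 1][prev] + local_cost[t][s]
            if c < row[s]:             # keep only the best path into s
                row[s], bp[s] = c, prev
        best.append(row)
        back.append(bp)
    final = min(range(S), key=lambda s: best[-1][s])
    path, s = [final], final
    for t in range(T - 1, 0, -1):      # trace backpointers to recover path
        s = back[t][s]
        path.append(s)
    return best[-1][final], path[::-1]

# Hypothetical 2-state left-to-right model over 3 frames
cost, states = viterbi([[0, 3], [2, 0], [3, 0]], [(0, 0), (0, 1), (1, 1)])
```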
Time-Synchronous DP Example
[Figure: time-synchronous DP expansion over the graph from S to H, each node keeping only its best path]
Inadmissible Search Variations
• Can use a beam width to prune current hypotheses
– Beam width can be static or dynamic based on relative score
• Can use an approximation to a lower bound on A∗ lookahead for N-best computation
• Search is inadmissible, but may be useful in practice
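A beam-pruning sketch: at each frame, hypotheses scoring worse than the frame's best by more than the beam width are dropped. This is the relative-score (dynamic) variant; the scores and paths are illustrative:

```python
def beam_prune(hyps, beam):
    """Keep only hypotheses within `beam` of the best (lowest) score.
    Inadmissible: the eventual optimum may fall outside the beam."""
    best = min(score for score, _ in hyps)
    return [(s, p) for s, p in hyps if s <= best + beam]

# Hypothetical frame with three active hypotheses and beam width 2
survivors = beam_prune([(3, "a"), (9, "b"), (4, "c")], beam=2)
```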
Beam Search Example
[Figure: beam search expansion with hypotheses outside the beam pruned]
References
• Cormen, Leiserson, and Rivest, Introduction to Algorithms, 2nd Edition, MIT Press, 2001.
• Huang, Acero, and Hon, Spoken Language Processing, Prentice-Hall, 2001.
• Jelinek, Statistical Methods for Speech Recognition, MIT Press, 1997.
• Winston, Artificial Intelligence, 3rd Edition, Addison-Wesley, 1993.