Dynamic Time Warping & Search
• Dynamic time warping
• Search
– Graph search algorithms
– Dynamic programming algorithms
6.345 Automatic Speech Recognition Dynamic Time Warping & Search 1
Word-Based Template Matching
[Block diagram: Spoken Word → Feature Measurement → Pattern Similarity (against Word Reference Templates) → Decision Rule → Output Word]
• Whole word representation:
– No explicit concept of sub-word units (e.g., phones)
– No across-word sharing
• Used for both isolated- and connected-word recognition
• Popular from the late 1970s to the mid-1980s
Template Matching Mechanism
• Test pattern, T , and reference patterns, {R1, . . . , RV }, are represented by sequences of feature measurements
• Pattern similarity is determined by aligning test pattern, T , with reference pattern, Rv , with distortion D(T , Rv )
• Decision rule chooses reference pattern, R∗ , with smallest alignment distortion D(T , R∗)
R∗ = arg min_v D(T, R_v)
• Dynamic time warping (DTW) is used to compute the best possible alignment warp, φv , between T and Rv , and the associated distortion D(T , Rv )
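The decision rule above can be sketched directly. Everything here is illustrative: the labels, sequences, and the stand-in L1 distance (a real system would plug DTW in as the distortion function):

```python
def classify(test, references, distortion):
    """R* = arg min_v D(T, R_v): pick the reference template
    whose alignment distortion to the test pattern is smallest."""
    return min(references, key=lambda label: distortion(test, references[label]))

# Toy stand-in for D(T, R): sum of absolute differences on
# equal-length sequences (a real system would use DTW here).
def l1(t, r):
    return sum(abs(a - b) for a, b in zip(t, r))

best = classify([1, 2, 3], {"one": [1, 2, 4], "two": [5, 5, 5]}, l1)
```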
Alignment Example
[Figure: warp path in the (n, m) plane aligning the test axis (n = 1 … N) with the reference axis (m = 1 … M)]
Digit Alignment Examples
[Figure: two digit alignments, a matching pair (left) and a mismatched pair (right)]
Dynamic Time Warping (DTW)
• Objective: an optimal alignment between variable length sequences T = {t1, . . . , tN } and R = {r1, . . . , rM }
• The overall distortion D(T , R) is based on a sum of local distances between elements d(ti , rj )
• A particular alignment warp, φ, aligns T and R via a point-to-point mapping, φ = (φt, φr ), of length Kφ
t_{φt(k)} ⇐⇒ r_{φr(k)},   1 ≤ k ≤ K_φ
• The optimal alignment minimizes overall distortion
D(T, R) = min_φ D_φ(T, R)

D_φ(T, R) = (1 / M_φ) Σ_{k=1}^{K_φ} m_k d(t_{φt(k)}, r_{φr(k)})
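The optimal alignment is usually computed by dynamic programming over the N × M grid. A minimal sketch, assuming unit path weights (m_k = 1), the three simplest local moves, and the crude normalization M_φ ≈ N + M so sequences of different lengths remain comparable:

```python
def dtw(T, R, d=lambda a, b: abs(a - b)):
    """Minimal DTW: best cumulative distortion aligning T with R."""
    N, M = len(T), len(R)
    INF = float("inf")
    # D[i][j] = best cumulative distortion aligning T[:i] with R[:j]
    D = [[INF] * (M + 1) for _ in range(N + 1)]
    D[0][0] = 0.0
    for i in range(1, N + 1):
        for j in range(1, M + 1):
            D[i][j] = d(T[i - 1], R[j - 1]) + min(
                D[i - 1][j],      # advance in T only
                D[i][j - 1],      # advance in R only
                D[i - 1][j - 1],  # advance in both
            )
    return D[N][M] / (N + M)      # simplified path normalization
```

A sequence that is a time-warped copy of another (e.g., one element held twice as long) aligns with zero distortion, which is exactly what the warp is for.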
DTW Issues
• Endpoint constraints:
φt(1) = φr(1) = 1,   φt(K_φ) = N,   φr(K_φ) = M
• Monotonicity:
φt(k + 1) ≥ φt(k),   φr(k + 1) ≥ φr(k)
• Path weights, mk, can influence shape of optimal path
• Path normalization factor, Mφ, allows comparison between different warps (e.g., with different lengths)
M_φ = Σ_{k=1}^{K_φ} m_k
DTW Issues: Local Continuity
[Figure: four local continuity constraints (Types I–IV), each showing which moves among the neighboring grid points (n−2 … n, m−2 … m) may enter point (n, m)]
Local constraints determine alignment flexibility
DTW Issues: Global Constraints
[Figure: global constraints restrict the warp to a parallelogram-shaped legal range between (1, 1) and (N, M), bounded by slopes 2 and 1/2]
Local constraints exclude portions of the search space
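One common way to impose such a global constraint is a fixed-width band around the diagonal (a Sakoe–Chiba-style band rather than the slope-constrained parallelogram shown). A sketch assuming roughly equal-length sequences; grid cells outside the band are simply never evaluated:

```python
def dtw_banded(T, R, band, d=lambda a, b: abs(a - b)):
    """DTW restricted to |i - j| <= band (simplified global constraint)."""
    N, M = len(T), len(R)
    INF = float("inf")
    D = [[INF] * (M + 1) for _ in range(N + 1)]
    D[0][0] = 0.0
    for i in range(1, N + 1):
        for j in range(1, M + 1):
            if abs(i - j) > band:
                continue          # outside the legal range: skip entirely
            D[i][j] = d(T[i - 1], R[j - 1]) + min(
                D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[N][M]
```

With band = 0 only the diagonal survives, so the search collapses to a frame-by-frame comparison; widening the band trades computation for alignment flexibility.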
Computing DTW Alignment
[Figure: DTW alignment computed over a grid of local distances, with labeled grid points S and A–H on candidate paths and a local constraint diagram showing the allowed moves into point (n, m)]
Graph Representations of Search Space
• Search spaces can be represented as directed graphs
[Figure: directed graph with start node S, intermediate nodes A–G, and goal node H]
• Paths through a graph can be represented with a tree
Search Space Tree
[Figure: tree of all paths through the graph, rooted at S and ending at H]
Graph Search Algorithms
• Iterative methods using a queue to store partial paths
– On each iteration the top partial path is removed from the queue and is extended one level
– New extensions are put back into the queue
– Search is complete when goal is reached
• Depth of queue is potentially unbounded
• Weighted graphs can be searched to find the best path
• Admissible algorithms guarantee finding the best path
• Many speech-based search problems can be configured to proceed time-synchronously
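The iterative queue-based loop above can be sketched as follows; the queue discipline is the only thing that changes between strategies (a LIFO pop gives depth-first, a FIFO pop gives breadth-first). The example graph is hypothetical:

```python
from collections import deque

def search(graph, start, goal, depth_first=True):
    """Generic partial-path search: pop a path, extend it one level,
    push the extensions back; done when the goal is reached."""
    queue = deque([[start]])
    while queue:
        path = queue.pop() if depth_first else queue.popleft()
        node = path[-1]
        if node == goal:
            return path                     # search is complete
        for nxt in graph.get(node, []):
            queue.append(path + [nxt])      # extensions go back in the queue
    return None

# Hypothetical unweighted graph in the spirit of the slides' S ... H example
GRAPH = {"S": ["A", "B"], "A": ["E", "F"], "B": ["G"],
         "E": ["H"], "F": ["H"], "G": ["H"]}
dfs_path = search(GRAPH, "S", "H")
bfs_path = search(GRAPH, "S", "H", depth_first=False)
```

Note that neither discipline consults path costs, which is why neither is generally used to find the best path in a weighted graph.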
Depth First Search
• Searches space by pursuing one path at a time
• Path extensions are put on top of queue
• Queue is not reordered or pruned
• Not well suited for spaces with long dead-end paths
• Not generally used to find the best path
Depth First Search Example
[Figure: depth-first expansion order of the search tree from S to H]
Breadth First Search
• Searches space by pursuing all paths in parallel
• Path extensions are put on bottom of queue
• Queue is not reordered or pruned
• Queue can grow rapidly in spaces with many paths
• Not generally used to find the best path
• Can be made much more effective with pruning
Breadth First Search Example
[Figure: breadth-first expansion order of the search tree from S to H]
Best First Search
• Used to search a weighted graph
• Uses greedy or step-wise optimal criterion, whereby each iteration expands the current best path
• On each iteration, the queue is resorted according to the cumulative score of each partial path
• If path scores exhibit monotonic behavior (e.g., d(ti, rj) ≥ 0), the search can terminate when a complete path has a better score than all active partial paths
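A best-first sketch using a heap, so the queue is effectively resorted by cumulative score on every iteration; with nonnegative local distances, the first complete path popped already beats every remaining partial path. The weighted graph is made up for illustration:

```python
import heapq

def best_first(graph, start, goal):
    """Greedy best-first search over a weighted graph (edge costs >= 0)."""
    heap = [(0, [start])]                  # (cumulative score, partial path)
    while heap:
        cost, path = heapq.heappop(heap)   # current best partial path
        node = path[-1]
        if node == goal:
            # monotonic scores: this complete path beats all partial paths
            return cost, path
        for nxt, w in graph.get(node, []):
            heapq.heappush(heap, (cost + w, path + [nxt]))
    return None

# Hypothetical weighted graph: two routes from S to H with costs 6 and 4
GRAPH = {"S": [("A", 1), ("B", 2)], "A": [("H", 5)],
         "B": [("G", 1)], "G": [("H", 1)]}
result = best_first(GRAPH, "S", "H")
```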
Tree Representation (with node scores)
[Figure: search tree annotated with per-node scores]
Tree Representation (with cumulative scores)
[Figure: search tree annotated with cumulative path scores]
Best First Search Example
[Figure: best-first expansion order driven by cumulative scores]
Pruning Partial Paths
• Both greedy and dynamic programming algorithms can take advantage of optimal substructure:
– Let φ(i, j) be the best path between nodes i and j
– If k is a node in φ(i, j):
φ(i, j) = {φ(i, k), φ(k, j)}
– Let ϕ(i, j) be the cost of φ(i, j)
ϕ(i, j) = min_k [ϕ(i, k) + ϕ(k, j)]
• Solutions to subproblems need only be computed once
• Sub-optimal partial paths can be discarded while maintaining admissibility of search
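Pruning by optimal substructure can be sketched as a closed set: once a node has been expanded at its best cost, any later partial path reaching that node is discarded without losing the optimum. This is essentially Dijkstra's algorithm; the graph is hypothetical:

```python
import heapq

def best_first_pruned(graph, start, goal):
    """Best-first search that discards sub-optimal partial paths:
    a node already closed was reached more cheaply before."""
    heap, closed = [(0, [start])], set()
    while heap:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node in closed:
            continue                  # sub-optimal partial path: discard
        closed.add(node)
        if node == goal:
            return cost, path
        for nxt, w in graph.get(node, []):
            if nxt not in closed:
                heapq.heappush(heap, (cost + w, path + [nxt]))
    return None

# Hypothetical graph where G is reachable two ways (cost 5 via A, 3 via B);
# the costlier partial path into G is pruned, preserving the optimum.
GRAPH = {"S": [("A", 1), ("B", 2)], "A": [("G", 4)],
         "B": [("G", 1)], "G": [("H", 1)]}
result = best_first_pruned(GRAPH, "S", "H")
```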
Best First Search with Pruning
[Figure: best-first expansion with sub-optimal partial paths pruned]
Estimating Future Scores
• Partial path scores, ϕ(1, i), can be augmented with future estimates, ϕ̂(i), of the remaining cost
ϕ_φ = ϕ(1, i) + ϕ̂(i)
• If ϕ̂(i) is an underestimate of the remaining cost, additional paths can be pruned while maintaining admissibility of search
• A∗ search uses
– Best-first search strategy
– Pruning
– Future estimates
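An A∗ sketch combining the three ingredients: partial paths are ordered by ϕ(1, i) + ϕ̂(i), sub-optimal paths are pruned via a closed set, and ϕ̂ is an underestimate of the remaining cost (here it is also consistent, so closed-set pruning stays admissible). The graph and the estimates are made up:

```python
import heapq

def a_star(graph, start, goal, h):
    """A* search: best-first ordering on g + h with closed-set pruning.
    h(node) must underestimate the remaining cost for admissibility."""
    heap, closed = [(h(start), 0, [start])], set()
    while heap:
        f, g, path = heapq.heappop(heap)   # f = g + future estimate
        node = path[-1]
        if node == goal:
            return g, path
        if node in closed:
            continue                       # pruned: node reached cheaper before
        closed.add(node)
        for nxt, w in graph.get(node, []):
            heapq.heappush(heap, (g + w + h(nxt), g + w, path + [nxt]))
    return None

# Hypothetical weighted graph and a consistent underestimate table
GRAPH = {"S": [("A", 1), ("B", 2)], "A": [("H", 5)],
         "B": [("G", 1)], "G": [("H", 1)]}
H_EST = {"S": 2, "A": 5, "B": 2, "G": 1, "H": 0}
result = a_star(GRAPH, "S", "H", H_EST.get)
```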
Tree Representation (with future estimates)
[Figure: search tree annotated with future-estimate-augmented scores]
A∗ Search Example
[Figure: A∗ expansion order with pruning and future estimates]
N-Best Search
• Used to compute top N paths
– Can be re-scored by more sophisticated techniques
– Typically used at the sentence level
• Can use modified A∗ search to rank paths
– No pruning of partial paths
– Completed paths are removed from queue
– Can use a threshold to prune paths, and still identify admissibility violations
– Can also be used to produce a graph
• Alternative methods can be used to compute N-best outputs (e.g., asynchronous DP)
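A modified best-first sketch for N-best: partial paths are not pruned, and completed paths are removed from the queue until N have been collected, so the results come out in rank order. The graph is hypothetical:

```python
import heapq

def n_best(graph, start, goal, n):
    """Collect the top-n complete paths in increasing cost order.
    No pruning of partial paths; completed paths leave the queue."""
    heap, results = [(0, [start])], []
    while heap and len(results) < n:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node == goal:
            results.append((cost, path))   # completed path removed from queue
            continue
        for nxt, w in graph.get(node, []):
            heapq.heappush(heap, (cost + w, path + [nxt]))
    return results

# Hypothetical graph with two complete S-to-H paths (costs 4 and 6)
GRAPH = {"S": [("A", 1), ("B", 2)], "A": [("H", 5)],
         "B": [("G", 1)], "G": [("H", 1)]}
top_two = n_best(GRAPH, "S", "H", 2)
```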
N-Best Search Example
[Figure: N-best expansion collecting multiple completed paths]
Dynamic Programming (DP)
• DP algorithms do not employ a greedy strategy
• DP algorithms typically take advantage of optimal substructure and overlapping subproblems by arranging search to solve each subproblem only once
• Can be implemented efficiently:
– Node j retains only best path cost of all ϕ(i, j)
– Previous best node id needed to recover best path
• Can be time-synchronous or asynchronous
• DTW and Viterbi are time-synchronous searches and look like breadth-first with pruning
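A time-synchronous DP (Viterbi-style) sketch: at each frame, every node retains only the best path cost reaching it, plus a backpointer so the best path can be recovered afterward. The local-cost trellis and transition set below are made up:

```python
def viterbi(local_cost, trans):
    """local_cost[t][s]: cost of state s at frame t;
    trans: allowed (prev_state, state) transitions.
    Returns (best total cost, best state sequence)."""
    T, S = len(local_cost), len(local_cost[0])
    INF = float("inf")
    best = [local_cost[0][:]]          # frame 0: entry cost of each state
    back = [[None] * S]
    for t in range(1, T):              # advance one frame at a time
        row, bp = [INF] * S, [None] * S
        for prev, s in trans:
            c = best[t - 1][prev] + local_cost[t][s]
            if c < row[s]:             # keep only the best path into s
                row[s], bp[s] = c, prev
        best.append(row)
        back.append(bp)
    final = min(range(S), key=lambda s: best[-1][s])
    path, s = [final], final
    for t in range(T - 1, 0, -1):      # trace backpointers to recover path
        s = back[t][s]
        path.append(s)
    return best[-1][final], path[::-1]

# Hypothetical 2-state left-to-right model over 3 frames
cost, states = viterbi([[0, 3], [2, 0], [3, 0]], [(0, 0), (0, 1), (1, 1)])
```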
Time-Synchronous DP Example
[Figure: time-synchronous DP expansion over the graph from S to H, each node keeping only its best path]
Inadmissible Search Variations
• Can use a beam width to prune current hypotheses
– Beam width can be static or dynamic based on relative score
• Can use an approximation to a lower bound on A∗ lookahead for N-best computation
• Search is inadmissible, but may be useful in practice
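A beam-pruning sketch: at each frame, hypotheses scoring worse than the frame's best by more than the beam width are dropped. This is the relative-score (dynamic) variant; the scores and paths are illustrative:

```python
def beam_prune(hyps, beam):
    """Keep only hypotheses within `beam` of the best (lowest) score.
    Inadmissible: the eventual optimum may fall outside the beam."""
    best = min(score for score, _ in hyps)
    return [(s, p) for s, p in hyps if s <= best + beam]

# Hypothetical frame with three active hypotheses and beam width 2
survivors = beam_prune([(3, "a"), (9, "b"), (4, "c")], beam=2)
```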
Beam Search Example
[Figure: beam search expansion with hypotheses outside the beam pruned]
References
• Cormen, Leiserson, and Rivest, Introduction to Algorithms, 2nd Edition, MIT Press, 2001.
• Huang, Acero, and Hon, Spoken Language Processing, Prentice-Hall, 2001.
• Jelinek, Statistical Methods for Speech Recognition, MIT Press, 1997.
• Winston, Artificial Intelligence, 3rd Edition, Addison-Wesley, 1993.