
Incorporating Dynamic Time Warping (DTW) in the SeqRec.m File

Presented by:

Clay McCreary, MSEE

Agenda

• Project Scope

• DTW Basics

• Algorithm Implementation

• Observations

• Conclusion

• Continuing Research

Project Scope

• Modify SeqRec.m to incorporate the DTW algorithm

• Test the implemented DTW algorithm using a small, manually created dictionary containing words of varying lengths

• Determine whether adding the DTW algorithm improved the recognition capability of SeqRec.m

DTW Basics

• DTW provides a method of comparing two vectors of different lengths

– If the vectors are of the same length, the correlation function provides adequate comparison

• DTW compares the measured vector to a template vector and provides a “likeness score”

– This process is repeated for multiple templates

– The template with the lowest “likeness score” is the template that most closely matches the measured vector

DTW Basics (cont.)

• This method is especially useful when the measured vector must be one of the templates

– Uttered speech compared to words in a language dictionary, for example

• DTW will compensate for “slurred” words/sounds

DTW Basics (cont.)

• The comparison is accomplished by organizing the measured and the template data into a matrix

• The cells of the matrix are filled with the likelihood of column values matching the row value

– Each cell contains the “local” distance (maximum likelihood)

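[Figure: empty comparison matrix; row labels d, r, o, w (top to bottom); column labels w, w, o, r, r, d (left to right)]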

DTW Basics (cont.)

• The global distance is the sum of all of the local distances encountered on a “path” from the lower left cell to the upper right cell

– The path cannot move down or to the left

• Multiple paths are available

• The DTW algorithm searches for the path with the lowest global distance

• This global distance is the “Likeness Score”

DTW Basics (cont.)

• The following DTW path has a global score of 15

– This is the lowest possible global score

– This path is accomplished in 6 steps

– Starting at the lower left cell, the path visits the local distances 2, 3, 1, 3, 4, 2

d   5 5 3 5 6 2
r   3 3 2 3 4 7
o   4 5 1 6 7 6
w   2 3 5 7 8 7
    w w o r r d
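The claim that 15 is the lowest possible score can be checked by searching every allowed path. The following is a minimal sketch in Python (not the MATLAB of SeqRec.m); the names `LOCAL` and `best_path` are illustrative and do not come from the original code. It tries the three allowed moves (right, up, diagonal up-right) from each cell and keeps the cheapest continuation.

```python
from functools import lru_cache

# Local distances from the example, listed bottom row first so that
# index (0, 0) is the lower left cell.
# Rows: w, o, r, d; columns: w, w, o, r, r, d.
LOCAL = (
    (2, 3, 5, 7, 8, 7),  # w
    (4, 5, 1, 6, 7, 6),  # o
    (3, 3, 2, 3, 4, 7),  # r
    (5, 5, 3, 5, 6, 2),  # d
)


@lru_cache(maxsize=None)
def best_path(row, col):
    """Return (global distance, cells) of the cheapest path from
    (row, col) to the upper right cell."""
    here = LOCAL[row][col]
    last_row, last_col = len(LOCAL) - 1, len(LOCAL[0]) - 1
    if (row, col) == (last_row, last_col):
        return here, ((row, col),)
    candidates = []
    for d_row, d_col in ((0, 1), (1, 0), (1, 1)):  # right, up, diagonal
        next_row, next_col = row + d_row, col + d_col
        if next_row <= last_row and next_col <= last_col:
            candidates.append(best_path(next_row, next_col))
    cost, cells = min(candidates, key=lambda item: item[0])
    return here + cost, ((row, col),) + cells


global_score, path = best_path(0, 0)
print(global_score)                    # 15
print([LOCAL[r][c] for r, c in path])  # [2, 3, 1, 3, 4, 2]
print(len(path))                       # 6
```

Here a "step" is counted as one visited cell, including the starting cell, which matches the 6 steps quoted for the example.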

DTW Basics (cont.)

• The global distance must be normalized to allow comparison of the measured vector to templates of various lengths

– This is accomplished by dividing the global score by the number of steps used on the path

– The previous example would have a normalized likeness score of 2.5 (15 / 6)
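As a small illustration of why normalization matters, the sketch below (Python, with the hypothetical helper name `normalized_score`) compares the example path with an invented longer path. The second set of numbers (18 over 9 steps) is made up purely for illustration and is not a result from the project.

```python
def normalized_score(global_distance, steps):
    """Divide the global distance by the number of steps on the path."""
    if steps <= 0:
        raise ValueError("a path must contain at least one step")
    return global_distance / steps


example = normalized_score(15, 6)  # the path from the example matrix
longer = normalized_score(18, 9)   # hypothetical longer template

print(example)           # 2.5
print(longer)            # 2.0
print(longer < example)  # True
```

Before normalization the longer path looks worse (18 > 15); after normalization it looks better (2.0 < 2.5), because its cost is spread over more steps.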

Algorithm Implementation

• Current operation

– SeqRec.m compares each sound passed to the script (a vector of sounds assigned at the input) to each sound in a list of 14 (vowels)

– The likelihood of each of the input sounds matching each of the sounds in the list is determined

– From these likelihoods, a list of recognized sounds is generated

Algorithm Implementation (cont.)

– Each recognized sound is compared to the corresponding input sound

– If any do not match, the recognized sound sequence is declared in error
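The per-sound recognition described above can be sketched as follows. This is an assumption-laden Python illustration, not the SeqRec.m code: the function names are hypothetical, the likelihood values are invented, and only 3 list sounds are used instead of 14 to keep the example short. How SeqRec.m actually computes the likelihoods is not covered here.

```python
def recognize(likelihoods):
    """likelihoods[j][s] is the likelihood that input sound j is list
    sound s. Return the most likely list sound (1-based label) for
    every input sound."""
    return [
        max(range(len(column)), key=column.__getitem__) + 1
        for column in likelihoods
    ]


def sequence_in_error(recognized, intended):
    """The sequence is in error if any single sound does not match."""
    if len(recognized) != len(intended):
        return True
    return any(r != i for r, i in zip(recognized, intended))


# Invented likelihoods for three input sounds over three list sounds.
likelihoods = [
    [0.7, 0.2, 0.1],  # input sound 1: most likely list sound 1
    [0.3, 0.4, 0.3],  # input sound 2: most likely list sound 2
    [0.1, 0.5, 0.4],  # input sound 3: narrowly misrecognized as sound 2
]
recognized = recognize(likelihoods)
print(recognized)                                # [1, 2, 2]
print(sequence_in_error(recognized, [1, 2, 3]))  # True
```

A single narrow miss is enough to reject the whole sequence, which is the weakness the DTW comparison is meant to address.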


• DTW implementation

– The calculated likelihoods of each input sound being each of the 14 sounds from the list are placed in a matrix, probmatrix

– A matrix of the local distances for the template vs. measured vector matrix, dtwmatrix, is created using values stored in probmatrix

• This matrix is upside down compared to the example when visualized
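One way the local distance matrix might be derived from probmatrix is sketched below in Python. The conversion from likelihood to distance is an assumption (negative log-likelihood, so that a higher likelihood gives a smaller distance); the slides do not state which conversion SeqRec.m uses. The function name `build_dtwmatrix` and the sample numbers are also hypothetical.

```python
import math


def build_dtwmatrix(probmatrix, template):
    """probmatrix[s][j] is the likelihood that input sound j is list
    sound s (0-based s). template is a list of 1-based sound labels.

    Returns one row per template sound and one column per input sound.
    Row 0 holds the first template sound, so the matrix is upside down
    compared to the slide example.
    """
    n_inputs = len(probmatrix[0])
    floor = 1e-12  # assumed guard against log(0)
    return [
        [-math.log(max(probmatrix[s - 1][j], floor)) for j in range(n_inputs)]
        for s in template
    ]


# Invented likelihoods: 3 list sounds (rows) by 3 input sounds (columns).
probmatrix = [
    [0.7, 0.3, 0.1],
    [0.2, 0.4, 0.5],
    [0.1, 0.3, 0.4],
]
dtwmatrix = build_dtwmatrix(probmatrix, [1, 2])  # two-sound template
print(len(dtwmatrix), len(dtwmatrix[0]))         # 2 3
print(dtwmatrix[0][0] < dtwmatrix[0][2])         # True
```

The template and the input may have different lengths; only the number of rows changes with the template.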


– costmatrix1 is created to determine the lowest global distance

– A cell is filled by adding the local distance for that cell, obtained from dtwmatrix, to the lowest value from the previous adjacent cells (left, down, or diagonally down and to the left), which were filled in the same manner

• This results in the value in the cell being the shortest global distance to that cell

– Starting from index (1,1), the top row and left column are filled

• Avoids ‘0’ index which is illegal in MATLAB

– Then, the remaining cells are filled


– This results in the shortest global distance being recorded in the bottom, right cell

– Then, the path is determined by moving repeatedly to the lowest previous adjacent cell until reaching index (1,1), counting the number of steps along the way

– The shortest global distance is then divided by the number of steps to normalize
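The fill, backtrack, and normalize steps can be sketched as below. This is a Python illustration under stated assumptions, not the SeqRec.m code: the function name `dtw_score` is hypothetical, indices are 0-based (so the start cell is (0, 0) rather than MATLAB's (1,1)), and a step is counted as one visited cell. The input is the example matrix from the DTW Basics slides, stored upside down so that the first template sound is in the top row.

```python
def dtw_score(dtwmatrix):
    """Return (global distance, steps, normalized score)."""
    rows, cols = len(dtwmatrix), len(dtwmatrix[0])
    cost = [[0] * cols for _ in range(rows)]

    # Start cell, then the top row and the left column.
    cost[0][0] = dtwmatrix[0][0]
    for c in range(1, cols):
        cost[0][c] = cost[0][c - 1] + dtwmatrix[0][c]
    for r in range(1, rows):
        cost[r][0] = cost[r - 1][0] + dtwmatrix[r][0]

    # Remaining cells: local distance plus the cheapest previous neighbor.
    for r in range(1, rows):
        for c in range(1, cols):
            cost[r][c] = dtwmatrix[r][c] + min(
                cost[r][c - 1], cost[r - 1][c], cost[r - 1][c - 1]
            )

    # Walk back from the bottom right cell to the start, counting cells.
    r, c, steps = rows - 1, cols - 1, 1
    while (r, c) != (0, 0):
        moves = []
        if c > 0:
            moves.append((cost[r][c - 1], r, c - 1))
        if r > 0:
            moves.append((cost[r - 1][c], r - 1, c))
        if r > 0 and c > 0:
            moves.append((cost[r - 1][c - 1], r - 1, c - 1))
        _, r, c = min(moves)
        steps += 1

    global_distance = cost[rows - 1][cols - 1]
    return global_distance, steps, global_distance / steps


example = [
    [2, 3, 5, 7, 8, 7],  # w
    [4, 5, 1, 6, 7, 6],  # o
    [3, 3, 2, 3, 4, 7],  # r
    [5, 5, 3, 5, 6, 2],  # d
]
print(dtw_score(example))  # (15, 6, 2.5)
```

The result agrees with the slide example: a global distance of 15 over 6 steps, normalized to 2.5.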


– This process is repeated for all templates of each word length in the dictionary, storing the global distances in two vectors

• dtwvector contains all the global distances for each length

– The minimum determines the best global distance for that length

• dtwwordlength contains the best global distances of each length

– The minimum determines the overall best global distance
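The two-level minimum can be sketched as follows. This Python illustration keeps the names dtwvector and dtwwordlength from the slides, but stores each score together with its template so the winning word can be reported; that pairing, the function name `best_match`, and the scores themselves are assumptions made for the example.

```python
def best_match(scores_by_length):
    """scores_by_length maps word length -> {template: normalized score}.

    Returns (best score, best template) over all word lengths.
    """
    dtwwordlength = []
    for length in sorted(scores_by_length):
        dtwvector = scores_by_length[length]  # all scores for this length
        template = min(dtwvector, key=dtwvector.get)
        dtwwordlength.append((dtwvector[template], template))
    return min(dtwwordlength)  # lowest score across the lengths


# Invented normalized scores for a few templates.
scores = {
    3: {(1, 2, 3): 2.9, (1, 3, 2): 3.4},
    4: {(1, 2, 3, 4): 2.5, (2, 1, 3, 4): 3.1},
}
print(best_match(scores))  # (2.5, (1, 2, 3, 4))
```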


• The dictionary created to test this algorithm consisted of all permutations of:

– 1 2 3

– 1 2 3 4

– 1 2 3 4 5

• This limited dictionary restricted testing the full capability of the DTW algorithm
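A dictionary of this shape can be generated rather than typed by hand. The sketch below uses Python's standard `itertools.permutations`; the function name `build_dictionary` and the grouping by word length are assumptions made for illustration, not the layout used in SeqRec.m.

```python
from itertools import permutations


def build_dictionary(word_lengths=(3, 4, 5)):
    """Map each word length n to all permutations of the sounds 1..n."""
    return {n: list(permutations(range(1, n + 1))) for n in word_lengths}


dictionary = build_dictionary()
print({n: len(words) for n, words in dictionary.items()})
# {3: 6, 4: 24, 5: 120}
print(dictionary[3][:3])
# [(1, 2, 3), (1, 3, 2), (2, 1, 3)]
```

This gives 150 templates in total, none of which repeats a sound within a word.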

Observations

• The previous SeqRec function typically had a high error rate (>50%)

• DTW used these erroneous words for comparison to the templates

– Even with the DTW algorithm using these erroneous words, the error rate typically improved to <10%

Observations (cont.)

• Before normalization, erroneous DTW words were usually the same length as the correct word

• After normalization, erroneous DTW words were of various lengths

Conclusions

• By comparing the sequence of input sounds to templates of the whole sequence rather than to each part of the sequence, the DTW algorithm improves recognition by ~5X (typical error rate reduced from >50% to <10%)

– DTW allows some error, whereas the previous SeqRec function required that the recognized word be perfect

Continuing Research

• The basic DTW algorithm was implemented in the SeqRec function, but the limited dictionary, whose words contain no repeated sounds, only allowed the basic function of the DTW algorithm to be tested, not its full capability

Continuing Research (cont.)

• Thus, the algorithm should be tested using larger dictionaries containing words in which sounds are used multiple times in the same word

– The code was written in a general format to allow the easy incorporation of new dictionaries
