dtw for speech recognition


Upload: vaughan

Post on 30-Jan-2016


Page 1: DTW for Speech Recognition

DTW for Speech Recognition

J.-S. Roger Jang ( 張智星 )

[email protected]

http://www.cs.nthu.edu.tw/~jang

MIR Lab ( 多媒體資訊檢索實驗室 )

CS, Tsing Hua Univ. ( 清華大學 資工系 )

Page 2: DTW for Speech Recognition

Dynamic Time Warping (DTW)

Characteristics:
- Pattern-matching-based approach
- Requires less memory/computation
- Suitable for speaker-dependent recognition
- Suitable for small to medium vocabulary
- Suitable for microprocessor/chip implementation

Applications:
- Speaker identification & verification for surveillance
- Voice commands for mobile phones, toys

Page 3: DTW for Speech Recognition

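Dynamic Time Warping: Type 1

t: input MFCC matrix (each column is a frame's feature)
r: reference MFCC matrix
Local paths: 27-45-63 degrees

DTW recurrence:

$$D(i,j) = |t(i) - r(j)| + \min\{D(i-1,j-2),\; D(i-1,j-1),\; D(i-2,j-1)\}$$

[Figure: the three local paths into D(i,j), with input frames t(i-1), t(i) along the i axis and reference frames r(j-1), r(j) along the j axis.]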
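The type-1 recurrence, D(i,j) = |t(i) - r(j)| + min{D(i-1,j-2), D(i-1,j-1), D(i-2,j-1)}, can be sketched in a few lines. This is a minimal illustration, not the author's implementation: the function name `dtw_type1`, the use of NumPy, the Euclidean frame distance, and fixing both corners of the path are all assumptions.

```python
import numpy as np

def dtw_type1(t, r):
    """Type-1 DTW (27-45-63 degree local paths), both corners fixed.

    t: (d, m) input features, one frame per column
    r: (d, n) reference features, one frame per column
    Returns D(m, n), or inf when no valid path joins the corners.
    Assumption: the local distance is the Euclidean norm.
    """
    m, n = t.shape[1], r.shape[1]
    # dist[i, j] = distance between input frame i and reference frame j
    dist = np.linalg.norm(t[:, :, None] - r[:, None, :], axis=0)
    # Two layers of inf padding so i-2 and j-2 never need bounds checks
    D = np.full((m + 2, n + 2), np.inf)
    D[2, 2] = dist[0, 0]
    for i in range(2, m + 2):
        for j in range(2, n + 2):
            if i == 2 and j == 2:
                continue  # start corner already set
            best = min(D[i - 1, j - 2], D[i - 1, j - 1], D[i - 2, j - 1])
            D[i, j] = dist[i - 2, j - 2] + best
    return D[m + 1, n + 1]
```

Identical sequences give a distance of 0 along the 45-degree path, and a 1-frame input cannot be aligned with a 3-frame reference, so the function returns inf in that case.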

Page 4: DTW for Speech Recognition

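Dynamic Time Warping: Type 2

t: input MFCC matrix (each row is a frame's feature)
r: reference MFCC matrix
Local paths: 0-45-90 degrees

DTW recurrence:

$$D(i,j) = |t(i) - r(j)| + \min\{D(i-1,j),\; D(i-1,j-1),\; D(i,j-1)\}$$

[Figure: the three local paths into D(i,j), with input frames t(i-1), t(i) along the i axis and reference frames r(j-1), r(j) along the j axis.]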
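A minimal sketch of the type-2 recurrence, D(i,j) = |t(i) - r(j)| + min{D(i-1,j), D(i-1,j-1), D(i,j-1)}, follows. The function name `dtw_type2`, the Euclidean frame distance, and the fixed start and end corners are assumptions made for illustration.

```python
import numpy as np

def dtw_type2(t, r):
    """Type-2 DTW (0-45-90 degree local paths), both corners fixed.

    t: (m, d) input features, one frame per row
    r: (n, d) reference features, one frame per row
    Assumption: the local distance is the Euclidean norm.
    """
    m, n = len(t), len(r)
    # dist[i, j] = distance between input frame i and reference frame j
    dist = np.linalg.norm(t[:, None, :] - r[None, :, :], axis=2)
    # One layer of inf padding; D[0, 0] = 0 is a virtual start cell
    D = np.full((m + 1, n + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            best = min(D[i - 1, j], D[i - 1, j - 1], D[i, j - 1])
            D[i, j] = dist[i - 1, j - 1] + best
    return D[m, n]
```

Because 0- and 90-degree steps are allowed, a frame may be matched to several consecutive frames of the other sequence, so any pair of non-empty sequences has a valid path.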

Page 5: DTW for Speech Recognition

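Local Path Constraints

Type 1: 27-45-63 local paths

$$D(i,j) = |t(i) - r(j)| + \min\{D(i-1,j-2),\; D(i-1,j-1),\; D(i-2,j-1)\}$$

Type 2: 0-45-90 local paths

$$D(i,j) = |t(i) - r(j)| + \min\{D(i-1,j),\; D(i-1,j-1),\; D(i,j-1)\}$$

[Figure: predecessor cells of D(i,j) for each type. Type 1: D(i-1,j-2), D(i-1,j-1), D(i-2,j-1). Type 2: D(i-1,j), D(i-1,j-1), D(i,j-1).]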

Page 6: DTW for Speech Recognition

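Path Penalty for Type-1 DTW

Path penalty:
- No penalty for the 45-degree path
- Some penalty for paths that deviate from 45 degrees

$$D(i,j) = |t(i) - r(j)| + \min\{D(i-1,j-2) + \eta,\; D(i-1,j-1),\; D(i-2,j-1) + \eta\}$$

Here $\eta \ge 0$ denotes the penalty on the 27- and 63-degree paths; the 45-degree path has penalty 0.

[Figure: the three local paths into D(i,j) from D(i-1,j-2), D(i-1,j-1), and D(i-2,j-1), with penalty 0 marked on the 45-degree path.]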
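To make the effect of a path penalty concrete, the sketch below adds a constant to the two off-diagonal predecessors of the type-1 recurrence and leaves the 45-degree predecessor unpenalized. The function name `dtw_type1_penalized`, the parameter name `eta`, the use of a single shared penalty value, and the scalar (1-D) features are assumptions for illustration.

```python
import math

def dtw_type1_penalized(t, r, eta=0.0):
    """Type-1 DTW on scalar sequences with a penalty `eta` (hypothetical
    name) on the 27- and 63-degree paths; the 45-degree path costs nothing extra."""
    m, n = len(t), len(r)
    # Two layers of inf padding around the table
    D = [[math.inf] * (n + 2) for _ in range(m + 2)]
    D[2][2] = abs(t[0] - r[0])
    for i in range(2, m + 2):
        for j in range(2, n + 2):
            if i == 2 and j == 2:
                continue  # start corner already set
            best = min(
                D[i - 1][j - 2] + eta,  # 63-degree path, penalized
                D[i - 1][j - 1],        # 45-degree path, no penalty
                D[i - 2][j - 1] + eta,  # 27-degree path, penalized
            )
            D[i][j] = abs(t[i - 2] - r[j - 2]) + best
    return D[m + 1][n + 1]
```

For t = [0, 1] and r = [0, 0, 1] the only valid path takes one 63-degree step, so the result equals `eta`; a pure diagonal alignment is unaffected by the penalty.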

Page 7: DTW for Speech Recognition

DTW Paths of “Match Corners”

- We assume the speed of a user’s acoustic input is between 1/2 and 2 times that of the intended sentence.
- Both corners are fixed. (Endpoint detection is critical.)
- Suitable for voice command applications
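The 1/2-to-2 speed assumption matches the 27-45-63 degree local paths: each step advances one sequence by 1 frame and the other by 1 or 2 frames, so with both corners fixed a path exists only when neither sequence needs more than twice as many steps as the other. A small sketch of that feasibility check follows; the function name `corners_reachable` is hypothetical.

```python
def corners_reachable(m, n):
    """Return True if a type-1 (27-45-63 degree) path can join cell (1, 1)
    to cell (m, n), where m and n are the frame counts of the input and
    the reference.

    Each step moves (1, 1), (1, 2), or (2, 1), so the m - 1 advances on
    one axis and the n - 1 advances on the other must be within a factor
    of 2 of each other.
    """
    if m < 1 or n < 1:
        return False
    return (n - 1) <= 2 * (m - 1) and (m - 1) <= 2 * (n - 1)
```

A check like this can be used to skip reference templates whose length makes a corner-to-corner alignment impossible before running DTW at all.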

Page 8: DTW for Speech Recognition

DTW Paths of “Match Anywhere”

- No fixed anchored positions
- Suitable for retrieval of personal spoken documents
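One way to sketch "match anywhere" is to let the path start and end at any position of the longer sequence while the query is still consumed in full. The example below does this with 0-45-90 degree local paths on scalar sequences; the choice of local paths, the function name `dtw_match_anywhere`, and the argument names are assumptions, not details taken from the slides.

```python
import math

def dtw_match_anywhere(query, doc):
    """Align all of `query` to the best-matching stretch of `doc`.

    Returns (distance, end_index), where end_index is the 0-based
    position in `doc` at which the best match ends.
    """
    m, n = len(query), len(doc)
    D = [[math.inf] * (n + 1) for _ in range(m + 1)]
    # Free start: a path may begin at any position of the document
    D[0] = [0.0] * (n + 1)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d = abs(query[i - 1] - doc[j - 1])
            D[i][j] = d + min(D[i - 1][j], D[i - 1][j - 1], D[i][j - 1])
    # Free end: take the cheapest cell in the last query row
    end = min(range(1, n + 1), key=lambda j: D[m][j])
    return D[m][end], end - 1
```

For example, the query [5, 6, 7] is found inside [0, 0, 5, 6, 7, 0, 0] with distance 0, ending at index 4.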

Page 9: DTW for Speech Recognition

Other Variants

Local constraints

Start/ending area

Page 10: DTW for Speech Recognition

Implementation Issues

To save memory:
- Use a 2-column table for type-1 DTW
- Use a 1-column table for type-2 DTW

To avoid too many if-then statements:
- Pad type-1 DTW with two-layer padding
- Pad type-2 DTW with one-layer padding

To find a suitable path:
- Minimizing total distance
- Minimizing average distance
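The memory and padding points can be combined in one sketch for type-2 DTW: a single column of the table is updated in place, and one padding cell of inf replaces the boundary if-then checks. The function name `dtw_type2_one_column` and the scalar features are assumptions for illustration; only the total distance is returned, so the path itself cannot be backtracked from this reduced table.

```python
import math

def dtw_type2_one_column(t, r):
    """Type-2 DTW total distance using a single column of the table.

    col[j] holds D(i-1, j) before it is updated and D(i, j) afterwards;
    col[0] is the one-layer padding cell.
    """
    n = len(r)
    col = [math.inf] * (n + 1)
    col[0] = 0.0  # virtual start cell D(0, 0)
    for ti in t:
        diag = col[0]      # D(i-1, 0), the diagonal neighbor for j = 1
        col[0] = math.inf  # D(i, 0) is padding for every i >= 1
        for j in range(1, n + 1):
            up = col[j]    # D(i-1, j), saved before it is overwritten
            col[j] = abs(ti - r[j - 1]) + min(up, diag, col[j - 1])
            diag = up      # becomes D(i-1, j) for the next j
    return col[n]
```

Type-1 DTW looks back two steps along each axis, which is why it needs two columns and two layers of padding instead of one.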

Page 11: DTW for Speech Recognition

DTW Path of “Match Corners”

Page 12: DTW for Speech Recognition

DTW Path of “Match Anywhere”

Page 13: DTW for Speech Recognition

DTW Path of “Match Anywhere”

[Figure: DTW path for the query 清華大學 ("Tsing Hua University") matched anywhere within the utterance 我今天很高興來到清華大學進行演講 ("I am very happy to come to Tsing Hua University today to give a talk"). DTW total distance = 304.957.]

Page 14: DTW for Speech Recognition

DTW for Spoken Document Retrieval

Applications:
- Voice-based audio/video retrieval

Issues in SDR using DTW:
- Speaker normalization
  - Vocal tract length normalization (VTLN)
  - Frequency warping
- Efficiency

Page 15: DTW for Speech Recognition

DTW for Speaker-independent Voice Command Recognition

Applications:
- Digit recognition

Technical highlights:
- Extensive recordings
- Clustering within each command
- Some indexing methods for DTW
- Suitable for small-vocabulary applications
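As a rough illustration of how several reference recordings per command might be used, the sketch below keeps a list of templates for each command and returns the command whose closest template has the smallest DTW distance. The names `dtw` and `recognize`, the dictionary layout, the 0-45-90 degree local paths, and the scalar features are all assumptions; the slides do not specify how clustering or indexing is done, and neither is shown here.

```python
import math

def dtw(t, r):
    """Type-2 (0-45-90 degree) DTW total distance on scalar sequences."""
    m, n = len(t), len(r)
    D = [[math.inf] * (n + 1) for _ in range(m + 1)]
    D[0][0] = 0.0  # virtual start cell
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d = abs(t[i - 1] - r[j - 1])
            D[i][j] = d + min(D[i - 1][j], D[i - 1][j - 1], D[i][j - 1])
    return D[m][n]

def recognize(x, templates):
    """Return (label, distance) for the closest template.

    templates: dict mapping a command label to a list of reference
    sequences (for instance, representatives kept for that command).
    """
    best_label, best_dist = None, math.inf
    for label, refs in templates.items():
        for ref in refs:
            d = dtw(x, ref)
            if d < best_dist:
                best_label, best_dist = label, d
    return best_label, best_dist
```

With templates `{"one": [[0, 1, 0], [0, 2, 0]], "two": [[5, 5, 5]]}`, the input `[0, 1, 1, 0]` is recognized as "one" with distance 0.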