Detection of Burst Onset Landmarks in Speech Using Rate of Change of Spectral Moments

IIT Bombay | 17th National Conference on Communications, 28-30 Jan. 2011, Bangalore, India | Sp Pr. 1, P3


Detection of Burst Onset Landmarks in Speech Using Rate of Change of Spectral Moments

A. R. Jayan, P. S. Rajath Bhat, P. C. Pandey
{arjayan, rajathbhat, pcpandey}@ee.iitb.ac.in

EE Dept, IIT Bombay
30th January, 2011


PRESENTATION OUTLINE

1. Introduction
▪ Speech landmarks
▪ Landmark detection
▪ Clear speech
▪ Automated speech intelligibility enhancement

2. Methodology
▪ Band energy parameters
▪ Spectral moments
▪ Rate of change function

3. Evaluation and results
▪ VCV utterances
▪ Sentences

4. Conclusion


1. INTRODUCTION

Speech landmarks

Regions, associated with spectral transitions, containing important information for speech perception

Landmarks and related events [Park, 2008]

Segment type   Landmark       Description
Vowel          Vowel (V)      Vowel nucleus
Glide          Glide (G)      Slow formant transitions
Consonant      Glottis (g)    Vocal fold vibration
Consonant      Sonorant (s)   Nasal closure / release
Consonant      Burst (b)      Turbulence noise


Landmark detection

Processing
▪ Extraction of parameters characterizing the landmark
▪ Computation of the rate of change (ROC) of parameters
▪ Locating the landmark using ROC(s)

Applications
▪ Intelligibility enhancement
▪ Speech recognition
▪ Vocal tract shape estimation


Clear speech

Speech produced with clear articulation when talking to a hearing-impaired listener, or in a noisy environment

More intelligible for

▪ Hearing-impaired listeners (~17% higher, Picheny et al., 1985)

▪ Listeners in noisy environments (Payton et al., 1994)

▪ Non-native listeners (Bradlow and Bent, 2002)

▪ Children with learning disabilities (Bradlow et al., 2003)

Pronounced acoustic landmarks


Example: ‘The book tells a story’ (Recordings from http://www.acoustics.org/press/145th/clr-spch-tab.htm)

[Figure: conversational (Conv.) and clear versions of the utterance]


Automated speech intelligibility enhancement

Automated detection of landmarks

High detection rate with low false detections

Good temporal accuracy (5-10 ms)

Computational efficiency

Modification of speech characteristics

Intensity / duration / spectral modifications around landmarks with minimal perceptual distortions of the acoustic cues in the speech signal


Problems in stop consonant perception
▪ Transient sound with low intensity
▪ Severely affected by noise / hearing impairment

Stop landmarks
▪ Closure
▪ Burst onset
▪ Onset of voicing

Example: /apa/


Some of the earlier landmark detection techniques

▪ Liu (1996): Rate-of-rise measures of parameters from a set of fixed spectral bands (Speech recognition, g, s, b landmarks, 80 TIMIT sentences, detection rate: 84 % at 20-30 ms, 50 % at 5-10 ms)

▪ Salomon et al. (2002): Temporal parameters related to periodicity, envelope, spectral fine structure (Speech recognition, onsets and offsets of vowels, sonorants, & consonants, 120 TIMIT sentences, detection rate: 90 % at 20 ms)

▪ Sainath and Hazan (2006): Sinusoidal model parameters (Speech segmentation, 453 TIMIT sentences, word error rates: 20 %)

▪ Niyogi & Sondhi (2002): Stop landmark detection using total energy, energy above 3 kHz & Wiener entropy (Speech recognition, stop consonants, 320 TIMIT sentences, detection rate: 90 % at 20 ms)

▪ Jayan & Pandey (2009): Stop landmark detection using GMM parameters (Speech enhancement, 50 TIMIT sentences, detection rate: 73 % at 5 ms)


Improving landmark detection

Parameters
▪ Capturing spectral transitions
▪ Adaptation to speech variability

Rate of change measure
▪ Range of parameter variations
▪ Correlation among parameters

Adaptive time steps
▪ Small time step for abrupt variations
▪ Large time step for slow variations

Objective of the present investigation

Detection of burst landmarks for automated intelligibility enhancement


2. METHODOLOGY

Band energy parameters

Log of spectral peaks in three bands
▪ b1: 1.2-2.0 kHz
▪ b2: 2.0-3.5 kHz
▪ b3: 3.5-5.0 kHz

Mag. spectrum (10 kHz sampling) computed using 512-point DFT, 6 ms Hanning window, 1 frame per ms, and smoothed by 20-point moving average.

Smoothed mag. spectrum X(n, k) used for calculating log of spectral peak in band i

$$E_{bi}(n) = 20 \log_{10} \Big( \max_{k} X(n, k) \Big), \qquad k_{\min i} \le k \le k_{\max i}$$

n = time index, k = frequency index
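The band energy computation on this slide can be sketched in Python with NumPy. This is a minimal illustration, not the authors' implementation: the function names, the `eps` guard against log of zero, the inclusive handling of band edges, and the choice to run the 20-point moving average along the time index are all assumptions, since the slide does not specify them.

```python
import numpy as np

# Band edges in Hz, taken from the slide.
BANDS_HZ = {"b1": (1200, 2000), "b2": (2000, 3500), "b3": (3500, 5000)}

def smoothed_spectrogram(x, fs=10000, nfft=512, win_ms=6, hop_ms=1, smooth=20):
    """Return X[n, k]: magnitude spectrum per frame, smoothed over frames."""
    x = np.asarray(x, dtype=float)
    win_len = int(fs * win_ms / 1000)   # 60 samples at 10 kHz
    hop = int(fs * hop_ms / 1000)       # 10 samples at 10 kHz (1 frame per ms)
    window = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + win_len] * window for i in range(n_frames)]
    )
    # Each windowed frame is zero-padded to nfft points by rfft.
    mag = np.abs(np.fft.rfft(frames, n=nfft, axis=1))
    # Assumption: the moving average is taken along the time index n.
    kernel = np.ones(smooth) / smooth
    return np.apply_along_axis(
        lambda col: np.convolve(col, kernel, mode="same"), 0, mag
    )

def band_peak_energy(X, fs, nfft, f_lo, f_hi, eps=1e-12):
    """E_b(n) = 20*log10(max over band bins of X[n, k]), in dB."""
    k_min = int(np.ceil(f_lo * nfft / fs))
    k_max = int(np.floor(f_hi * nfft / fs))
    return 20.0 * np.log10(X[:, k_min : k_max + 1].max(axis=1) + eps)
```

For a signal `x` sampled at 10 kHz, `{b: band_peak_energy(smoothed_spectrogram(x), 10000, 512, lo, hi) for b, (lo, hi) in BANDS_HZ.items()}` gives one energy track per band, with one value per millisecond.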


Example: Band energy parameters for /aga/

[Figure: (a) speech waveform, (b) band energies; horizontal axis: time (ms)]


Spectral moments

Normalized spectrum

$$p(n, k) = \frac{X(n, k)}{\sum_{k=1}^{N/2} X(n, k)}$$

Centroid : frequency of energy concentration

$$F_c(n) = \sum_{k=1}^{N/2} p(n, k)\, f_k$$

n = time index, k = frequency index, N = DFT size

Variance : spread of energy around the centroid

$$F_\sigma(n) = \left[ \sum_{k=1}^{N/2} \big(f_k - F_c(n)\big)^2\, p(n, k) \right]^{1/2}$$

Skewness : measure of spectral symmetry

$$F_s(n) = \left[ \sum_{k=1}^{N/2} \big(f_k - F_c(n)\big)^3\, p(n, k) \right]^{1/3}$$

Kurtosis : measure of spectral peakiness

$$F_k(n) = \left[ \sum_{k=1}^{N/2} \big(f_k - F_c(n)\big)^4\, p(n, k) \right]^{1/4}$$
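The four spectral moments can be computed per frame from the smoothed magnitude spectrum. The sketch below is illustrative only: the function name is hypothetical, f_k is assumed to be the frequency in Hz of DFT bin k, the input is assumed to hold bins 0 to N/2 in its columns, and a signed cube root is assumed for the skewness because the third central moment can be negative.

```python
import numpy as np

def spectral_moments(X, fs, nfft, eps=1e-12):
    """Return (F_c, F_sigma, F_s, F_k), each an array with one value per frame.

    X has shape (frames, nfft // 2 + 1); only bins k = 1 .. N/2 are used,
    matching the summation limits on the slide.
    """
    k = np.arange(1, nfft // 2 + 1)
    freqs = k * fs / nfft                 # assumed definition of f_k, in Hz
    mag = np.asarray(X, dtype=float)[:, 1 : nfft // 2 + 1]

    # Normalized spectrum p(n, k); eps guards against all-zero frames.
    total = np.maximum(mag.sum(axis=1, keepdims=True), eps)
    p = mag / total

    f_c = (p * freqs).sum(axis=1)                     # centroid
    d = freqs[None, :] - f_c[:, None]                 # f_k - F_c(n)
    f_sigma = np.sqrt((d**2 * p).sum(axis=1))         # 2nd moment, root 1/2
    f_s = np.cbrt((d**3 * p).sum(axis=1))             # 3rd moment, signed root 1/3
    f_k = ((d**4 * p).sum(axis=1)) ** 0.25            # 4th moment, root 1/4
    return f_c, f_sigma, f_s, f_k
```

Because each moment is brought back to the first power by its root, all four outputs are in Hz, which keeps their ranges comparable before they are combined in a distance measure.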


Example: Band energy parameters & spectral moments for /aga/

[Figure: panels (a) waveform, (b), (c), (d); horizontal axis: time (ms)]


Measures of rate of change

● First difference based rate of change (ROC)

$$\mathrm{ROC}(n) = E_b(n) - E_b(n - K)$$

K = time step

● Mahalanobis distance based rate of change (ROC-MD)

A single measure indicative of the overall variation, taking care of parameter range and correlation effects

$$\mathrm{ROC}_{md}(n) = \Big[ \big(\mathbf{y}(n) - \mathbf{y}(n-K)\big)^{T}\, \Sigma^{-1}\, \big(\mathbf{y}(n) - \mathbf{y}(n-K)\big) \Big]^{0.5}$$

y(n) = parameter set at time n
K = time step
Σ = covariance matrix, pre-calculated using the parameter set from segments with energy above a threshold
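Both rate-of-change measures are short to express in NumPy. The following is a sketch under stated assumptions: the function names are hypothetical, the first K frames (which have no predecessor K steps back) are set to zero, and the energy threshold used to select frames for the covariance estimate is a free parameter that the slide does not give.

```python
import numpy as np

def roc_first_difference(e, K):
    """ROC(n) = e[n] - e[n - K]; the first K frames are set to 0."""
    e = np.asarray(e, dtype=float)
    roc = np.zeros_like(e)
    roc[K:] = e[K:] - e[:-K]
    return roc

def covariance_from_active_frames(Y, energy_db, threshold_db):
    """Covariance of the parameter vectors over frames above an energy threshold.

    threshold_db is hypothetical; the slide only says 'above a threshold'.
    """
    Y = np.asarray(Y, dtype=float)
    active = np.asarray(energy_db) > threshold_db
    return np.cov(Y[active], rowvar=False)

def roc_mahalanobis(Y, K, cov):
    """ROC-MD(n) = sqrt(d^T inv(cov) d) with d = Y[n] - Y[n - K].

    Y has shape (frames, parameters); the first K frames are set to 0.
    """
    Y = np.asarray(Y, dtype=float)
    d = Y[K:] - Y[:-K]
    cov_inv = np.linalg.inv(np.atleast_2d(cov))
    out = np.zeros(len(Y))
    # Quadratic form evaluated for every frame at once.
    out[K:] = np.sqrt(np.einsum("ni,ij,nj->n", d, cov_inv, d))
    return out
```

Dividing by the covariance is what lets parameters with very different ranges (band energies in dB, moments in Hz) contribute to a single measure, and it discounts changes that are shared by correlated parameters.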


Detection of voicing offset and onset
▪ Band energy in 0-400 Hz
▪ ROC(n) computed with time step 50 ms
▪ Voicing offset [g-]: ROC(n) below -12 dB
▪ Voicing onset [g+]: ROC(n) above +12 dB

Burst onset landmark detection
▪ Most prominent peak in ROC-MD(n) between g- and g+

Example: /aga/

[Figure: (a) waveform, (b) ROC-MD, (c) ROC; horizontal axis: time (ms)]
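The two-stage search described on this slide (voicing offset and onset first, then the burst onset between them) might look like the sketch below. It makes several simplifying assumptions that are not in the slides: one frame per ms so that the 50 ms time step equals 50 frames, a single closure per utterance, g- and g+ placed at the extreme values of the ROC track, and "most prominent peak" approximated by the maximum of ROC-MD in the interval. All function names are hypothetical.

```python
import numpy as np

def find_voicing_landmarks(e_low_db, K=50, thr_db=12.0):
    """Return (g_minus, g_plus) frame indices, or None if thresholds are not met.

    e_low_db: energy track of the 0-400 Hz band in dB, one value per frame.
    """
    e = np.asarray(e_low_db, dtype=float)
    roc = np.zeros_like(e)
    roc[K:] = e[K:] - e[:-K]

    g_minus = int(np.argmin(roc))          # steepest fall: voicing offset
    if roc[g_minus] > -thr_db:
        return None
    g_plus = g_minus + int(np.argmax(roc[g_minus:]))   # steepest rise after it
    if roc[g_plus] < thr_db:
        return None
    return g_minus, g_plus

def find_burst_onset(roc_md, g_minus, g_plus):
    """Frame index of the largest ROC-MD value between g- and g+ (inclusive)."""
    segment = np.asarray(roc_md, dtype=float)[g_minus : g_plus + 1]
    return g_minus + int(np.argmax(segment))
```

Restricting the burst search to the interval between g- and g+ means that large ROC-MD values inside the neighbouring vowels cannot be picked up as burst onsets.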


3. EVALUATION & RESULTS

Effects of rate of change functions & parameters on burst detection

ROC and parameters

1) ROC(BE): Sum of normalized ROCs of [Eb1, Eb2, Eb3]

2) ROC-MD(BE): ROC-MD of [Eb1, Eb2, Eb3]

3) ROC-MD(SM): ROC-MD of [Fc, Fσ, Fk, Fs]

4) ROC-MD(BE,SM): ROC-MD of [Eb1, Eb2, Eb3, Fc, Fσ, Fk, Fs]

Material: VCV utterances, TIMIT sentences

Time steps: 3, 6 ms

Temporal accuracies: 3, 5, 10, 15, 20 ms
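Detection rate at a given temporal accuracy can be read as the percentage of reference burst onsets that have a detected landmark within that many milliseconds. The sketch below follows that reading; the function name is hypothetical, and the scoring is simplified in that a single detection is allowed to match more than one reference, which the slides do not confirm either way.

```python
import numpy as np

def detection_rate(reference_ms, detected_ms, tolerance_ms):
    """Percentage of reference landmarks with a detection within tolerance_ms."""
    reference = np.asarray(reference_ms, dtype=float)
    detected = np.asarray(detected_ms, dtype=float)
    if reference.size == 0 or detected.size == 0:
        return 0.0
    # Distance from each reference landmark to its nearest detection.
    nearest = np.abs(reference[:, None] - detected[None, :]).min(axis=1)
    return 100.0 * float(np.mean(nearest <= tolerance_ms))

# Made-up landmark times, only to show the call pattern.
reference = [100, 250, 400, 600]
detected = [102, 258, 430, 600]
rates = {tol: detection_rate(reference, detected, tol) for tol in (3, 5, 10, 15, 20)}
```

With these made-up times the errors are 2, 8, 30 and 0 ms, so the rate is 50 % at 3 and 5 ms and 75 % at 10, 15 and 20 ms.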


VCV utterances
▪ 6 stop consonants (b, d, g, p, t, k)
▪ 3 vowel contexts (a, i, u)
▪ 10 speakers (5 M, 5 F)
▪ 180 tokens

[Figure: detection rate (%) vs. temporal accuracy (3, 5, 10, 15, 20 ms) for ROC(BE), ROC-MD(BE), ROC-MD(SM), and ROC-MD(BE, SM), with time steps of 3 ms and 6 ms. Values labelled on the plot, in extraction order: 81, 87, 86, 97, 76, 90, 90, 99.]


TIMIT sentences
▪ 5 speakers (2 M, 3 F)
▪ 10 sentences from each speaker
▪ 238 tokens

[Figure: detection rate (%) vs. temporal accuracy (3, 5, 10, 15, 20 ms) for ROC(BE), ROC-MD(BE), ROC-MD(SM), and ROC-MD(BE, SM), with a time step of 3 ms. Values labelled on the plot, in extraction order: 49, 74, 58, 86, 45, 71, 58, 88.]

Insertion rates (%)

Error type               ROC(BE)   ROC-MD(BE)   ROC-MD(SM)   ROC-MD(BE,SM)
Vowel / semivowel        13        11           13           11
Frication                5         11           10           9
Glottal stops / clicks   4         3            3            4


4. CONCLUSION

Increase in time steps reduced detection accuracy.

Mahalanobis distance based ROC was more effective than first-difference based rate of change.

Spectral moments were useful as additional parameters in improving burst-onset detection.


Thank you