Diagnostic Assessment of Childhood Apraxia of Speech Using Techniques from Automatic Speech Recognition (ASR)

John-Paul Hosom (1), Lawrence D. Shriberg (2), Jordan R. Green (3)

(1) Center for Spoken Language Understanding, Oregon Health & Science University

(2) Waisman Center, University of Wisconsin - Madison

(3) Department of Special Education & Communication Disorders, University of Nebraska - Lincoln

This research is supported by NIDCD grants DC000496 and DC006722

Outline of Talk

• Complex Disease Model for Childhood Speech-Sound Disorders of Unknown Origin

• Diagnostic Markers for suspected Apraxia of Speech (sAOS)

• Overview of Automatic Speech Recognition (ASR)

• Applying ASR to the Lexical Stress Ratio (LSR)

• Applying ASR to Coefficient of Variation Ratio (CVR)

• Summary, Current and Future Work

Complex Disease Model for Childhood Speech Sound Disorders (SSD) of Unknown Origin

[Figure: five-level diagram, built up over several slides. Only the labels below survive; the layout and the links between boxes are not recoverable.]

• Levels: I. Etiological Processes; II. Explanatory Processes; III. Nosological Entity; IV. Trait Markers (phenotypes, endophenotypes); V. Diagnostic Markers

• Risk and Protective Factors: Genetic, Environmental

• Process labels: Cognitive-Linguistic, Auditory-Perceptual, Speech Motor Control, Psycho-social, Phonological Attunement

• Categories: Speech Delay – Genetic (SD-GEN); Speech Delay – Otitis Media with Effusion (SD-OME); Speech Delay – Speech Motor Involvement (SD-SMI); Speech Delay – Developmental Psychosocial Involvement (SD-DPI); Speech Errors (SE)

• Subtype labels: SD-GEN, SD-OME, SD-AOS, SD-DYS, SD-DPI, SE-/s/, SE-/r/

• Marker labels in the intermediate builds: SDCS*

• Marker labels in the final build: > Omissions, < Distortions, < Backing; > M1 values; < F3–F2; Speech markers, > I-S Gap, > Backing; > Severity; 8 speech markers, Lex. Stress Ratio, > Coeff. Var. Ratio

*Shriberg, Austin, et al. (1997)

Diagnostic Markers for suspected Apraxia of Speech (sAOS)

• Childhood Apraxia of Speech is a controversial disorder due to lack of consensus on the features that define it and on its underlying causes. (Guyette & Diedrich, 1981; Shriberg et al., 1997)

• “suspected Apraxia of Speech” (sAOS) proposed as interim term (Shriberg et al., 1997)

• Two proposed markers for sAOS:
  - Lexical Stress Ratio (LSR) (Shriberg et al., 2003a)
  - Coefficient of Variation Ratio (CVR) (Shriberg et al., 2003b)

• This work: Pilot study for complete automation of these markers, to address inherent human variability. Aim was to replicate results of prior work.

• Techniques from automatic speech recognition (ASR)


Overview of Automatic Speech Recognition

• Automatic Speech Recognition (ASR) is a mapping from a recorded speech signal to words. Words are represented as sequences of phonemes.

• Don’t know where phonemes begin or end, so (1) break signal into short (10-msec) units, (2) compute the probability of each phoneme at each unit, (3) find most likely phoneme sequence.

[Figure: example phoneme probabilities for one unit: p(E)=.4, p(s)=.0, p(^)=.2, p(i)=.1]

[Figure: phoneme labels along the signal, as extracted: f 1 n 2 tc t 8 kc k s]
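Step (3) above can be illustrated with a small dynamic-programming search (the Viterbi algorithm). This is a minimal sketch, not the recognizer used in this work: the two-phoneme inventory, the per-unit probabilities, and the transition and initial probabilities are all made up for illustration, and `most_likely_sequence` is a hypothetical helper name.

```python
import math

def _log(p):
    """Log probability that maps 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")

def most_likely_sequence(frame_probs, transitions, initial):
    """Viterbi search over per-unit phoneme probabilities.

    frame_probs: list of dicts, phoneme -> probability at that 10-msec unit
    transitions: dict, (previous, current) -> probability (missing = 0)
    initial:     dict, phoneme -> probability of starting in that phoneme
    """
    states = list(initial)
    score = {s: _log(initial[s]) + _log(frame_probs[0][s]) for s in states}
    backpointers = []
    for probs in frame_probs[1:]:
        new_score, pointers = {}, {}
        for cur in states:
            # Best predecessor for `cur` at this unit.
            prev = max(states,
                       key=lambda s: score[s] + _log(transitions.get((s, cur), 0.0)))
            new_score[cur] = (score[prev]
                              + _log(transitions.get((prev, cur), 0.0))
                              + _log(probs[cur]))
            pointers[cur] = prev
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

# Illustrative values only (not taken from the slides).
frames = [{"s": 0.9, "i": 0.1}, {"s": 0.6, "i": 0.4},
          {"s": 0.2, "i": 0.8}, {"s": 0.1, "i": 0.9}]
trans = {("s", "s"): 0.7, ("s", "i"): 0.3, ("i", "i"): 1.0}
start = {"s": 1.0, "i": 0.0}
path = most_likely_sequence(frames, trans, start)
```

The returned path gives one phoneme label per unit, so the phoneme boundaries fall wherever the label changes.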

Overview of Automatic Speech Recognition

[Figure from Encyclopedia of Information Systems, H. Bidgoli (editor), vol. 4, pp. 155-169, 2003.]

Overview of Automatic Speech Recognition

• Gaussian Mixture Model (GMM) is a way of estimating probabilities given a feature value

[Figure: density p(x) plotted against feature value x; each component is one Gaussian (Normal) distribution with mean µ and standard deviation σ.]
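As a rough illustration of how a GMM turns a feature value into a probability density, the sketch below evaluates a weighted sum of one-dimensional Gaussians. The component weights, means, and standard deviations are assumed values, and real ASR systems use multi-dimensional features; `gaussian_pdf` and `gmm_pdf` are hypothetical helper names.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of one Gaussian (Normal) distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def gmm_pdf(x, components):
    """Density of a mixture; components is a list of (weight, mean, std).

    The weights are assumed to sum to 1.
    """
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in components)

# Illustrative two-component mixture (values are assumptions).
mixture = [(0.6, 0.0, 1.0), (0.4, 3.0, 0.5)]
density_at_zero = gmm_pdf(0.0, mixture)
single = gmm_pdf(0.0, [(1.0, 0.0, 1.0)])
```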


Overview of Automatic Speech Recognition

• Better estimation of phoneme probabilities at each time t results in more accurate ASR performance (correct words).

• Estimation of probabilities depends on training a phoneme classifier on large amounts of speech data.

• If the type of data used in training is different from the type of data seen in testing, probabilities will be low and accuracy will be poor.

• Important to match training and testing conditions as closely as possible.

• ASR yields two results:
  (1) most likely word or word sequence
  (2) locations of each phoneme in recognized word

Outline of Talk

• Complex Disease Model for Childhood Speech-Sound Disorders of Unknown Origin

• Diagnostic Markers for suspected Apraxia of Speech (sAOS)

• Overview of Automatic Speech Recognition (ASR)

• Applying ASR to the Lexical Stress Ratio (LSR)
  - The Lexical Stress Ratio
  - Measuring Fundamental Frequency
  - Computing Probability of Lexical Stress
  - Results

• Applying ASR to Coefficient of Variation Ratio (CVR)

• Summary, Current and Future Work

Applying ASR to the Lexical Stress Ratio: The Lexical Stress Ratio

• LSR (Shriberg et al., 2003a) measures “inappropriate lexical stress” observed in children with sAOS

• Inappropriate lexical stress: excessive stress on a syllable, or lack of stress on a syllable that is normally stressed

• Three factors used to measure lexical stress: F0, amplitude, and duration of the first and second vowels in trochaic (stress on the first syllable) words

• Due to problems reliably extracting duration, initial focus of automation on only the ratio of F0 in the first and second vowels

• Either high or low F0 ratios may be associated with sAOS.

Examples:

“dishes,” reduced stress

“chicken,” excessive stress

“puppy,” excessive stress

Applying ASR to the Lexical Stress Ratio: Speech Data

• Data from Shriberg et al.’s 2003a study (LSR corpus):
  - 24 children with speech delay (control data)
  - 11 children with sAOS
  - Recordings of elicited samples of 8 trochaic words
  - Average age: 6 yrs, 4 mo. for children with speech delay; 7 yrs, 1 mo. for children with sAOS.

Applying ASR to the Lexical Stress Ratio: Measuring F0

• Fundamental frequency (F0) measured by locating peak of histogram of “strong” outputs from 32 narrow-band filters

[Figure: narrow-band filter outputs on a frequency axis from 0 Hz to 1500 Hz, with values shown of 889, 889, 889, 800, 727, 667, 444, 400, 444, 242, 228, 235, 222, 667, and 216 Hz, and a histogram of counts against period in samples (axis ticks 9 to 75) with the corresponding F0 values. Annotations: 889 Hz = periodicity of 9 samples; 9x1 = 889 Hz (3 counts); 9x2 = 444 Hz; 9x3 = 296 Hz; 9x4 = 222 Hz; 9x8 = 111 Hz.]

• Comparison with Kay Elemetrics’ CSL algorithm on LSR data:
  - CSL: 30 cases of F0 error > 30 Hz
  - new: 8 cases of F0 error > 30 Hz
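The histogram idea can be sketched as follows: each strong filter output is converted to a period in samples, and that period votes for itself and for its integer multiples; the period with the most votes is taken as the F0 period. This is only one reading of the figure, not the authors' implementation. The 8000 Hz sampling rate is an assumption (it is consistent with "889 Hz = periodicity of 9 samples"), the tie-breaking rule is an assumption, and `estimate_f0` is a hypothetical name. The input values are the filter frequencies that survive from the figure.

```python
from collections import Counter

SAMPLE_RATE_HZ = 8000  # assumption: 8000 / 9 samples is about 889 Hz

def estimate_f0(filter_freqs_hz, max_period=80):
    """Return (period_in_samples, f0_hz) from strong filter outputs."""
    votes = Counter()
    for freq in filter_freqs_hz:
        period = round(SAMPLE_RATE_HZ / freq)
        # Vote for the period and each of its integer multiples.
        multiple = period
        while multiple <= max_period:
            votes[multiple] += 1
            multiple += period
    # Most votes wins; ties go to the shorter period (an assumption).
    best_period = min(votes, key=lambda p: (-votes[p], p))
    return best_period, SAMPLE_RATE_HZ / best_period

# Filter values as they appear in the figure residue.
outputs = [889, 889, 889, 800, 727, 667, 444, 400, 444,
           242, 228, 235, 222, 667, 216]
period, f0 = estimate_f0(outputs)
```

With these inputs the winning period is 36 samples, which matches the figure's "9x4 = 222 Hz" annotation.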

Applying ASR to the Lexical Stress Ratio: Computing Probability of Lexical Stress

• Histogram of normalized counts (probabilities) of F0 ratios of SD subjects and sAOS subjects

[Figure: histogram; x-axis: ratio of F0s in first and second vowel; y-axis: probability given F0 ratio; one series for sAOS, one for SD.]

Applying ASR to the Lexical Stress Ratio: Computing Probability of Lexical Stress

• Probability Distribution Functions (PDFs) of F0 ratios of SD subjects and sAOS subjects using Gamma distribution

[Figure: two fitted curves, labeled p(SD|F0(w)) and p(sAOS|F0(w)).]
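One way to obtain such a curve is to fit a Gamma density to each group's F0 ratios. The slides do not say how the parameters were estimated, so the sketch below uses the method of moments as a stand-in. The ratio values are invented for illustration and are not study data; `fit_gamma` and `gamma_pdf` are hypothetical helper names.

```python
import math
from statistics import mean, pvariance

def fit_gamma(values):
    """Method-of-moments estimate of (shape, scale) for a Gamma density."""
    m = mean(values)
    v = pvariance(values)
    return m * m / v, v / m

def gamma_pdf(x, shape, scale):
    """Gamma density at x; zero for non-positive x."""
    if x <= 0:
        return 0.0
    return (x ** (shape - 1) * math.exp(-x / scale)
            / (math.gamma(shape) * scale ** shape))

# Invented F0 ratios for one group (illustration only).
ratios = [1.0, 2.0, 3.0]
shape, scale = fit_gamma(ratios)
density = gamma_pdf(1.5, shape, scale)
```

Fitting one such density per group (SD and sAOS) gives the two values needed for each word's probability ratio.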

Applying ASR to the Lexical Stress Ratio: Computing Probability of Lexical Stress

• Probability of Lexical Stress Characteristic of sAOS:

• Use one formulation of Bayes’ Rule (only two choices):

$$\mathrm{odds}(sAOS) = \prod_{w=1}^{8} \frac{p(sAOS \mid F_0(w))}{p(SD \mid F_0(w))}$$

$$p(sAOS) = \frac{\mathrm{odds}(sAOS)}{\mathrm{odds}(sAOS) + 1}$$

where w is an individual word spoken by a subject

• Decision criterion: sAOS if p(sAOS) > 0.5

Applying ASR to the Lexical Stress Ratio: Computing Probability of Lexical Stress

• Probability of Lexical Stress:

• Example of 4 observations, equal probabilities:

$$\mathrm{odds}(sAOS) = \frac{0.5}{0.5} \cdot \frac{0.5}{0.5} \cdot \frac{0.5}{0.5} \cdot \frac{0.5}{0.5} = 1 \cdot 1 \cdot 1 \cdot 1 = 1$$

$$p(sAOS) = \frac{1}{1 + 1} = \frac{1}{2} = 0.5$$

• Example of 3 observations, different probabilities:

$$\mathrm{odds}(sAOS) = \frac{0.8}{0.3} \cdot \frac{0.6}{0.2} \cdot \frac{0.4}{0.6} \approx 2.66 \cdot 3.0 \cdot 0.66 \approx 5.27$$

$$p(sAOS) = \frac{5.27}{5.27 + 1} = \frac{5.27}{6.27} \approx 0.84$$
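The two worked examples can be checked with a few lines of code. This is a minimal sketch of the odds formulation on the slides; the function names are hypothetical, and the inputs are the per-word probability pairs from the examples.

```python
def odds_saos(pairs):
    """Product of p(sAOS|F0(w)) / p(SD|F0(w)) over words w.

    pairs: list of (p_saos, p_sd) tuples, one per word.
    """
    odds = 1.0
    for p_saos, p_sd in pairs:
        odds *= p_saos / p_sd
    return odds

def prob_saos(pairs):
    """Convert the odds to a probability: odds / (odds + 1)."""
    odds = odds_saos(pairs)
    return odds / (odds + 1.0)

equal = prob_saos([(0.5, 0.5)] * 4)
different = prob_saos([(0.8, 0.3), (0.6, 0.2), (0.4, 0.6)])
is_saos = different > 0.5  # decision criterion from the slides
```

Without intermediate rounding the second example gives odds of about 5.33 rather than 5.27; the resulting probability still rounds to 0.84.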

Applying ASR to the Lexical Stress Ratio: Results

• Evaluation of method on data used to build models:
  - Sensitivity/Specificity: 64% / 88%
  - PPV/NPV: 70% / 84%

• Evaluation of method on new data:
  - essentially chance performance

• Conclusions:
  - Large difference between characteristics of training and testing data
  - Need more data to develop better models
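For readers unfamiliar with the four measures, the sketch below computes them from a 2x2 table. The counts are not given on the slides; the values used here (7 true positives, 4 false negatives, 21 true negatives, 3 false positives) are inferred as one table that is consistent with the reported percentages and with the group sizes of 11 sAOS and 24 SD children, and should be read as an assumption.

```python
def diagnostic_measures(tp, fn, tn, fp):
    """Return sensitivity, specificity, PPV, and NPV as percentages."""
    return {
        "sensitivity": 100.0 * tp / (tp + fn),  # sAOS cases correctly flagged
        "specificity": 100.0 * tn / (tn + fp),  # SD cases correctly cleared
        "ppv": 100.0 * tp / (tp + fp),          # flagged cases that are sAOS
        "npv": 100.0 * tn / (tn + fn),          # cleared cases that are SD
    }

# Assumed counts, inferred from the percentages and group sizes.
measures = diagnostic_measures(tp=7, fn=4, tn=21, fp=3)
```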

Outline of Talk

• Complex Disease Model…

• Diagnostic Markers for suspected Apraxia of Speech (sAOS)

• Overview of Automatic Speech Recognition (ASR)

• Applying ASR to the Lexical Stress Ratio (LSR)

• Applying ASR to Coefficient of Variation Ratio (CVR)
  - The Coefficient of Variation Ratio
  - Identifying Speech/Pause Regions Using ASR
  - Computing the CVR
  - Results

• Summary, Current and Future Work

Applying ASR to the Coefficient of Variation Ratio: The Coefficient of Variation Ratio

• CVR (Shriberg et al., 2003b) measures reduction in normal temporal variation of speech, as observed in children with sAOS.

• Measurement of CVR depends on duration of speech events and duration of pause events

• Because of reduced variability of speech-event durations in children with sAOS, these children have higher CVR values relative to control group

$$\mathrm{CVR} = \frac{CV_{pause}}{CV_{speech}} = \frac{\sigma_p / \mu_p}{\sigma_s / \mu_s}$$

σ_p = standard deviation of pause events
µ_p = mean duration of pause events
σ_s = standard deviation of speech events
µ_s = mean duration of speech events
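The definition above translates directly into code. This is a minimal sketch: the durations are invented, and the use of the population standard deviation (`pstdev`) rather than the sample standard deviation is an assumption, since the slides do not say which was used.

```python
from statistics import mean, pstdev

def coefficient_of_variation(durations):
    """Standard deviation of the durations divided by their mean."""
    return pstdev(durations) / mean(durations)

def cvr(pause_durations, speech_durations):
    """Coefficient of Variation Ratio: CV of pauses over CV of speech."""
    return (coefficient_of_variation(pause_durations)
            / coefficient_of_variation(speech_durations))

# Invented durations in msec (illustration only).
pauses = [100, 300]    # CV = 100 / 200 = 0.5
speech = [400, 600]    # CV = 100 / 500 = 0.2
ratio = cvr(pauses, speech)
```

Less variable speech-event durations shrink the denominator, which is why reduced temporal variation shows up as a higher CVR.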

Applying ASR to the Coefficient of Variation Ratio: The Coefficient of Variation Ratio

• In Shriberg et al. 2003b, speech/pause events detected by:
  (1) displaying speech amplitude envelope using Matlab software
  (2) human identification of pause event with largest amplitude
  (3) speech/pause classification using threshold from Step (2)
  (4) removing speech/pause regions with duration < 100 msec

• Preliminary results show good agreement between this Matlab-based algorithm and manual measurements from spectrograms (Green et al., 2004)
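Steps (3) and (4) of the procedure above can be sketched as follows. This is not the Matlab algorithm from Shriberg et al. (2003b): the 10-msec frame size, the envelope values, and the choice to simply drop short regions (rather than merge them) are assumptions, and `segment_by_threshold` is a hypothetical name. The threshold is passed in, standing for the value chosen by the human in Step (2).

```python
def segment_by_threshold(envelope, threshold, frame_ms=10, min_ms=100):
    """Classify frames as speech or pause and keep regions >= min_ms.

    envelope: amplitude-envelope values, one per frame.
    Returns a list of (label, duration_ms) tuples in time order.
    """
    runs = []  # each entry is [label, number_of_frames]
    for value in envelope:
        label = "speech" if value > threshold else "pause"
        if runs and runs[-1][0] == label:
            runs[-1][1] += 1
        else:
            runs.append([label, 1])
    # Short regions are dropped; neighbours are left unmerged on purpose.
    return [(label, n * frame_ms) for label, n in runs if n * frame_ms >= min_ms]

# Invented envelope: 120 ms pause, 150 ms speech, 30 ms dip, 200 ms speech.
envelope = [0.1] * 12 + [0.9] * 15 + [0.1] * 3 + [0.9] * 20
regions = segment_by_threshold(envelope, threshold=0.3)
```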

Applying ASR to the Coefficient of Variation Ratio: Identifying Speech/Pause Regions Using ASR

• Can be difficult to identify speech/pause from only energy or amplitude envelope, so investigated speech/pause detection using ASR

• ASR system trained using 300 utterances from 3 children with speech delay of unknown origin

• All training data phonetically labeled by hand, time-aligned at the phoneme level

• ASR system trained to classify 8 broad-phonetic classes related to speech (e.g. “nasal”), instead of specific phonemes

• State sequence used by ASR system imposed constraints on sequences of phonemic classes to be consistent with English syllable structure

Applying ASR to the Coefficient of Variation Ratio: Identifying Speech/Pause Regions Using ASR

• ASR system recognized the following categories of speech:
  - .noise: non-speech noise (e.g. door slam, breath)
  - .pau: silence or pause
  - clo: stop closure
  - nas: nasal
  - plo: stop burst
  - sfrc: strong fricative
  - vow: vowel, liquid, or glide
  - wfrc: weak fricative

• State sequence (grammar) allowed sequences such as
  .pau clo plo vow nas .pau (e.g. for the isolated-word utterance “can”)
  but not
  .pau nas wfrc vow .pau (violates sonority principle)

Applying ASR to the Coefficient of Variation Ratio: Computing the CVR

• ASR results (broad phonetic classes with English syllable structure) mapped to “speech” and “pause” events

• CVR computed as in Shriberg et al. (2003b), except that regions less than 50 msec merged with neighboring regions.

[Figure: aligned tiers for one utterance, labeled phn class, speech/pau, wave, and spectrogram.]
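The mapping and merging steps can be sketched as below. The slides do not state which classes count as pause or which neighbour absorbs a short region, so two assumptions are made here: `.pau` and `.noise` map to "pause" and every other class to "speech", and a region shorter than 50 msec is merged into the region before it. The segment durations are invented, and `to_speech_pause` is a hypothetical name.

```python
PAUSE_CLASSES = {".pau", ".noise"}  # assumption: all other classes are speech

def to_speech_pause(segments, min_ms=50):
    """Map (class_label, duration_ms) segments to speech/pause events.

    Adjacent segments of the same kind are joined, then any event shorter
    than min_ms is merged into the preceding event (an assumed rule).
    A short event at the very start has no predecessor and is kept.
    """
    events = []
    for label, duration in segments:
        kind = "pause" if label in PAUSE_CLASSES else "speech"
        if events and events[-1][0] == kind:
            events[-1][1] += duration
        else:
            events.append([kind, duration])
    merged = []
    for kind, duration in events:
        if merged and (duration < min_ms or merged[-1][0] == kind):
            merged[-1][1] += duration
        else:
            merged.append([kind, duration])
    return [tuple(event) for event in merged]

# Invented recognizer output: "can", a 40-msec gap, then a vowel.
segments = [(".pau", 200), ("clo", 60), ("plo", 30), ("vow", 180),
            ("nas", 90), (".pau", 40), ("vow", 120), (".pau", 300)]
events = to_speech_pause(segments)
```

The 40-msec pause is absorbed into the surrounding speech, so the two speech stretches become a single 520-msec speech event.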

Applying ASR to the Coefficient of Variation Ratio: Speech Data

• Data from Shriberg et al.’s 2003b study (CVR corpus):
  - 30 children with normal speech (NS) (control data)
  - 30 children with speech delay (SD) (control data)
  - 15 children with sAOS
  - Recordings of conversational speech

Applying ASR to the Coefficient of Variation Ratio: Results

• Shriberg et al.’s 2003b study: mean CVR of 1.05 for NS, 1.04 for SD, and 1.36 for sAOS; effect size (ES) of 0.72 for NS/sAOS, ES of 0.71 for SD/sAOS.

• ASR-based method: mean CVR of 1.24 for NS, 1.13 for SD, and 1.42 for sAOS; ES of 0.68 for NS/sAOS, ES of 1.07 for SD/sAOS.

• The CV-Speech values had ES values of 0.95 and 1.04 for NS/sAOS and SD/sAOS, respectively, although there is the possibility of a confounding age effect.

• Conclusion: ASR techniques appear to be applicable to the computation of the CVR; support for the percept of isochrony in the sAOS subjects.


Summary

• More data necessary in order to apply statistical models in computation for LSR. Data collection currently under way in separate projects.

• Agreement between published results and current results indicates potential for ASR-based CVR

• Improvements necessary for automation:
  - Train ASR system on larger amount of speech data
  - Improve F0 estimation for children’s speech.

Current and Future Work

• Current work focusing on:

(a) understanding differences between published CVR values and ASR-based CVR values,

(b) extension of CVR to syllable-based measure instead of speech-event-based measure, and

(c) extension of LSR to conversational speech.

Current and Future Work

• Future work will focus on:

(a) applying ASR to measurement of other prosodic factors, such as inter-stress intervals, linguistic rhythm, speaking-rate variation, and glottal-source variation,

(b) combining multiple measures of sAOS for improved sensitivity and specificity, and

(c) evaluating specific factors that influence diagnosis.

References

• Green, J., Beukelman, D., Ball, L., Ullman, C., and Maassen K. (2004). “Development and Evaluation of a Computer-based System to Measure and Analyze Pause and Speech Events,” Conference on Motor Speech: Motor Speech Disorders, Speech Motor Control, Albuquerque, NM.

• Guyette, T. W. and Diedrich, W. M. (1981). "A Critical Review of Developmental Apraxia of Speech," in Speech and Language: Advances in Basic Research and Practice, 5, pp. 1-45.

• Hawley, M. (2003). “Speech Training And Recognition for Dysarthric Users of Assistive Technology (STARDUST) ”, Wales International Conference on Electronic Assistive Technology, Cardiff, Wales, July 2003.

• Hosom, J. P. (2000). Automatic Time Alignment of Phonemes Using Acoustic-Phonetic Information. Ph.D. thesis, Oregon Graduate Institute of Science and Technology, Beaverton, Oregon.

• Kasi, K. and Zahorian, S. A. (2002). “Yet Another Algorithm for Pitch Tracking,” in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2002, Orlando, FL, 1, pp. 361-364.


• Marquardt, T. P., Sussman, H. M., Snow, T., and Jacks, A. (2002). "The Intelligibility of the syllable in developmental apraxia of speech," in Journal of Communication Disorders, 35, pp. 31-49.

• Shriberg, L. D., Austin, D., Lewis, B. A., McSweeny, J. L., and Wilson, D. L. (1997). "The Speech Disorders Classification System (SDCS): Extensions and Lifespan Reference Data," in Journal of Speech, Language, and Hearing Research, 40, pp. 723-740.

• Shriberg, L. D., Campbell, T. F., Karlsson, H. B., Brown, R. L., McSweeny, J. L., and Nadler, C. J. (2003a). "A Diagnostic Marker for Childhood Apraxia of Speech: The Lexical Stress Ratio," in Special Issue: Diagnostic Markers for Child Speech-Sound Disorders, Clinical Linguistics & Phonetics, 17.7, pp. 549-574.

• Shriberg, L. D., Green, J. R., Campbell, T. F., McSweeny, J. L., and Scheer, A. (2003b). "A Diagnostic Marker for Childhood Apraxia of Speech: The Coefficient of Variation Ratio," in Special Issue: Diagnostic Markers for Child Speech-Sound Disorders, Clinical Linguistics & Phonetics, 17.7, pp. 575-595.