Non-Native Users in the Let’s Go!! Spoken Dialogue System:
Dealing with Linguistic Mismatch
Antoine Raux & Maxine Eskenazi
Language Technologies Institute
Carnegie Mellon University
Background
Speech-enabled systems use models of the user’s language
Such models are tailored for native speech
Great loss of performance for non-native users who don’t follow typical native patterns
Previous Work on Non-Native Speech Recognition
Assumes knowledge about/data from a specific non-native population
Often based on read speech
Focuses on acoustic mismatch:
• Acoustic adaptation
• Multilingual acoustic models
Linguistic Particularities of Non-Native Speakers
Non-native speakers might use different lexical and syntactic constructs
Non-native speakers are in a dynamic process of L2 acquisition
Outline of the Talk
Baseline system and data collection
Study of non-native/native mismatch and effect of additional non-native data
Adaptive lexical entrainment
The CMU Let’s Go!! System: Bus Schedule Information for the Pittsburgh Area
ASR: Sphinx II
Parsing: Phoenix
Dialogue Management: RavenClaw
Speech Synthesis: Festival
Hub: Galaxy
NLG: Rosetta
Data Collection
Baseline system accessible since February 2003
Experiments with scenarios
Publicized the phone number inside CMU in Fall 2003
Data Collection Web Page
Data
Directed experiments: 134 calls
• 17 non-native speakers (5 from India, 7 from Japan, 5 others)
Spontaneous: 30 calls
Total: 1768 utterances
Evaluation Data:
• Non-Native: 449 utterances
• Native: 452 utterances
Speech Recognition Baseline
Acoustic Models:
• semi-continuous HMMs (codebook size: 256)
• 4000 tied states
• trained on CMU Communicator data
Language Model:
• class-based backoff 3-gram
• trained on 3074 utterances from native calls
Speech Recognition Results
Word Error Rate:
• Native: 20.4%
• Non-Native: 52.0%
Causes of discrepancy:
• Acoustic mismatch (accent)
• Linguistic mismatch (word choice, syntax)
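For reference, word error rate is conventionally the word-level edit distance between the transcript and the recognizer output, divided by the transcript length. The sketch below illustrates that definition only. The slides do not say which scoring tool was used, and the function name and example strings are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # reference word deleted
                            curr[j - 1] + 1,      # extra hypothesis word
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one missing word out of seven reference words.
wer = word_error_rate("i want to go to the airport",
                      "i want to go the airport")
```

Because insertions count as errors, this ratio can exceed 1.0 on very noisy hypotheses.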
Language Model Performance
[Figure: bar charts, Native vs. Non-Native: Perplexity; OOV Rate (% tokens); Rate of utterances with OOV (% utterances)]
Evaluation on transcripts. Initial model: 3074 native utterances
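The two OOV metrics on this slide can be computed directly from transcripts and the language model vocabulary. The following is a minimal sketch under the assumption of whitespace tokenization and case-insensitive matching. The vocabulary and utterances are made-up examples, not the Let's Go!! data.

```python
def oov_stats(vocabulary, utterances):
    """Return (% OOV tokens, % utterances containing at least one OOV token)."""
    vocab = {w.lower() for w in vocabulary}
    total_tokens = 0
    oov_tokens = 0
    utts_with_oov = 0
    for utt in utterances:
        tokens = utt.lower().split()
        n_oov = sum(1 for t in tokens if t not in vocab)
        total_tokens += len(tokens)
        oov_tokens += n_oov
        if n_oov:
            utts_with_oov += 1
    if total_tokens == 0:
        raise ValueError("no tokens to evaluate")
    return (100.0 * oov_tokens / total_tokens,
            100.0 * utts_with_oov / len(utterances))


# Hypothetical vocabulary and transcripts, for illustration only.
vocab = ["i", "want", "to", "go", "the", "airport",
         "when", "is", "next", "bus"]
transcripts = ["i want to go to the airport",
               "when is the next bus",
               "i want to reach the airport"]
token_rate, utterance_rate = oov_stats(vocab, transcripts)
```

Here "reach" is the only out-of-vocabulary token, so one token in eighteen and one utterance in three are affected.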
Language Model Performance
Adding non-native data: 3074 native + 1308 non-native utterances
[Figure: bar charts on the Native and Non-Native sets, Initial (native) model vs. Mixed model: Perplexity; OOV Rate (% tokens); Rate of utterances with OOV (% utterances)]
Natural Language Understanding
Grammar manually written incrementally, as the system was being developed
Initially built with native speakers in mind
Phoenix: robust parser (less sensitive to non-standard expressions)
Grammar Coverage
[Figure: bar charts, Native vs. Non-Native: Parse Word Coverage (% words not covered by parse); Parse Utterance Coverage (% utterances not fully parsed)]
Initial grammar:
• Manually written for native utterances
Grammar Coverage
[Figure: bar charts, Native vs. Non-Native: Parse Word Coverage (% words not covered by parse); Parse Utterance Coverage (% utterances not fully parsed)]
Grammar designed to accept some non-native patterns:
• “reach” = “arrive”
• “What is the next bus?” = “When is the next bus?”
Relative Improvement due to Additional Data
[Figure: bar chart of % Improvement on the Native Set and Non-Native Set for: % OOV; % utt w/OOV; Perplexity; Word Coverage; Utt. Coverage]
Effect of Additional Data on Speech Recognition
[Figure: bar chart of Word Error Rate (%) on the Native Set and Non-Native Set, Native Model vs. Mixed Model]
Adaptive Lexical Entrainment
“If you can’t adapt the system, adapt the user”
System should use the same expressions it expects from the user
But non-native speakers might not master all target expressions
Use expressions that are close to the non-native speaker’s language
Use prosody to stress incorrect words
Adaptive Lexical Entrainment: Example
U: I want to go the airport
S: Did you mean: I want to go TO the airport?
Adaptive Lexical Entrainment: Algorithm
[Diagram: Target Prompts + ASR Hypothesis → DP-based Alignment → Prompt Selection → Emphasis → Confirmation Prompt]
ASR hypothesis: I want to go the airport
Target prompts:
• I’d like to go to the airport
• I want to go to the airport
Selected prompt: I want to go to the airport
Confirmation prompt: Did you mean: I want to go TO the airport?
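The pipeline on this slide can be sketched as follows: align the ASR hypothesis against each target prompt with dynamic programming, select the closest prompt, and mark the words that differ. This is a minimal sketch, not the system's implementation. It assumes unit edit costs, selection by minimum edit distance, and upper case as a textual stand-in for prosodic stress, none of which the slides specify. The function names `align` and `confirmation_prompt` are hypothetical.

```python
def align(hyp, target):
    """Word-level DP alignment.

    Returns (edit distance, flags), where flags[j] is True when target
    word j is absent from, or different in, the hypothesis.
    """
    n, m = len(hyp), len(target)
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if hyp[i - 1] == target[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,
                             dist[i][j - 1] + 1,
                             dist[i - 1][j - 1] + cost)
    # Backtrace to find which target words need emphasis.
    changed = [False] * m
    i, j = n, m
    while i > 0 or j > 0:
        cost = 1
        if i > 0 and j > 0:
            cost = 0 if hyp[i - 1] == target[j - 1] else 1
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + cost:
            changed[j - 1] = cost == 1   # substituted word
            i, j = i - 1, j - 1
        elif j > 0 and dist[i][j] == dist[i][j - 1] + 1:
            changed[j - 1] = True        # word missing from the hypothesis
            j -= 1
        else:
            i -= 1                       # extra hypothesis word, nothing to stress
    return dist[n][m], changed


def confirmation_prompt(hypothesis, target_prompts):
    """Return a confirmation prompt, or None on a perfect match."""
    hyp = hypothesis.lower().split()
    best = None
    for prompt in target_prompts:
        words = prompt.split()
        distance, changed = align(hyp, [w.lower() for w in words])
        if best is None or distance < best[0]:
            best = (distance, words, changed)
    if best is None or best[0] == 0:
        return None
    _, words, changed = best
    stressed = [w.upper() if c else w for w, c in zip(words, changed)]
    return "Did you mean: " + " ".join(stressed) + "?"


prompt = confirmation_prompt(
    "I want to go the airport",
    ["I'd like to go to the airport", "I want to go to the airport"])
```

With the slide's example, the second target prompt is one insertion away from the hypothesis, so it is selected and the inserted word is stressed.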
Adaptive Lexical Entrainment: Study
24 dialogues, 12.4 turns/dialogue, 2.5 entrainment prompts/dialogue
Trigger entrainment whenever best target does not perfectly match ASR
• Too simple? 40.7% false positives
64.4% of prompts are triggered by misrecognitions
Conclusion
LM and Grammar perform significantly worse on non-native speech
Including non-native data reduces the gap in linguistic metrics but not WER
Future work:
• Improve Speech Recognition
• More data for LM
• Acoustic/Lexical adaptation
• Integrate adaptive lexical entrainment in the dialog management framework
Example Dialogue
S: Welcome to the Let’s Go!! bus information system. What can I do for you?
U: Excuse me. I’m in the place which uh where the Forbes Avenue and Murray Avenue are crossing and I want to go to the place mm Forbes where the Forbes Avenue and Craig Street are crossing.
S: I heard “SIX”. Is that correct?
U: Which bus uh should I take? Uh sorry uh maybe 61C bus I have… What time the next 61C bus will come?
S: The time is currently 5:37 PM. Where are you leaving from?
…
Example Dialogue
S: Welcome to the Let’s Go!! bus information system. What can I do for you?
U: I want to go to downtown now I’m at Fifth and Bigelow.
S: Sorry, did you say “I want to go to downtown”?
U: Which number bus can I take and when will the next bus come here?
S: ... What can I do for you?
…