Sorry, I didn’t catch that …
Non-understandings and recovery in spoken dialog systems. Part II: Sources & impact of non-understandings, performance of various recovery strategies. Dan Bohus, Sphinx Lunch Talk, Carnegie Mellon University, March 2005.
![Page 1: Sorry, I didn’t catch that …](https://reader033.vdocuments.site/reader033/viewer/2022061514/568144fe550346895db1c9c9/html5/thumbnails/1.jpg)
Sorry, I didn’t catch that …
Non-understandings and recovery in spoken dialog systems
Part II: Sources & impact of non-understandings, performance of various recovery strategies
Dan Bohus
Sphinx Lunch Talk
Carnegie Mellon University, March 2005
![Page 2: Sorry, I didn’t catch that …](https://reader033.vdocuments.site/reader033/viewer/2022061514/568144fe550346895db1c9c9/html5/thumbnails/2.jpg)
Non-understandings
S: What city are you leaving from?
U: Urbana Champaign [OKAY IN THAT SAME PAY]
NON-understanding: the system cannot extract any meaningful information from the user’s turn.
How can we prevent non-understandings? How can we recover from them?
- Detection
- Set of recovery strategies
- Policy for choosing between them
review : sources : impact : strategy performance
![Page 3: Sorry, I didn’t catch that …](https://reader033.vdocuments.site/reader033/viewer/2022061514/568144fe550346895db1c9c9/html5/thumbnails/3.jpg)
Issues under investigation
- Data collection
- Detection / diagnosis
  - What are the main causes (sources) of non-understandings?
  - What is their impact on global performance?
  - Can we diagnose non-understandings at run-time?
  - Can we optimize the rejection process in a more principled way?
- Set of recovery strategies
  - What is the relative performance of different recovery strategies?
  - Can we refine current strategies and find new ones?
- Policy for choosing between them
  - Can we improve performance by making smarter choices?
  - If so, can we learn how to make these smarter choices?
![Page 4: Sorry, I didn’t catch that …](https://reader033.vdocuments.site/reader033/viewer/2022061514/568144fe550346895db1c9c9/html5/thumbnails/4.jpg)
Data Collection: Experimental Design
- Subjects interact over the telephone with RoomLine
- Each performed 10 scenario-based tasks
- Between-subjects experiment, 2 groups:
  - Control: system uses a random (uniform) policy for engaging the non-understanding recovery strategies
  - Wizard: policy is determined at runtime by a human (wizard)
- 46 subjects, balanced across gender × native-speaker status
- 449 sessions; 8278 user turns
- Sessions transcribed & annotated
![Page 5: Sorry, I didn’t catch that …](https://reader033.vdocuments.site/reader033/viewer/2022061514/568144fe550346895db1c9c9/html5/thumbnails/5.jpg)
Non-understanding Strategies
Strategy groups shown on the slide: REPROMPT, NOTIFY, MOVE-ON, HELP, REPEAT
S: For when do you need the room?
U: [non-understanding]
1. MOVE-ON (MOVE): Sorry, I didn’t catch that. For which day do you need the room?
2. YOU CAN SAY (YCS): Sorry, I didn’t catch that. For when do you need the conference room? You can say something like tomorrow at 10 am …
3. TERSE YOU CAN SAY (TYCS): Sorry, I didn’t catch that. You can say something like tomorrow at 10 am …
4. FULL HELP (HELP): Sorry, I didn’t catch that. I am currently trying to make a conference room reservation for you. Right now I need to know the date and time for when you need the reservation. You can say something like tomorrow at 10 am …
5. ASK REPEAT (AREP): Could you please repeat that?
6. ASK REPHRASE (ARPH): Could you please try to rephrase that?
7. NOTIFY (NTFY): Sorry, I didn’t catch that ...
8. YIELD TURN (YLD): …
9. REPROMPT (RP): For when do you need the conference room?
10. DETAILED REPROMPT (DRP): Right now I need to know the date and time for when you need the reservation …
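The control condition from the experimental design (a uniform random policy over these ten strategies) could be sketched roughly as follows. This is only an illustration, not the RoomLine implementation: the `STRATEGIES` table and the `uniform_policy` helper are hypothetical names, and only the strategy codes and names are taken from the slide.

```python
import random

# Strategy codes and names as listed on the slide.
STRATEGIES = {
    "MOVE": "Move-on",
    "YCS": "You can say",
    "TYCS": "Terse you can say",
    "HELP": "Full help",
    "AREP": "Ask repeat",
    "ARPH": "Ask rephrase",
    "NTFY": "Notify",
    "YLD": "Yield turn",
    "RP": "Reprompt",
    "DRP": "Detailed reprompt",
}


def uniform_policy(rng=random):
    """Pick one recovery strategy code with equal probability.

    Hypothetical helper: `rng` is the `random` module or a seeded
    `random.Random` instance, so a run can be made reproducible.
    """
    return rng.choice(list(STRATEGIES))


if __name__ == "__main__":
    rng = random.Random(0)
    code = uniform_policy(rng)
    print(code, "-", STRATEGIES[code])
```

In the wizard condition the same choice would be made by a human at runtime, so the function above would be replaced by whatever interface the wizard uses.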
![Page 6: Sorry, I didn’t catch that …](https://reader033.vdocuments.site/reader033/viewer/2022061514/568144fe550346895db1c9c9/html5/thumbnails/6.jpg)
![Page 7: Sorry, I didn’t catch that …](https://reader033.vdocuments.site/reader033/viewer/2022061514/568144fe550346895db1c9c9/html5/thumbnails/7.jpg)
Communication [Clark, Horvitz, Paek]
[Diagram: User and System communicating across four levels: Conversation Level, Intention Level, Signal Level, Channel Level. Other labels on the diagram: Goal, Semantics, Text, Audio; End-pointing, Recognition, Parsing, Interpretation; Channel.]
![Page 8: Sorry, I didn’t catch that …](https://reader033.vdocuments.site/reader033/viewer/2022061514/568144fe550346895db1c9c9/html5/thumbnails/8.jpg)
Modeling and Breakdowns
[Diagram: User and System communicating across four levels: Conversation Level, Intention Level, Signal Level, Channel Level. Other labels on the diagram: Goal, Semantics, Text, Audio; End-pointing, Recognition, Parsing, Interpretation; Channel.]
![Page 9: Sorry, I didn’t catch that …](https://reader033.vdocuments.site/reader033/viewer/2022061514/568144fe550346895db1c9c9/html5/thumbnails/9.jpg)
“Location” & “types” of errors
[Diagram: User/System pipeline (Channel, End-pointing, Recognition, Parsing, Interpretation; Audio, Text, Semantics, Goal) annotated with error types: out-of-domain, out-of-application, false rejections, out-of-grammar, out-of-relevance, ASR errors, accents, noises, end-pointer errors.]
![Page 10: Sorry, I didn’t catch that …](https://reader033.vdocuments.site/reader033/viewer/2022061514/568144fe550346895db1c9c9/html5/thumbnails/10.jpg)
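% of non-understandings
[Chart: share of non-understandings by error source. Source labels: out-of-domain, out-of-application, false rejections, out-of-grammar, out-of-relevance, ASR errors, accents, noises, end-pointer errors. Values shown: 12.89%, 18.59%, 8.02%, 3.21%, 56.05%, 3.91%, 0.14%; the pairing of values with labels is not recoverable from the transcript.]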
![Page 11: Sorry, I didn’t catch that …](https://reader033.vdocuments.site/reader033/viewer/2022061514/568144fe550346895db1c9c9/html5/thumbnails/11.jpg)
Out-of-application (13% of non-understandings)
Two main classes, about equally split:
- Requests for nonexistent task functionality
  - “A room Monday or Tuesday”
  - “do you have anything anytime Thursday afternoon?”
- Requests for nonexistent “meta” functionality (corrections)
  - “Can I change the date”
  - “You got the time wrong”
  - “Wrong day”

Q: How to better convey system boundaries?
Q: Extend system language for corrections?
![Page 12: Sorry, I didn’t catch that …](https://reader033.vdocuments.site/reader033/viewer/2022061514/568144fe550346895db1c9c9/html5/thumbnails/12.jpg)
Out-of-grammar (8% of non-understandings)
- Imperfect grammar coverage
  - “Doesn’t matter” / “It doesn’t matter”
  - “Internet connection” / “Network connection”
  - “Vaguely” / “So so” / “Generally” / etc.
- Q: Bring users in grammar? Carefully craft & use the “You Can Say” prompts
- Q: Extend the grammar? Online & in an unsupervised fashion?
![Page 13: Sorry, I didn’t catch that …](https://reader033.vdocuments.site/reader033/viewer/2022061514/568144fe550346895db1c9c9/html5/thumbnails/13.jpg)
Grammaticality - Summary
- It’s important: 25% of non-understandings
- Stems (about equally) from:
  - Requests for nonexistent task functionality
  - Requests for nonexistent meta/corrections functionality
  - Lack of grammar coverage
- Solutions
  - Offline: enlarge grammar, include correction language
  - Online:
    - Carefully design “You Can Say”
    - All You Can Say [Collagen / USI]
    - Unsupervised learning of new grammar expressions
![Page 14: Sorry, I didn’t catch that …](https://reader033.vdocuments.site/reader033/viewer/2022061514/568144fe550346895db1c9c9/html5/thumbnails/14.jpg)
All You Can Say
- How much of the system functionality is actually used? [under work]
- Certain “task” and “meta” aspects of functionality are very rarely or never used
![Page 15: Sorry, I didn’t catch that …](https://reader033.vdocuments.site/reader033/viewer/2022061514/568144fe550346895db1c9c9/html5/thumbnails/15.jpg)
![Page 16: Sorry, I didn’t catch that …](https://reader033.vdocuments.site/reader033/viewer/2022061514/568144fe550346895db1c9c9/html5/thumbnails/16.jpg)
![Page 17: Sorry, I didn’t catch that …](https://reader033.vdocuments.site/reader033/viewer/2022061514/568144fe550346895db1c9c9/html5/thumbnails/17.jpg)
Impact on system performance
- Logistic regression model: task success as a function of % non-understandings per session
  - Natives are more likely to succeed at the same non-understanding rate (participants in the wizard condition also)
- 2nd model (also uses misunderstandings): task success as a function of % Non + % Mis
  - Better fit
  - Adding native information does not improve the model
  - Non-understandings are on average half as costly as misunderstandings
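For readers who want the two models written out, one standard way to state them is shown below. The symbols are assumptions introduced here, not taken from the slides: $x_{\text{non}}$ and $x_{\text{mis}}$ stand for the per-session percentage of non-understandings and misunderstandings, and the $\beta$ coefficients are unknown fitted parameters (the slides report no values).

```latex
\begin{align}
P(\text{success}) &= \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_{\text{non}})}} \\
P(\text{success}) &= \frac{1}{1 + e^{-(\beta_0' + \beta_1' x_{\text{non}} + \beta_2' x_{\text{mis}})}}
\end{align}
```

Under this reading, "half as costly" would correspond to $|\beta_1'| \approx \tfrac{1}{2}|\beta_2'|$ in the second model, though the slide does not say how the cost was computed.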
![Page 18: Sorry, I didn’t catch that …](https://reader033.vdocuments.site/reader033/viewer/2022061514/568144fe550346895db1c9c9/html5/thumbnails/18.jpg)
![Page 19: Sorry, I didn’t catch that …](https://reader033.vdocuments.site/reader033/viewer/2022061514/568144fe550346895db1c9c9/html5/thumbnails/19.jpg)
![Page 20: Sorry, I didn’t catch that …](https://reader033.vdocuments.site/reader033/viewer/2022061514/568144fe550346895db1c9c9/html5/thumbnails/20.jpg)
![Page 21: Sorry, I didn’t catch that …](https://reader033.vdocuments.site/reader033/viewer/2022061514/568144fe550346895db1c9c9/html5/thumbnails/21.jpg)
How to evaluate performance?
- Recovery: next turn is okay (not a non-understanding, not a misunderstanding)
- Finer-grained recovery
  - Next-turn CER
  - Next-turn concept transfer (dialog cost)
- Time (+ recovery) ??
  - Time lost: 0 if next turn is okay, time lost otherwise
  - Time to recovery (has some problems)
- [More stuff under construction]
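The binary recovery measure and the "time lost" measure above might be computed from annotated turns along these lines. This is a sketch under stated assumptions: the label strings (`"ok"`, `"nonu"`, `"misu"`), the function names, and the reading of "time lost" as the duration of the failed next turn are all hypothetical and not specified in the talk.

```python
from collections import defaultdict


def recovery_rates(episodes):
    """Fraction of recoveries per strategy.

    `episodes` is an iterable of (strategy_code, next_turn_label) pairs.
    A recovery is counted only when the next turn is labeled "ok", i.e.
    neither a non-understanding ("nonu") nor a misunderstanding ("misu").
    """
    totals = defaultdict(int)
    recovered = defaultdict(int)
    for strategy, label in episodes:
        totals[strategy] += 1
        if label == "ok":
            recovered[strategy] += 1
    return {s: recovered[s] / totals[s] for s in totals}


def time_lost(next_turn_label, next_turn_seconds):
    """0 if the next turn is okay, otherwise the time spent on that turn.

    Assumption: "time lost" is taken to be the duration of the failed turn.
    """
    return 0.0 if next_turn_label == "ok" else float(next_turn_seconds)


if __name__ == "__main__":
    # Toy episodes for illustration only; not data from the study.
    toy = [("MOVE", "ok"), ("MOVE", "nonu"), ("YLD", "misu")]
    print(recovery_rates(toy))
    print(time_lost("nonu", 3.2))
```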
![Page 22: Sorry, I didn’t catch that …](https://reader033.vdocuments.site/reader033/viewer/2022061514/568144fe550346895db1c9c9/html5/thumbnails/22.jpg)
Which strategies are better?
![Page 23: Sorry, I didn’t catch that …](https://reader033.vdocuments.site/reader033/viewer/2022061514/568144fe550346895db1c9c9/html5/thumbnails/23.jpg)
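Which strategies are better?
Recovery performance ranked list, based on pair-wise t-tests:

| Group | Rank | Strategy | MOVE | HELP | TYCS | RP | YCS | ARPH | DRP | NTFY | AREP | YLD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MOVE | 1 | MOVE | - | - | - | 1.31 | 1.33 | 1.35 | 1.71 | 1.8 | 1.91 | 2.06 |
| HELP | 2 | HELP | - | - | - | - | - | - | 1.55 | 1.64 | 1.73 | 1.87 |
| HELP | 3 | TYCS | - | - | - | - | - | - | 1.5 | 1.58 | 1.68 | 1.81 |
| SIG | 4 | RP | - | - | - | - | - | - | - | - | 1.46 | 1.58 |
| HELP | 5 | YCS | - | - | - | - | - | - | - | - | 1.44 | 1.55 |
| SIG | 6 | ARPH | - | - | - | - | - | - | - | - | 1.42 | 1.53 |
| SIG | ? | DRP | - | - | - | - | - | - | - | - | - | - |
| SIG | ? | NTFY | - | - | - | - | - | - | - | - | - | - |
| SIG | ? | AREP | - | - | - | - | - | - | - | - | - | - |
| SIG | ? | YLD | - | - | - | - | - | - | - | - | - | - |

CER evaluation shows similar results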
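A pair-wise comparison of recovery outcomes between strategies could be set up roughly as below. The slide only says "pair-wise t-tests", so the choice of Welch's unequal-variance statistic is an assumption, as are the function names; the example computes only the t statistic (no p-value), and the outcome lists are toy values, not study data.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two samples of 0/1 recovery outcomes.

    Uses sample variances (n - 1 denominator). Raises ZeroDivisionError
    if both samples have zero variance.
    """
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def pairwise_t(outcomes):
    """t statistic for every unordered pair of strategies, keyed by (s1, s2)."""
    return {
        (s1, s2): welch_t(outcomes[s1], outcomes[s2])
        for s1, s2 in combinations(sorted(outcomes), 2)
    }


if __name__ == "__main__":
    # Toy outcomes (1 = recovered, 0 = not recovered); illustration only.
    toy = {"MOVE": [1, 1, 1, 0], "YLD": [0, 0, 0, 1]}
    for pair, t in pairwise_t(toy).items():
        print(pair, round(t, 3))
```

A positive statistic for the pair `(s1, s2)` means `s1` had the higher mean recovery in the sample; deciding significance would additionally need the t distribution with the appropriate degrees of freedom.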
![Page 24: Sorry, I didn’t catch that …](https://reader033.vdocuments.site/reader033/viewer/2022061514/568144fe550346895db1c9c9/html5/thumbnails/24.jpg)
Which strategies are better?
MoveOn ≥ Help > Signal

| Rank | Group | MOVE | C1_HELP | C1_SIG |
|---|---|---|---|---|
| 1 | MOVE | - | 1.19* | 1.65 |
| 2 | C1_HELP | - | - | 1.38 |
| 3 | C1_SIG | - | - | - |

\* p = 0.1089
![Page 25: Sorry, I didn’t catch that …](https://reader033.vdocuments.site/reader033/viewer/2022061514/568144fe550346895db1c9c9/html5/thumbnails/25.jpg)
What is the Impact on User Response?
Labeled user responses in 5 classes [same tagging scheme as Shin, Choularton]:
Answer (1st), Repeat, Rephrase, Change, Contradict, Other, Hang-up
![Page 26: Sorry, I didn’t catch that …](https://reader033.vdocuments.site/reader033/viewer/2022061514/568144fe550346895db1c9c9/html5/thumbnails/26.jpg)
What is the Impact on User Response?
[Chart: distribution of user responses over the labeled classes (Answer (1st), Repeat, Rephrase, Change, Contradict, Other, Hang-up). Values shown: 17.95%, 44.30%, 30.70%, 3.63%, 3.13%; the pairing of values with classes is not recoverable from the transcript.]
![Page 27: Sorry, I didn’t catch that …](https://reader033.vdocuments.site/reader033/viewer/2022061514/568144fe550346895db1c9c9/html5/thumbnails/27.jpg)
Comparing with other systems
![Page 28: Sorry, I didn’t catch that …](https://reader033.vdocuments.site/reader033/viewer/2022061514/568144fe550346895db1c9c9/html5/thumbnails/28.jpg)
What responses are the best?
Recovery as a function of response type
[Chart: recovery rate by user response type (Answer (1st), Repeat, Rephrase, Change, Contradict, Other, Hang-up). Values shown: 45.45%, 39.33%, 63.29%, 19.05%; the pairing of values with response types is not recoverable from the transcript.]
![Page 29: Sorry, I didn’t catch that …](https://reader033.vdocuments.site/reader033/viewer/2022061514/568144fe550346895db1c9c9/html5/thumbnails/29.jpg)
More to come …
- Per-strategy analysis
- Barge-in & impact on recovery
![Page 30: Sorry, I didn’t catch that …](https://reader033.vdocuments.site/reader033/viewer/2022061514/568144fe550346895db1c9c9/html5/thumbnails/30.jpg)
![Page 31: Sorry, I didn’t catch that …](https://reader033.vdocuments.site/reader033/viewer/2022061514/568144fe550346895db1c9c9/html5/thumbnails/31.jpg)
Refining the current set of strategies
- Introduce more alternative dialog plans: opportunities for Move-On
- “You Can Say”
  - Carefully tune the prompts
  - Smarter barge-in control
  - “All You Can Say”
- “Speak shorter”
  - Anecdotal evidence, to be corroborated by analysis
- “Speak louder / go to a quieter place”
  - Not so much in these experiments, but evidence from Let’s Go!
- More prevention measures
  - If someone is having trouble, you can give the YCS prompts without waiting for a non-understanding to happen
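The last prevention idea, offering the YCS prompt proactively when a user seems to be struggling, could be triggered by something as simple as a sliding-window rate of recent non-understandings. The sketch below is purely illustrative: the `TroubleMonitor` class, the window size, and the threshold are all hypothetical choices, since the talk does not say how "having trouble" would be detected.

```python
from collections import deque


class TroubleMonitor:
    """Tracks recent turns and flags when proactive help may be worthwhile.

    Hypothetical heuristic: offer help once the window is full and the
    share of non-understandings in it reaches the threshold.
    """

    def __init__(self, window=5, threshold=0.4):
        self.recent = deque(maxlen=window)  # oldest turns drop off automatically
        self.threshold = threshold

    def observe(self, was_non_understanding):
        """Record whether the latest user turn was a non-understanding."""
        self.recent.append(bool(was_non_understanding))

    def should_offer_help(self):
        """True if the window is full and the non-understanding rate is high."""
        if len(self.recent) < self.recent.maxlen:
            return False
        return sum(self.recent) / len(self.recent) >= self.threshold


if __name__ == "__main__":
    monitor = TroubleMonitor(window=4, threshold=0.5)
    for flag in [True, False, True, False]:
        monitor.observe(flag)
    print(monitor.should_offer_help())
```

Requiring a full window avoids reacting to a single early non-understanding; whether that trade-off is right would depend on data the talk does not report.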
![Page 32: Sorry, I didn’t catch that …](https://reader033.vdocuments.site/reader033/viewer/2022061514/568144fe550346895db1c9c9/html5/thumbnails/32.jpg)
Thank You!!