Post on 19-Dec-2015
sorry, I didn’t catch that! – an investigation of non-understandings and recovery strategies
Dan Bohus, www.cs.cmu.edu/~dbohus
Alexander I. Rudnicky, www.cs.cmu.edu/~air
Computer Science Department
Carnegie Mellon University
Pittsburgh, PA, 15213
2
non-understandings and misunderstandings
systems often do not understand correctly
NON-understanding: System cannot extract any meaningful information from the user’s turn
S: What city are you leaving from?
U: Urbana Champaign [OKAY IN THAT SAME PAY]
MIS-understanding: System extracts incorrect information from the user’s turn
S: What city are you leaving from?
U: Birmingham [BERLIN PM]
3
systems often do not understand correctly
NON-understanding: System cannot extract any meaningful information from the user’s turn
S: What city are you leaving from?
U: Urbana Champaign [OKAY IN THAT SAME PAY]
detection: typically trivial, although diagnosis is not
strategies: large space of strategies; tradeoffs between them not well understood
policy (knowing how to engage the strategies): simple heuristics (“incremental prompting”)
4
questions under investigation
what are the main causes of non-understandings?
how large is their impact on performance?
how do various recovery strategies compare to each other?
what are the relationships between strategies and user behaviors?
can we improve global dialog performance by using a smarter policy?
if yes, can we learn a better policy from data?
data
5
data collection
Roomline
  phone-based, mixed-initiative system
  conference room reservations
experimental design
  control group: uninformed recovery policy
  wizard group: recovery policy implemented by wizard
46 participants, first-time users
tasks & experimental procedure
  up to 10 scenario-driven interactions
6
non-understanding recovery strategies
S: For when do you need the conference room?
1. ASK REPEAT: Could you please repeat that?
2. ASK REPHRASE: Could you please try to rephrase that?
3. NOTIFY (NTFY): Sorry, I didn’t catch that ...
4. YIELD TURN (YLD): …
5. REPROMPT (RP): For when do you need the conference room?
6. DETAILED REPROMPT (DRP): Right now I need to know the date and time for when you need the reservation …
7. MOVE-ON: Sorry, I didn’t catch that. For which day you need the room?
8. YOU CAN SAY (YCS): Sorry, I didn’t catch that. For when do you need the conference room? You can say something like tomorrow at 10 am …
9. TERSE YOU CAN SAY (TYCS): Sorry, I didn’t catch that. You can say something like tomorrow at 10 am …
10. FULL HELP (HELP): Sorry, I didn’t catch that. I am currently trying to make a conference room reservation for you. Right now I need to know the date and time for when you need the reservation. You can say something like tomorrow at 10 am …
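The ten strategies above can be kept in a small lookup table. The sketch below is illustrative only: the dictionary name, the function name, and the short keys for ASK REPEAT, ASK REPHRASE and MOVE-ON are assumptions, and the slides do not say how the control group's uninformed policy picks a strategy, so uniform random choice is assumed here. Prompts are copied from the slide, with elided parts left as "…".

```python
import random

# Prompts copied from the slide; "…" marks text the slide itself elides.
# Keys for ASK REPEAT / ASK REPHRASE / MOVE-ON are hypothetical short names.
STRATEGIES = {
    "AREP": "Could you please repeat that?",
    "ARPH": "Could you please try to rephrase that?",
    "NTFY": "Sorry, I didn't catch that ...",
    "YLD": "…",  # yield turn: the prompt is elided on the slide
    "RP": "For when do you need the conference room?",
    "DRP": "Right now I need to know the date and time for when you need the reservation …",
    "MOVE": "Sorry, I didn't catch that. For which day you need the room?",
    "YCS": "Sorry, I didn't catch that. For when do you need the conference room? "
           "You can say something like tomorrow at 10 am …",
    "TYCS": "Sorry, I didn't catch that. You can say something like tomorrow at 10 am …",
    "HELP": "Sorry, I didn't catch that. I am currently trying to make a conference room "
            "reservation for you. Right now I need to know the date and time for when you "
            "need the reservation. You can say something like tomorrow at 10 am …",
}


def pick_uninformed(rng):
    """Assumed reading of an 'uninformed' policy: choose a strategy uniformly at random."""
    return rng.choice(sorted(STRATEGIES))


if __name__ == "__main__":
    name = pick_uninformed(random.Random(0))
    print(name, "->", STRATEGIES[name])
```

A smarter policy would replace `pick_uninformed` with a function that looks at the dialog state before choosing.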
7
corpus statistics
449 sessions
8278 user turns
utterances transcribed and checked
manual annotations
  misunderstandings
  correct concept values at each turn
  sources of understanding errors
  user response-types to recovery strategies
9
causes of non-understandings
[Figure: communication between user and system at four levels: conversation level, intention level, signal level, channel level. User side: Goal, Semantics, Text, Audio. System side: End-pointing, Recognition, Parsing, Interpretation.]
10
causes of non-understandings
conversation level: out-of-application 16%
intention level: out-of-grammar 16%
signal level: ASR error 62%
channel level: endpointer error
12
modeling impact on performance
logistic regression:
\[ P(\text{Task Success}) = \frac{1}{1 + e^{-(\alpha + \beta \cdot \mathrm{FNON})}} \]
[Figure: P(Task Success = 1) on the y-axis (0 to 1) against % Nonunderstandings (FNON) on the x-axis (0 to 50%)]
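As a minimal sketch of the logistic model on this slide, the function below evaluates P(Task Success) for a given fraction of non-understandings. The coefficient values are hypothetical placeholders chosen only to produce a falling curve; the fitted α and β are not given in the slides.

```python
import math


def p_task_success(fnon, alpha, beta):
    """Logistic model: 1 / (1 + exp(-(alpha + beta * fnon))).

    fnon is the fraction of non-understood turns, expressed in [0, 1].
    """
    return 1.0 / (1.0 + math.exp(-(alpha + beta * fnon)))


# Hypothetical coefficients for illustration only, NOT the fitted values.
ALPHA, BETA = 2.0, -8.0

if __name__ == "__main__":
    for fnon in (0.0, 0.1, 0.25, 0.5):
        print(f"FNON={fnon:.2f}  P(success)={p_task_success(fnon, ALPHA, BETA):.3f}")
```

With a negative β the predicted success probability falls as the non-understanding rate rises, which matches the direction of the curve plotted on the slide.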
14
strategy performance – recovery rate
overall logistic ANOVA: significant differences in mean recovery rates
all pairs comparison (corrected using FDR)
[Figure: recovery rate (y-axis, 0% to 80%) by strategy: MoveOn, Help, TerseYouCanSay, RePrompt, YouCanSay, AskRephrase, DetailedReprompt, Notify, AskRepeat, Yield]
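With ten strategies, an all-pairs comparison involves 45 tests, which is why a multiple-comparison correction is needed. The slide says only "corrected using FDR"; the sketch below assumes the Benjamini-Hochberg step-up procedure, a common way to control the false discovery rate. The function name and the level q = 0.05 are illustrative choices, not taken from the study.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return one boolean per p-value: True if rejected at FDR level q.

    Step-up rule: find the largest rank k (1-based, p-values sorted ascending)
    such that p_(k) <= (k / m) * q, then reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Made-up p-values for illustration only.
    print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.2]))
```

Because the rule is step-up, a p-value that misses its own threshold is still rejected when a larger p-value further down the sorted list meets its threshold.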
16
user response types
tagging scheme by Shin, also used by Choularton, Raux
5 categories: repeat, rephrase, contradict, change, other
17
response types after non-understanding
[Figure: share of each response type (y-axis, 0% to 50%) for rephrase, repeat, contradict, change, other, across three corpora: Pizza (Choularton & Dale), Communicator (Shin et al.), Roomline (this study)]
18
user response types by strategy
[Figure: distribution of user response types (Rephrase, Change, Repeat, Other; y-axis 0% to 100%) for each strategy: MoveOn, Help, TerseYouCanSay, RePrompt, YouCanSay, AskRephrase, DetailedReprompt, Notify, AskRepeat, Yield]
19
summary
sources of non-understandings: ASR, but also “language” errors → more shaping strategies …
impact on performance: regression model allows better quantitative assessment
strategy comparison: help, “move-on” → further investigate “move-on”
user responses: margin for improving control over user responses
can we improve global dialog performance by using a smarter policy? yes
can we learn a better policy from data? preliminary results promising …
21
rejections
Figure 3. Misunderstandings and non-understandings before and after rejections
[Figure: bars on a 0 to 100% scale, before and after the rejection mechanism; categories: misunderstandings, non-understandings, correct understandings, false rejections, correct rejections]
22
strategy performance assessment
recovery rate
recovery utility
  weighted sum of correctly and incorrectly acquired concepts
  weights are determined in a data-driven fashion
recovery efficiency
  also takes time to recovery into account
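The recovery utility described above is a weighted sum, which can be sketched as follows. The function name, parameter names and default weights are hypothetical; the slide states that the actual weights are determined from data and does not give their values.

```python
def recovery_utility(n_correct, n_incorrect, w_correct=1.0, w_incorrect=-1.0):
    """Weighted sum of correctly and incorrectly acquired concepts.

    The default weights are placeholders, NOT the data-driven weights
    used in the study.
    """
    return w_correct * n_correct + w_incorrect * n_incorrect


if __name__ == "__main__":
    # Two concepts acquired correctly and one incorrectly after a recovery attempt.
    print(recovery_utility(2, 1))
```

A negative weight on incorrectly acquired concepts lets the measure penalize recoveries that end in a misunderstanding, which a plain recovery rate does not capture.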
23
experimental design: scenarios
  10 scenarios, fixed order
  presented graphically (explained during briefing)
24
strategy pair-wise comparison
recovery performance: ranked list, based on pair-wise t-tests

      RNK         MOVE  HELP  TYCS  RP    YCS   ARPH  DRP   NTFY  AREP  YLD
MOVE  1    MOVE:  -     -     -     1.31  1.33  1.35  1.71  1.8   1.91  2.06
HELP  2    HELP:  -     -     -     -     -     -     1.55  1.64  1.73  1.87
HELP  3    TYCS:  -     -     -     -     -     -     1.5   1.58  1.68  1.81
SIG   4    RP:    -     -     -     -     -     -     -     -     1.46  1.58
HELP  5    YCS:   -     -     -     -     -     -     -     -     1.44  1.55
SIG   6    ARPH:  -     -     -     -     -     -     -     -     1.42  1.53
SIG   ?    DRP:   -     -     -     -     -     -     -     -     -     -
SIG   ?    NTFY:  -     -     -     -     -     -     -     -     -     -
SIG   ?    AREP:  -     -     -     -     -     -     -     -     -     -
SIG   ?    YLD:   -     -     -     -     -     -     -     -     -     -
CER evaluation shows similar results
25
recovery for various response-types
[Figure: recovery rate (y-axis, 0% to 80%) by user response type: Repeat, Rephrase, Change, Other]