
Post on 19-Dec-2015


sorry, I didn’t catch that! – an investigation of non-understandings and recovery strategies

Dan Bohus www.cs.cmu.edu/~dbohus
Alexander I. Rudnicky www.cs.cmu.edu/~air

Computer Science Department
Carnegie Mellon University
Pittsburgh, PA, 15213

2

systems often do not understand correctly: non-understandings and misunderstandings

NON-understanding: System cannot extract any meaningful information from the user’s turn

S: What city are you leaving from?
U: Urbana Champaign [OKAY IN THAT SAME PAY]

MIS-understanding: System extracts incorrect information from the user’s turn

S: What city are you leaving from?
U: Birmingham [BERLIN PM]

3

systems often do not understand correctly

NON-understanding: System cannot extract any meaningful information from the user’s turn

S: What city are you leaving from?
U: Urbana Champaign [OKAY IN THAT SAME PAY]

detection: typically trivial; although diagnosis is not

strategies: large space of strategies; tradeoffs between them not well understood

policy (knowing how to engage the strategies): simple heuristics, e.g. “incremental prompting”

4

questions under investigation

data

what are the main causes of non-understandings?

how large is their impact on performance?

how do various recovery strategies compare to each other?

what are the relationships between strategies and user behaviors?

can we improve global dialog performance by using a smarter policy?

if yes, can we learn a better policy from data?

5

data collection

Roomline: phone-based, mixed-initiative system for conference room reservations

experimental design
control group: uninformed recovery policy
wizard group: recovery policy implemented by wizard

46 participants, first-time users

tasks & experimental procedure: up to 10 scenario-driven interactions

6

non-understanding recovery strategies

S: For when do you need the conference room?

1. ASK REPEAT: Could you please repeat that?
2. ASK REPHRASE: Could you please try to rephrase that?
3. NOTIFY (NTFY): Sorry, I didn’t catch that ...
4. YIELD TURN (YLD): …
5. REPROMPT (RP): For when do you need the conference room?
6. DETAILED REPROMPT (DRP): Right now I need to know the date and time for when you need the reservation …
7. MOVE-ON: Sorry, I didn’t catch that. For which day you need the room?
8. YOU CAN SAY (YCS): Sorry, I didn’t catch that. For when do you need the conference room? You can say something like tomorrow at 10 am …
9. TERSE YOU CAN SAY (TYCS): Sorry, I didn’t catch that. You can say something like tomorrow at 10 am …
10. FULL HELP (HELP): Sorry, I didn’t catch that. I am currently trying to make a conference room reservation for you. Right now I need to know the date and time for when you need the reservation. You can say something like tomorrow at 10 am …
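
The ten strategies can be held in a small lookup table from which a policy draws. The sketch below is illustrative only: `Strategy` and `uninformed_policy` are hypothetical names, the abbreviations AREP, ARPH and MOVE are taken from the deck's comparison table, and drawing uniformly at random is an assumption about what an "uninformed recovery policy" could look like, not a description of the Roomline implementation.

```python
import random
from enum import Enum


class Strategy(Enum):
    """The ten recovery strategies, keyed by the abbreviations used in the deck."""
    AREP = "ASK REPEAT"
    ARPH = "ASK REPHRASE"
    NTFY = "NOTIFY"
    YLD = "YIELD TURN"
    RP = "REPROMPT"
    DRP = "DETAILED REPROMPT"
    MOVE = "MOVE-ON"
    YCS = "YOU CAN SAY"
    TYCS = "TERSE YOU CAN SAY"
    HELP = "FULL HELP"


def uninformed_policy(rng: random.Random) -> Strategy:
    """Pick a strategy uniformly at random (an assumed stand-in for 'uninformed')."""
    return rng.choice(list(Strategy))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
picks = [uninformed_policy(rng) for _ in range(5)]
print([s.name for s in picks])
```

A smarter policy would replace the random draw with a choice conditioned on the dialog state.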

7

corpus statistics

449 sessions
8278 user turns
utterances transcribed and checked

manual annotations:
misunderstandings
correct concept values at each turn
sources of understanding errors
user response-types to recovery strategies


9

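causes of non-understandings

[Figure: diagram of user and system communicating across four levels (conversation level, intention level, signal level, channel level); user-side labels: Goal, Semantics, Text, Audio; system-side labels: Interpretation, Parsing, Recognition, End-pointing]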

10

causes of non-understandings

conversation level: out-of-application, 16%
intention level: out-of-grammar, 16%
signal level: ASR error, 62%
channel level: endpointer error


12

modeling impact on performance

logistic regression:

$$P(\text{Task Success}) = \frac{1}{1 + e^{-(\alpha + \beta \cdot F_{NON})}}$$

[Figure: plot of P(Task Success = 1), 0 to 1, against % Nonunderstandings (FNON), 0 to 50%]
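
To make the logistic model concrete, the sketch below evaluates it at a few non-understanding rates. The function name `p_task_success` is hypothetical, and the coefficients `ALPHA` and `BETA` are made-up values chosen only for illustration; the slides do not report the fitted values.

```python
import math


def p_task_success(f_non: float, alpha: float, beta: float) -> float:
    """Logistic model: 1 / (1 + exp(-(alpha + beta * f_non)))."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * f_non)))


# Hypothetical coefficients (not fitted values from the study).
ALPHA, BETA = 2.0, -8.0

for f_non in (0.0, 0.1, 0.25, 0.5):
    p = p_task_success(f_non, ALPHA, BETA)
    print(f"FNON={f_non:.2f} -> P(Task Success)={p:.3f}")
```

With a negative `beta`, the predicted probability of task success falls as the fraction of non-understandings rises, which is the kind of relationship the plot describes.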


14

strategy performance – recovery rate

overall logistic ANOVA: significant differences in mean recovery rates

all pairs comparison (corrected using FDR)

[Figure: bar chart of recovery rate (0% to 80%) per strategy, in the order MoveOn, Help, TerseYouCanSay, RePrompt, YouCanSay, AskRephrase, DetailedReprompt, Notify, AskRepeat, Yield]
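
Comparing all pairs of ten strategies means 45 tests, so some correction for multiple comparisons is needed. The slide says only "corrected using FDR"; the sketch below assumes the Benjamini-Hochberg step-up procedure, which is one common FDR method and may not be the one the authors used. The function name and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values: list[float], q: float = 0.05) -> list[bool]:
    """Return, for each p-value, whether it is rejected at FDR level q."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= (k / m) * q.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            max_k = rank
    # Reject every hypothesis whose rank is at most max_k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for a handful of pairwise comparisons.
p_vals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(p_vals, q=0.05))
```

In practice a statistics library would normally be used instead of a hand-written routine.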


16

user response types

tagging scheme by Shin; also used by Choularton, Raux

5 categories: repeat, rephrase, contradict, change, other

17

response types after non-understanding

[Figure: bar chart of the share of each response type (rephrase, repeat, contradict, change, other; 0% to 50%) in three corpora: Pizza (Choularton & Dale), Communicator (Shin et al.), Roomline (this study)]

18

user response types by strategy

[Figure: chart of the share (0% to 100%) of response types (Rephrase, Change, Repeat, Other) for each strategy: MoveOn, Help, TerseYouCanSay, RePrompt, YouCanSay, AskRephrase, DetailedReprompt, Notify, AskRepeat, Yield]

19

summary

sources of non-understandings: ASR, but also “language” errors → more shaping strategies …

impact on performance: regression model allows better quantitative assessment

strategy comparison: help, “move-on” → further investigate “move-on”

user responses: margin for improving control over user responses

can we improve global dialog performance by using a smarter policy? yes

can we learn a better policy from data? preliminary results promising …

20

thank you! questions …

21

rejections

Figure 3. Misunderstandings and non-understandings before and after rejections

[Figure: bars (0 to 100%) showing the breakdown of turns before and after the rejection mechanism; legend: Misunderstandings, Non-understandings, Correct understandings, False rejections, Correct rejections]

22

strategy performance assessment

recovery rate

recovery utility: weighted sum of correctly and incorrectly acquired concepts; weights are determined in a data-driven fashion

recovery efficiency: also takes time to recovery into account
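
The recovery utility described above is a weighted sum, which can be sketched in a few lines. The function name `recovery_utility` and the weights used here are hypothetical placeholders; the slides state only that the actual weights were derived from data.

```python
def recovery_utility(n_correct: int, n_incorrect: int,
                     w_correct: float, w_incorrect: float) -> float:
    """Weighted sum of correctly and incorrectly acquired concepts."""
    return w_correct * n_correct + w_incorrect * n_incorrect


# Hypothetical weights: reward correct concepts, penalize incorrect ones.
W_CORRECT, W_INCORRECT = 1.0, -2.0

utility = recovery_utility(n_correct=3, n_incorrect=1,
                           w_correct=W_CORRECT, w_incorrect=W_INCORRECT)
print(utility)
```

Unlike the plain recovery rate, a measure of this shape can penalize a recovery that acquires wrong concept values.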

23

experimental design: scenarios

10 scenarios, fixed order
presented graphically (explained during briefing)

24

strategy pair-wise comparison

recovery performance ranked list, based on pair-wise t-tests:

      RNK         MOVE  HELP  TYCS  RP    YCS   ARPH  DRP   NTFY  AREP  YLD
MOVE  1    MOVE:  -     -     -     1.31  1.33  1.35  1.71  1.8   1.91  2.06
HELP  2    HELP:  -     -     -     -     -     -     1.55  1.64  1.73  1.87
HELP  3    TYCS:  -     -     -     -     -     -     1.5   1.58  1.68  1.81
SIG   4    RP:    -     -     -     -     -     -     -     -     1.46  1.58
HELP  5    YCS:   -     -     -     -     -     -     -     -     1.44  1.55
SIG   6    ARPH:  -     -     -     -     -     -     -     -     1.42  1.53
SIG   ?    DRP:   -     -     -     -     -     -     -     -     -     -
SIG   ?    NTFY:  -     -     -     -     -     -     -     -     -     -
SIG   ?    AREP:  -     -     -     -     -     -     -     -     -     -
SIG   ?    YLD:   -     -     -     -     -     -     -     -     -     -

CER evaluation shows similar results

25

recovery for various response-types

[Figure: bar chart of recovery rate (0 to 80%) by response type: Repeat, Rephrase, Change, Other]

26

27

impact of recovery rate on performance

recovery = next turn is correctly understood

$$P(\text{Task Success}) = \frac{1}{1 + e^{-(\alpha + \beta \cdot \text{RecoveryRate})}}$$

[Figure: plot of P(Task Success = 1), 0 to 1, against non-understanding recovery rate, 0 to 100%]
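
Using the definition above (a non-understanding counts as recovered when the next turn is correctly understood), a per-strategy recovery rate can be computed from an annotated log. This is a minimal sketch: the log format, the function name `recovery_rates`, and the sample events are all hypothetical and do not come from the Roomline corpus.

```python
from collections import defaultdict

# Hypothetical log entries: (strategy used after a non-understanding,
# whether the next user turn was correctly understood).
log = [
    ("MOVE", True), ("MOVE", True), ("MOVE", False),
    ("HELP", True), ("HELP", False),
    ("YLD", False), ("YLD", False), ("YLD", True), ("YLD", False),
]


def recovery_rates(events):
    """Per strategy, the fraction of non-understandings followed by a correctly understood turn."""
    totals = defaultdict(int)
    recovered = defaultdict(int)
    for strategy, next_turn_ok in events:
        totals[strategy] += 1
        recovered[strategy] += int(next_turn_ok)
    return {s: recovered[s] / totals[s] for s in totals}


rates = recovery_rates(log)
for strategy, rate in rates.items():
    print(f"{strategy}: {rate:.0%}")
```

A session-level recovery rate computed the same way could serve as the predictor in the logistic model for task success.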