
misunderstandings, corrections and beliefs in spoken language interfaces

Dan Bohus
Computer Science Department
Carnegie Mellon University
Pittsburgh, PA 15213
www.cs.cmu.edu/~dbohus


problem

spoken language interfaces lack robustness when faced with understanding errors
  stems mostly from speech recognition
  spans most domains and interaction types
  exacerbated by operating conditions


more concretely …

S: What city are you leaving from?
U: Urbana Champaign [OKAY IN THAT SAME PAY]
S: Sorry I’m not sure I understood what you said. What city are you leaving from?
U: Urbana Champaign [FOR MINUTE SINCE HEY]
S: I’m still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
U: Chicago [CHICAGO]
S: traveling from Chicago. Where would you like to go?
U: Huntsville [SEOUL]
S: traveling to Seoul. What day did you need to travel?
U: no no I’m traveling to Birmingham [THE TRAVELING TO BERLIN P_M]
S: traveling in the afternoon. Okay what day would you be departing Chicago
U: /uh/ the tenth of august [AT THE TENTH OF AUGUST]
S: a flight tomorrow… I have a Korean Airlines flight departing Chicago at 1:40pm, arrives Seoul at 5pm the next day. Do you want to take that?
U: my destination is Birmingham [FLIGHT DESTINATION MR WEEKEND]
S: traveling on Saturday, August 12th … I have a flight departing Chicago at 1:40pm arrives Seoul at ………


some statistics …

corrections [Krahmer, Swerts, Litman, Levow]
  30% of utterances correct system mistakes
  2-3 times more likely to be misrecognized

semantic error rates: ~25-35%
  SpeechActs [SRI]              25%
  CU Communicator [CU]          27%
  Jupiter [MIT]                 28%
  CMU Communicator [CMU]        32%
  How May I Help You? [AT&T]    36%


two types of understanding errors

NON-understanding: system cannot extract any meaningful information from the user’s turn
  S: What city are you leaving from?
  U: Urbana Champaign [OKAY IN THAT SAME PAY]

MIS-understanding: system extracts incorrect information from the user’s turn
  S: What city are you leaving from?
  U: Birmingham [BERLIN PM]


misunderstandings

MIS-understanding: system extracts incorrect information from the user’s turn
  S: What city are you leaving from?
  U: Birmingham [BERLIN PM]

detect potential misunderstandings; do something about them

fix recognition


outline

detecting misunderstandings
detecting user corrections [late detection of misunderstandings]
belief updating [construct accurate beliefs by integrating information from multiple turns]


detecting misunderstandings

recognition confidence scores
  S: What city are you leaving from?
  U: Birmingham [BERLIN PM] conf=0.63

traditionally [Bansal, Chase, Cox, Kemp, many others]
  speech recognition confidence scores
  use acoustic, language model and search info
  frame, phoneme, word-level


“semantic” confidence scores

we’re interested in semantics, not words
  YES = YEAH, NO = NO WAY

use machine learning to build confidence annotators
  in-domain, manually labeled data
    utterance: [BERLIN PM] Birmingham
    labels: correct / misunderstood
  features from different knowledge sources
  binary classification problem
  probability of misunderstanding: regression problem


a typical result

Using Natural Language Processing and Discourse Features to Identify Understanding Errors in a Spoken Dialogue System [Walker, Wright, Langkilde]

HowMayIHelpYou corpus: call routing for phone services
  11787 turns

features
  ASR: recog, numwords, duration, dtmf, rg-grammar, tempo …
  understanding: confidence, context-shift, top-task, diff-conf, …
  dialog & history: sys-label, confirmation, num-reprompts, num-confirms, num-subdials, …

binary classification task
  majority baseline (error): 36.5%
  RIPPER (error): 14%


detect user corrections

is the user trying to correct the system?

S: Where would you like to go?
U: Huntsville [SEOUL]    ← misunderstanding
S: traveling to Seoul. What day did you need to travel?
U: no no I’m traveling to Birmingham [THE TRAVELING TO BERLIN P_M]    ← user correction, misunderstanding

same story: use machine learning
  in-domain, manually labeled data
  features from different knowledge sources
  binary classification problem
  probability of correction: regression problem


typical result

Identifying User Corrections Automatically in a Spoken Dialog System [Hirschberg, Litman, Swerts]

TOOT corpus: access to train information
  2328 turns, 152 dialogs

features
  prosodic: f0max, f0mn, rmsmax, dur, ppau, tempo …
  ASR: gram, str, conf, ynstr, …
  dialog position: diadist
  dialog history: preturn, prepreturn, pmeanf

binary classification task
  majority baseline: 29%
  RIPPER: 15.7%


belief updating problem: an easy case

S: on which day would you like to travel?
U: on September 3rd [AN DECEMBER THIRD] {CONF=0.25}
    departure_date = {Dec-03/0.25}
S: did you say you wanted to leave on December 3rd?
U: no [NO] {CONF=0.88}
    departure_date = {Ø}


belief updating problem: a trickier case

S: Where would you like to go?
U: Huntsville [SEOUL] {CONF=0.65}
    destination = {seoul/0.65}
S: traveling to Seoul. What day did you need to travel?
U: no no I’m traveling to Birmingham [THE TRAVELING TO BERLIN P_M] {CONF=0.60} {COR=0.35}
    destination = {?}


belief updating problem formalized

given:
  an initial belief P_initial(C) over concept C
  a system action SA
  a user response R

construct an updated belief:
  P_updated(C) ← f (P_initial(C), SA, R)

    destination = {seoul/0.65}
S: traveling to Seoul. What day did you need to travel?
U: [THE TRAVELING TO BERLIN P_M] {CONF=0.60} {COR=0.35}
    destination = {?}


outline

detecting misunderstandings
detecting user corrections [late detection of misunderstandings]
belief updating [construct accurate beliefs by integrating information from multiple turns]
  current solutions
  a restricted version
  data
  user response analysis
  experiments and results
  discussion. caveats. future work


belief updating: current solutions

most systems only track values, not beliefs
  new values overwrite old values

explicit confirm + yes → trust hypothesis
explicit confirm + no → kill hypothesis
explicit confirm + “other” → non-understanding
implicit confirm: not much

“users who discover errors through incorrect implicit confirmations have a harder time getting back on track” [Shin et al, 2002]
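The rules on this slide can be sketched as a small function. This is only an illustration: the `Belief` type, the action and response labels, and the choice to encode "trust" as confidence 1.0 are assumptions made for the sketch, not details given in the talk.

```python
from typing import Optional, Tuple

# (value, confidence), or None when the concept holds no hypothesis.
Belief = Optional[Tuple[str, float]]


def heuristic_update(belief: Belief, action: str, response: str) -> Belief:
    """Apply the heuristic rules for a single confirmation turn.

    action   -- "explicit_confirm" or "implicit_confirm" (hypothetical labels)
    response -- "yes", "no" or "other" (hypothetical labels)
    """
    if belief is None:
        return None
    value, _conf = belief
    if action == "explicit_confirm":
        if response == "yes":
            return (value, 1.0)  # trust hypothesis (1.0 is an assumed encoding)
        if response == "no":
            return None  # kill hypothesis
        return belief  # "other": handled as a non-understanding, belief unchanged
    # implicit confirm: "not much" happens, so the belief is left as it was
    return belief


# The easy case from the earlier slide: December 3rd is rejected with "no".
print(heuristic_update(("Dec-03", 0.25), "explicit_confirm", "no"))
```

Under these rules the trickier Seoul example stays at `("seoul", 0.65)` after the implicit confirmation, which is exactly the weakness the rest of the talk addresses.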


belief updating: general form

given:
  an initial belief P_initial(C) over concept C
  a system action SA
  a user response R

construct an updated belief:
  P_updated(C) ← f (P_initial(C), SA, R)


restricted version: 2 simplifications

1. compact belief
  system unlikely to “hear” more than 3 or 4 values
  single vs. multiple recognition results
    in our data: max = 3 values, only 6.9% have >1 value
  confidence score of top hypothesis

2. updates after confirmation actions

reduced problem
  ConfTop_updated(C) ← f (ConfTop_initial(C), SA, R)
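The first simplification can be pictured as collapsing a belief over several values into the top hypothesis and its confidence score. A minimal sketch follows; the dictionary representation, the function name `compact`, and the second hypothesis in the example are assumptions for illustration only.

```python
from typing import Dict, Optional, Tuple


def compact(belief: Dict[str, float]) -> Optional[Tuple[str, float]]:
    """Keep only the top hypothesis and its confidence score."""
    if not belief:
        return None
    value = max(belief, key=lambda v: belief[v])
    return value, belief[value]


# "berlin" and its score are made up; seoul/0.65 comes from the running example.
print(compact({"seoul": 0.65, "berlin": 0.20}))
```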


data

collected with RoomLine
  a phone-based mixed-initiative spoken dialog system
  conference room reservation
  search and negotiation

explicit and implicit confirmations
  confidence threshold model (+ some exploration)

example system turn
  I found 10 rooms for Friday between 1 and 3 p.m. [implicit confirmation]
  Would you like a small room or a large one? [task]


user study
  46 participants, 1st time users
  10 scenarios, fixed order
  presented graphically (explained during briefing)
  compensated per task success


corpus statistics

449 sessions, 8848 user turns
orthographically transcribed
manually annotated
  misunderstandings (concept-level)
  non-understandings
  user corrections
  correct concept values


user response types

following Krahmer and Swerts
  study on Dutch train-table information system

3 user response types
  YES: yes, right, that’s right, correct, etc.
  NO: no, wrong, etc.
  OTHER

cross-tabulated against correctness of confirmations


user responses to explicit confirmations

from transcripts [numbers in brackets from Krahmer & Swerts]

             YES          NO           Other
CORRECT      94% [93%]    0% [0%]      5% [7%]
INCORRECT    1% [6%]      72% [57%]    27% [37%]

~10%

from decoded

             YES    NO     Other
CORRECT      87%    1%     12%
INCORRECT    1%     61%    38%


other responses to explicit confirmations

~70% users repeat the correct value
~15% users don’t address the question
  attempt to shift conversation focus

             User does not correct    User corrects
CORRECT      1159                     0
INCORRECT    29 [10% of incor]        250 [90% of incor]


user responses to implicit confirmations

transcripts [numbers in brackets from Krahmer & Swerts]

             YES         NO           Other
CORRECT      30% [0%]    7% [0%]      63% [100%]
INCORRECT    6% [0%]     33% [15%]    61% [85%]

decoded

             YES    NO     Other
CORRECT      28%    5%     67%
INCORRECT    7%     27%    66%


ignoring errors in implicit confirmations

             User does not correct    User corrects
CORRECT      552                      2
INCORRECT    118 [51% of incor]       111 [49% of incor]

users correct later (40% of 118)
users interact strategically
  correct only if essential

             ~correct later    correct later
~critical    55                2
critical     14                47
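The "correct only if essential" pattern can be checked directly from the four counts in the second table. The sketch below only re-derives proportions from the slide's numbers; the dictionary layout and the labels "later" / "not_later" are choices made for this illustration.

```python
# The 118 incorrect implicit confirmations that went uncorrected at first,
# split by whether the concept was critical to the task and whether the
# user came back to correct it later (counts copied from the slide).
counts = {
    ("critical", "later"): 47,
    ("critical", "not_later"): 14,
    ("non-critical", "later"): 2,
    ("non-critical", "not_later"): 55,
}


def share_corrected_later(kind=None):
    """Fraction corrected later, over all rows or over one kind of concept."""
    rows = {k: v for k, v in counts.items() if kind is None or k[0] == kind}
    later = sum(v for k, v in rows.items() if k[1] == "later")
    return later / sum(rows.values())


for kind in (None, "critical", "non-critical"):
    print(kind or "all", round(100 * share_corrected_later(kind), 1))
```

The overall share comes out a little above the rounded 40% quoted on the slide, and the split is stark: roughly three out of four critical errors are corrected later, against only a few percent of the non-critical ones.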


machine learning approach

need good probability outputs
  low cross-entropy between model predictions and reality
  cross-entropy = negative average log posterior

logistic regression
  sample efficient
  stepwise approach → feature selection

logistic model tree for each action
  root splits on response-type
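A logistic model tree whose root splits on response type can be sketched as one logistic model per branch. Everything numeric below is hypothetical: the weights are invented for illustration and are not the models learned in this work, and `initial_conf` stands in for whatever features the stepwise procedure selects.

```python
import math
from typing import Dict


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


# One logistic model per leaf; the weights are made up for this sketch.
LEAVES: Dict[str, Dict[str, float]] = {
    "yes": {"bias": 2.0, "initial_conf": 1.5},
    "no": {"bias": -3.0, "initial_conf": 1.0},
    "other": {"bias": -0.5, "initial_conf": 2.0},
}


def updated_confidence(response_type: str, features: Dict[str, float]) -> float:
    """Root split on response type, then a logistic model at the leaf."""
    weights = LEAVES[response_type]
    z = weights["bias"] + sum(
        w * features.get(name, 0.0)
        for name, w in weights.items()
        if name != "bias"
    )
    return sigmoid(z)


for response in ("yes", "no", "other"):
    print(response, round(updated_confidence(response, {"initial_conf": 0.65}), 3))
```

One such tree would be built per system action (explicit confirmation, implicit confirmation, and so on), as the slide states.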


features. target.

initial situation
  initial confidence score
  concept identity, dialog state, turn number

system action
  other actions performed in parallel

features of the user response
  acoustic / prosodic features
  lexical features
  grammatical features
  dialog-level features

target: was the value correct?


baselines

initial baseline
  accuracy of system beliefs before the update

heuristic baseline
  accuracy of heuristic rule currently used in the system

oracle baseline
  accuracy if we knew exactly when the user is correcting the system
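The baselines are compared on two measures, labeled hard error and soft error in the result slides. The slides do not spell out the definitions, so the sketch below rests on an assumption: hard error is taken as the classification error at a 0.5 threshold, and soft error as the negative average log-likelihood (the cross-entropy described earlier). The function names and the toy numbers are illustrative.

```python
import math
from typing import Sequence


def hard_error(probs: Sequence[float], labels: Sequence[int], threshold: float = 0.5) -> float:
    """Fraction of beliefs on the wrong side of the threshold (assumed definition)."""
    wrong = sum((p >= threshold) != bool(y) for p, y in zip(probs, labels))
    return wrong / len(labels)


def soft_error(probs: Sequence[float], labels: Sequence[int], eps: float = 1e-12) -> float:
    """Negative average log-likelihood of the true labels (assumed definition)."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # guard against log(0)
        total += math.log(p) if y else math.log(1.0 - p)
    return -total / len(labels)


# Toy beliefs: confidence that the top hypothesis is correct, and the truth.
probs = [0.9, 0.2, 0.6]
labels = [1, 0, 0]
print(hard_error(probs, labels), soft_error(probs, labels))
```

The soft measure rewards well-calibrated probabilities, which matters here because the updated confidence is meant to feed later decisions rather than a single yes/no classification.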


results: explicit confirmation

                  Initial    Heuristic    LMT     Oracle
Hard error (%)    31.15      8.41         3.57    2.71
Soft error        0.51       0.19         0.12


results: implicit confirmation

                  Initial    Heuristic    LMT      Oracle
Hard error (%)    30.40      23.37        16.15    15.33
Soft error        0.61       0.67         0.43


results: unplanned implicit confirmation

                  Initial    Heuristic    LMT      Oracle
Hard error (%)    15.40      14.36        12.64    10.37
Soft error        0.43       0.46         0.34


informative features

initial confidence score
prosody features
barge-in
expectation match
repeated grammar slots
concept id
priors on concept values [not included in these results]


discussion

evaluation
  does it make sense?
  what would be a better evaluation?

current limitation: belief compression
  extending models to N hypotheses + other

current limitation: system actions
  extending models to cover all system actions


thank you!


a more subtle caveat

distribution of training data
  confidence annotator + heuristic update rules

distribution of run-time data
  confidence annotator + learned model

always a problem when interacting with the world!

hopefully, distribution shift will not cause large degradation in performance
  remains to validate empirically
  maybe a bootstrap approach?


KL-divergence & cross-entropy

KL divergence:
  $D(p \| q) = \sum_x p(x) \log \frac{p(x)}{q(x)}$

Cross-entropy:
  $CH(p, q) = H(p) + D(p \| q) = -\sum_x p(x) \log q(x)$

Negative log likelihood:
  $LL(q) = -\sum_x \log q(x)$
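The identity CH(p, q) = H(p) + D(p||q) is easy to confirm numerically. The sketch below uses two made-up distributions over a binary outcome; the function names are illustrative and natural logarithms are assumed.

```python
import math
from typing import Sequence


def entropy(p: Sequence[float]) -> float:
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def cross_entropy(p: Sequence[float], q: Sequence[float]) -> float:
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical "reality" p and model prediction q.
p = [0.7, 0.3]
q = [0.6, 0.4]
gap = cross_entropy(p, q) - (entropy(p) + kl_divergence(p, q))
print(abs(gap) < 1e-12)
```

Since H(p) does not depend on the model, minimizing cross-entropy over q is the same as minimizing the KL divergence from reality to the model's predictions.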


logistic regression

regression model for binomial (binary) dependent variables

  $P(x = 1 \mid f) = \frac{1}{1 + e^{-w \cdot f}}$

  $\log \frac{P(x = 1)}{P(x = 0)} = w \cdot f$

fit a model using max likelihood (avg log-likelihood)
  any stats package will do it for you

no $R^2$ measure
  test fit using “likelihood ratio” test

stepwise logistic regression
  keep adding variables while data likelihood increases significantly
  use Bayesian information criterion to avoid overfitting
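One step of the stepwise procedure can be sketched in plain Python: fit a bias-only model and a model with one extra feature by maximum likelihood, then keep the feature only if the Bayesian information criterion improves. The toy data, the gradient-ascent fitter, and all names below are assumptions for illustration; a statistics package would normally do the fitting.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(w: Sequence[float], row: Sequence[float]) -> float:
    return sigmoid(sum(wi * xi for wi, xi in zip(w, row)))


def fit(X: List[List[float]], y: Sequence[int], steps: int = 3000, lr: float = 0.5) -> List[float]:
    """Maximise the average log-likelihood by plain gradient ascent."""
    w = [0.0] * len(X[0])
    for _ in range(steps):
        grad = [0.0] * len(w)
        for row, label in zip(X, y):
            err = label - predict(w, row)
            for j, xj in enumerate(row):
                grad[j] += err * xj
        w = [wi + lr * g / len(y) for wi, g in zip(w, grad)]
    return w


def log_likelihood(w: Sequence[float], X: List[List[float]], y: Sequence[int]) -> float:
    return sum(
        math.log(predict(w, row)) if label else math.log(1.0 - predict(w, row))
        for row, label in zip(X, y)
    )


def bic(w: Sequence[float], X: List[List[float]], y: Sequence[int]) -> float:
    """BIC = k ln(n) - 2 ln(L); lower is better."""
    return len(w) * math.log(len(y)) - 2.0 * log_likelihood(w, X, y)


# Made-up data: initial confidence scores and whether the value was correct.
conf = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
correct = [0, 0, 1, 0, 0, 1, 1, 1, 1]

bias_only = [[1.0] for _ in conf]
with_conf = [[1.0, c] for c in conf]
w0 = fit(bias_only, correct)
w1 = fit(with_conf, correct)

# Stepwise decision: add the feature only if BIC goes down.
keep_feature = bic(w1, with_conf, correct) < bic(w0, bias_only, correct)
print(w1, keep_feature)
```

Repeating this comparison over the remaining candidate features, one at a time, gives the stepwise feature selection mentioned on the slide.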


logistic regression

[figure: P(Task Success = 1) plotted against % Nonunderstandings (FNON)]


logistic model tree

regression tree, but with logistic models on leaves

[figure: tree with splits f=0 / f=1 and g<=10 / g>10; leaf plots of P(Task Success = 1) against % Nonunderstandings (FNON)]