Cost of Misunderstandings: Modeling the Cost of Misunderstanding Errors in the CMU Communicator Dialog System. Presented by: Dan Bohus (dbohus@cs.cmu.edu). Work by: Dan Bohus, Alex Rudnicky. Carnegie Mellon University, 2001.

Post on 21-Dec-2015


Page 1: Cost of Misunderstandings Modeling the Cost of Misunderstanding Errors in the CMU Communicator Dialog System Presented by: Dan Bohus (dbohus@cs.cmu.edu)

Cost of Misunderstandings

Modeling the Cost of Misunderstanding Errors in the CMU Communicator Dialog System

Presented by: Dan Bohus (dbohus@cs.cmu.edu)

Work by: Dan Bohus, Alex Rudnicky

Carnegie Mellon University, 2001


11-04-01 Modeling the cost of misunderstanding … 2

Outline

Quick overview of previous utterance-level confidence annotation work.

Modeling the cost of misunderstandings in spoken dialog systems.

Experiments & results.

Further analysis.

Summary, further work, conclusion.


Utterance-Level Confidence Annotation Overview

Confidence annotation = data-driven classification

Corpus: 2 months, 131 dialogs, 4550 utterances.

Features: 12 features from decoder, parsing, dialog management levels.

Classifiers: Decision Tree, ANN, BayesNet, AdaBoost, NaiveBayes, SVM + Logistic Regression model (later on).


Confidence annotator performance

Baseline error rate: 32%

Garble baseline: 25%

Classifier error rate: 16%

Differences between classifiers are not statistically significant, except for Naïve Bayes.

On a soft metric, the logistic regression model clearly outperformed the others.

But is this the right way to evaluate performance?


Judging Performance

Classification Error Rate (FP+FN).
    Implicitly assumes that FP and FN errors have the same cost.

But the cost of a misunderstanding in dialog systems is presumably different for FPs and FNs.

Build an error function that takes these costs into account, and optimize for it.

Cost also depends on:
    domain/system ~ not a problem
    dialog state
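The difference between the plain classification error rate and a cost-weighted error function can be sketched as follows. This is only an illustration: the function names, the counts, and the 2:1 cost ratio are assumptions made up for the example, not values from the talk.

```python
def error_rate(fp, fn, total):
    """Plain classification error: every FP and FN counts the same."""
    return (fp + fn) / total


def weighted_error(fp, fn, total, cost_fp, cost_fn):
    """Cost-weighted error: FPs and FNs are charged different costs."""
    return (cost_fp * fp + cost_fn * fn) / total


# Hypothetical counts for two operating points of the same annotator.
point_a = dict(fp=10, fn=30, total=200)
point_b = dict(fp=30, fn=10, total=200)

# Both points have the same plain error rate ...
print(error_rate(**point_a), error_rate(**point_b))

# ... but they differ once an FP is assumed to cost twice as much as an FN.
print(weighted_error(**point_a, cost_fp=2.0, cost_fn=1.0),
      weighted_error(**point_b, cost_fp=2.0, cost_fn=1.0))
```

Under the assumed costs, the operating point with fewer false positives comes out cheaper even though the plain error rate cannot tell the two apart.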


Problem Formulation

(1) Develop a cost model which allows us to quantitatively assess the costs of FP and FN errors.

(2) Use the costs to pick the optimal tradeoff point on the classifier ROC.

[Figure: false negative, false positive, and total error rates (y-axis: error rate, 0 to 0.7) plotted against the boosting threshold (x-axis: -2 to 2).]
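Step (2), picking the tradeoff point once FP and FN costs are known, can be sketched as a sweep over candidate thresholds. The scores, labels, thresholds, and costs below are invented for illustration; the real annotator and its data are not reproduced here.

```python
def total_cost(scores, labels, threshold, cost_fp, cost_fn):
    """Cost at one threshold; an utterance is accepted when score >= threshold.

    labels[i] is True when utterance i was understood correctly (OK).
    """
    fp = sum(1 for s, ok in zip(scores, labels) if s >= threshold and not ok)
    fn = sum(1 for s, ok in zip(scores, labels) if s < threshold and ok)
    return cost_fp * fp + cost_fn * fn


def best_threshold(scores, labels, thresholds, cost_fp, cost_fn):
    """Return the candidate threshold with the lowest total cost."""
    return min(thresholds,
               key=lambda t: total_cost(scores, labels, t, cost_fp, cost_fn))


# Hypothetical classifier scores and ground-truth labels.
scores = [-1.5, -0.8, -0.2, 0.1, 0.4, 0.9, 1.3, 1.7]
labels = [False, False, True, False, True, True, False, True]
thresholds = [-2, -1, 0, 1, 2]

# Expensive false positives push the chosen threshold up;
# expensive false negatives push it down.
print(best_threshold(scores, labels, thresholds, cost_fp=3.0, cost_fn=1.0))
print(best_threshold(scores, labels, thresholds, cost_fp=1.0, cost_fn=3.0))
```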


The Cost Model

Model the impact of the FPs and FNs on the system performance.

Identify a suitable performance metric P.

Build a statistical regression model at the dialog session level:
    P = f(FPs, FNs)
    P = k + CostFP*FP + CostFN*FN (linear regression)

Then we can plot f, and implicitly optimize for P.
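A session-level fit of P = k + CostFP*FP + CostFN*FN can be sketched with ordinary least squares. The sketch below uses plain Python and solves the normal equations directly; the session data are synthetic (generated from known coefficients so the fit can be checked), and the helper names are assumptions, not the tooling used in the original work.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the largest entry in this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_linear(rows, targets):
    """Ordinary least squares via the normal equations; returns [k, coef_1, ...]."""
    x = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    p = len(x[0])
    xtx = [[sum(xi[i] * xi[j] for xi in x) for j in range(p)] for i in range(p)]
    xty = [sum(xi[i] * t for xi, t in zip(x, targets)) for i in range(p)]
    return solve(xtx, xty)


# Synthetic sessions: (FP per turn, FN per turn), with P generated from
# made-up coefficients k = 0.9, coef_FP = -2.0, coef_FN = -1.0.
sessions = [(0.0, 0.1), (0.1, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2), (0.0, 0.0)]
performance = [0.9 - 2.0 * fp - 1.0 * fn for fp, fn in sessions]

k, coef_fp, coef_fn = fit_linear(sessions, performance)
print(round(k, 3), round(coef_fp, 3), round(coef_fn, 3))
```

The fitted coefficients are negative when errors hurt performance; their magnitudes play the role of the per-error costs.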


Measuring Performance

User Satisfaction (5-point scale)
    Hard to get
    Very subjective ~ hard to make it consistent across users

Concept transfer efficiency:
    CTC: correctly transferred concepts per turn
    ITC: incorrectly transferred concepts per turn

Completion


Detour: The Dataset

134 dialogs (2561 utterances), collected using 4 scenarios

Satisfaction scores only for 35 dialogs

Corpus manually labeled at the concept level
    4 labels: OK / RBAD / PBAD / OOD
    Aggregate utterance labels generated

Confidence annotator decisions logged

Computed counts of FPs, FNs, CTCs, ITCs for each session


Example

U: I want to fly from Pittsburgh to Boston

S: I want to fly from Pittsburgh to Austin

C: [I_want/OK] [Depart_Loc/OK] [Arrive_Loc/RBAD]

Only 2 relevantly expressed concepts

If Accept: CTC = 1, ITC = 1

If Reject: CTC = 0, ITC = 0
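The counting in this example can be sketched in a few lines. The function name and the choice of which concepts count as "relevant" (here the two locations, which matches the slide's count of 2) are assumptions for illustration.

```python
def transfer_counts(concepts, accepted, relevant):
    """Return (CTC, ITC) for one utterance.

    concepts: list of (concept_name, label) pairs, label in OK/RBAD/PBAD/OOD.
    accepted: whether the confidence annotator accepted the utterance.
    relevant: names of the relevantly expressed concepts (assumed given).
    """
    if not accepted:
        return 0, 0  # a rejected utterance transfers no concepts at all
    ctc = sum(1 for name, label in concepts if name in relevant and label == "OK")
    itc = sum(1 for name, label in concepts if name in relevant and label != "OK")
    return ctc, itc


concepts = [("I_want", "OK"), ("Depart_Loc", "OK"), ("Arrive_Loc", "RBAD")]
relevant = {"Depart_Loc", "Arrive_Loc"}

print(transfer_counts(concepts, accepted=True, relevant=relevant))   # (1, 1)
print(transfer_counts(concepts, accepted=False, relevant=relevant))  # (0, 0)
```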


Targeting Efficiency: Model 1

3 successively refined models

CTC = FP + FN + TN + k
    CTC: correctly transferred concepts / turn
    TN: true negatives

Model                          R² all   R² train   R² test
CTC=FP+FN+TN                   0.81     0.81       0.73


Targeting Efficiency: Model 2

CTC - ITC = (REC +) FP + FN + TN + k
    ITC: incorrectly transferred concepts / turn
    REC: relevantly expressed concepts

Model                          R² all   R² train   R² test
CTC=FP+FN+TN                   0.81     0.81       0.73
CTC-ITC=FP+FN+TN               0.86     0.86       0.78
CTC-ITC=REC+FP+FN+TN           0.89     0.89       0.83


Targeting Efficiency: Model 3

CTC-ITC = REC+FPC+FPNC+FN+TN+k

2 types of FPs:
    With concepts: FPC
    Without concepts: FPNC

Model                          R² all   R² train   R² test
CTC=FP+FN+TN                   0.81     0.81       0.73
CTC-ITC=FP+FN+TN               0.86     0.86       0.78
CTC-ITC=REC+FP+FN+TN           0.89     0.89       0.83
CTC-ITC=REC+FPC+FPNC+FN+TN     0.94     0.94       0.90
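The R² figures compare how well each model's predictions track the observed metric. As a reminder of what is being reported, here is a minimal sketch of the standard coefficient of determination; the numbers are invented and the function name is an assumption. Evaluating it on sessions held out from fitting gives a "test" value in the sense of the table.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical per-session values of CTC-ITC and a model's predictions.
actual = [0.9, 0.7, 0.5, 0.3]
predicted = [0.8, 0.7, 0.6, 0.3]

print(round(r_squared(actual, predicted), 2))
```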


Model 3 - Results

Coefficient   Value
k              0.41
C_REC          0.62
C_FPNC        -0.48
C_FPC         -2.12
C_FN          -1.33
C_TN          -0.55

CTC-ITC = REC+FPC+FPNC+FN+TN+k


Other models

Completion (binary)
    Logistic regression model
    Estimated model does not indicate a good fit

User satisfaction (5-point scale)
    Based on only 35 dialogs
    R² = 0.61 (similar to the literature: Walker et al.)
    Explanation: subjectivity of metric + limited dataset


Problem Formulation

(1) Develop a cost model which allows us to quantitatively assess the costs of FP and FN errors.

(2) Use the costs to pick the optimal tradeoff point on the classifier ROC.

[Figure: false negative, false positive, and total error rates (y-axis: error rate, 0 to 0.7) plotted against the boosting threshold (x-axis: -2 to 2).]


Tuning the Confidence Annotator

Using Model 3
    CTC-ITC = REC+FPNC+FPC+FN+TN+k
    Drop k & REC, plug in the values
    Cost = 0.48FPNC+2.12FPC+1.33FN+0.56TN

Minimize Cost instead of Classification Error Rate (FP+FN), and we’ll implicitly maximize concept transfer efficiency.
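The cost expression on this slide can be evaluated directly for any operating point. In the sketch below the four coefficients are the ones stated on this slide; the per-turn counts are hypothetical and the names are assumptions.

```python
# Coefficients as given on this slide (Cost = 0.48FPNC+2.12FPC+1.33FN+0.56TN).
COSTS = {"FPNC": 0.48, "FPC": 2.12, "FN": 1.33, "TN": 0.56}


def session_cost(counts):
    """Weighted sum of per-turn error counts; missing keys count as zero."""
    return sum(COSTS[name] * counts.get(name, 0.0) for name in COSTS)


# Hypothetical per-turn counts at one threshold setting.
counts = {"FPNC": 0.05, "FPC": 0.10, "FN": 0.02, "TN": 0.10}
print(round(session_cost(counts), 4))
```

Sweeping the threshold, recomputing the counts at each setting, and keeping the setting with the lowest `session_cost` is one way to carry out the minimization described above.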


Operating Characteristic


Further Analysis

Is CTC-ITC really modeling dialog performance?

Mean = 0.71, Std.Dev = 0.28

Mean for completed dialogs = 0.82

Mean for uncompleted dialogs = 0.57

Difference between means significant at very high level of confidence
    P-value = 7.23 × 10^-9 (in t-test)

So, looks like CTC-ITC is okay, right?
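A comparison of group means of this kind can be sketched with a two-sample t statistic. The slide does not say which variant of the t-test was used; the sketch assumes Welch's unequal-variance form, and the sample values are invented, so it does not reproduce the reported p-value.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic for the difference between two sample means."""
    standard_error = sqrt(variance(sample_a) / len(sample_a)
                          + variance(sample_b) / len(sample_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error


# Hypothetical per-dialog CTC-ITC values for the two groups.
completed = [0.90, 0.80, 0.85, 0.70, 0.95]
uncompleted = [0.60, 0.50, 0.65, 0.40, 0.70]

# A large positive statistic means completed dialogs score higher on average.
print(f"t = {welch_t(completed, uncompleted):.2f}")
```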


Further Analysis (cont’d)

Can we reliably extrapolate to other areas of the operating characteristic?


Yes, look at the distribution of the FP and FN ratios across dialogs.


Further Analysis (cont’d)

Impact of baseline error rate?

Compared models constructed based on high and low error rates:
    For a low error rate, the curve becomes monotonically increasing.
    This clearly indicates that “trust everything / have no confidence” is the way to go in this setting.


Our explanation so far…

Ability to easily overwrite incorrectly captured information in the CMU Communicator

Relatively low error rates
    Likelihood of repeated misrecognition is low


Conclusion

Data-driven approach to quantitatively assess the costs of various types of misunderstandings.

Models based on efficiency fit data well; obtained costs confirm intuition.

For CMU Communicator, model predicts that total cost stays the same across a large range of the operating characteristic of the classifier.


Further Experiments

But, of course, we can verify predictions experimentally.

Collect new data with the system running with a very low threshold.
    55 dialogs collected so far.
    Thanks to those who have participated in these experiments.
    “Help if you have the time” to the others …

www.cs.cmu.edu/~dbohus/scenarios.htm

Re-estimate models, verify predictions


Confusion Matrix

                   OK    BAD
System says OK     TP    FP
System says BAD    FN    TN

FP = False acceptance

FN = False detection/rejection

Fallout = FP/(FP+TN) = FP/N_BAD

CDR = 1-Fallout = 1-(FP/N_BAD)
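The two ratios defined above follow directly from the confusion-matrix counts. A minimal sketch, with made-up counts and assumed function names:

```python
def fallout(fp, tn):
    """Fallout = FP / (FP + TN): fraction of BAD utterances wrongly accepted."""
    return fp / (fp + tn)


def cdr(fp, tn):
    """CDR = 1 - Fallout: fraction of BAD utterances correctly rejected."""
    return 1.0 - fallout(fp, tn)


# Hypothetical counts: 40 BAD utterances, 10 of which the system accepted.
fp, tn = 10, 30
print(fallout(fp, tn), cdr(fp, tn))
```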