

HIGGINS

A spoken dialogue system for investigating error handling techniques

Jens Edlund, Gabriel Skantze and Rolf Carlson

Scenario

User: I want to go to the closest subway station.

System: Ok, to the closest subway station. Can you describe where you are now?

User: I have an ATM to my left and a pedestrian crossing in front of me.

System: Can you see some trees to your right and a white building in front of you?

User: Yes

System: Ok, take left after the large building which you have on your left and follow the street until you reach a crossing.

User: Ok, there is a bus station here.

System: That’s right. Take left again after the bus station.

Centre for Speech Technology

[Figure: error handling flow. Labels: User utterance; User reaction/repair; Assume understanding; Non-understanding; No recovery]

Architecture

The Higgins Project
• Instigated in 2003
• Theoretical goal: Investigate error handling techniques for collaborative dialogue systems
• Practical goal: Build a system in which these can be tested empirically
• This poster presents the current stage of the project.

Error recovery (Non-understanding)

Error recovery
• Map-task-like studies on human-human conversation using ASR in one direction:
• Results show that humans tend not to signal non-understanding:
• This leads to
  • Increased experience of task success
  • Faster recovery from non-understanding
• Skantze, G. (2003). Exploring human error handling strategies: implications for spoken dialogue systems.

Early error detection

Grounding

Late error detection

Error recovery (Misunderstanding)

Late error detection

The need for late error detection is task dependent:
• Sometimes not necessary:
• Sometimes reference handling is sufficient:
• For slots with multiple possible values, late error detection is necessary:

(These also exemplify misunderstanding error recovery.)

Grounding
• The amount of feedback from the system should at least depend on
  • Confidence of understanding
  • Consequence of misunderstanding
• The discourse modeller
  • Unifies assertions and tracks referents
  • Solves ellipses
  • Solves anaphora
  • Keeps track of who contributed which information:
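The dependence of feedback on confidence and consequence named under Grounding can be sketched as below. This is only an illustration: the function name `choose_feedback`, the risk formula, the thresholds, and the three strategy labels are assumptions made for this sketch and are not taken from the Higgins system.

```python
def choose_feedback(confidence: float, consequence: float) -> str:
    """Pick a grounding strategy from two scores in [0, 1].

    confidence:  how sure the system is that it understood the user.
    consequence: how costly a misunderstanding would be for the task.
    All thresholds below are hypothetical.
    """
    if not (0.0 <= confidence <= 1.0 and 0.0 <= consequence <= 1.0):
        raise ValueError("confidence and consequence must lie in [0, 1]")

    # Rough expected cost of silently accepting a wrong hypothesis.
    risk = (1.0 - confidence) * consequence

    if risk < 0.1:
        return "accept"           # continue without explicit feedback
    if risk < 0.4:
        return "implicit_verify"  # echo, e.g. "Ok, to the closest subway station."
    return "explicit_verify"      # ask the user to confirm before acting


print(choose_feedback(0.95, 0.5))  # low risk
print(choose_feedback(0.30, 0.9))  # high risk
```

Multiplying the two factors is one simple design choice: a low-confidence hypothesis still passes without feedback when an error would be harmless, while even moderately uncertain input is verified when the consequence is high.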

Early error detection

KTH LVCSR

Large-Vocabulary Probabilistic ASR

Machine-learned error detection

Rule-driven semantic/syntactic error detection

Rule-driven discourse error detection

• Which features could be used for detecting word-level errors?
• How are they operationalised?
• Initial tests with Memory-based and Transformation-based learning suggest:
  • Utterance context
  • Lexical information
  • Word confidences
  • Discourse history
• Skantze, G. & Edlund, J. (2004). Early error detection on word level.
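One way the four feature groups listed above might be operationalised for a single recognised word is sketched here. The `Word` type, the `CONTENT_WORDS` lexicon, and the feature names are hypothetical and chosen only for illustration; the features actually used in Higgins are described in the cited paper, not here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Word:
    text: str
    confidence: float  # ASR word confidence in [0, 1]


# Hypothetical mini-lexicon standing in for real lexical information.
CONTENT_WORDS = {"building", "tree", "crossing", "left", "right"}


def word_features(words: list, index: int, mentioned_before: set) -> dict:
    """Build a feature dict for words[index], to be fed to some classifier."""
    word = words[index]
    prev_word: Optional[Word] = words[index - 1] if index > 0 else None
    next_word: Optional[Word] = words[index + 1] if index + 1 < len(words) else None
    return {
        # Word confidences
        "confidence": word.confidence,
        # Lexical information
        "is_content_word": word.text in CONTENT_WORDS,
        # Utterance context: confidence of the neighbouring words
        "prev_confidence": prev_word.confidence if prev_word is not None else None,
        "next_confidence": next_word.confidence if next_word is not None else None,
        # Discourse history: has this word occurred earlier in the dialogue?
        "mentioned_before": word.text in mentioned_before,
    }


utterance = [Word("large", 0.9), Word("building", 0.4)]
print(word_features(utterance, 1, mentioned_before={"building"}))
```

The resulting dicts could be passed to any learner; the poster mentions memory-based and transformation-based learning, neither of which is implemented in this sketch.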

ASR post-processing

PICKERING: Robust interpretation

• Rule-based semantic parsing
• Finds partial results with largest coverage
• Allows insertions inside phrases
• Allows non-agreement if necessary
• Evaluation results show robustness against inserted content words
• Skantze, G. & Edlund, J. (2004). Robust interpretation in the Higgins spoken dialogue system.
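The ideas of partial results and insertions inside phrases can be illustrated with a toy matcher. This is not PICKERING's algorithm: the `PATTERNS` grammar, the function names, and the greedy longest-match strategy (a simple stand-in for "largest coverage") are all assumptions made for this sketch.

```python
# Hypothetical toy grammar: concept name -> word sequence.
PATTERNS = {
    "large_building": ["a", "large", "building"],
    "on_left": ["on", "my", "left"],
}


def match_with_insertions(pattern, words, start, max_insertions=1):
    """Match pattern against words beginning at start.

    Up to max_insertions extra words may be skipped before each pattern
    token after the first. Returns the end index (exclusive) or None.
    """
    if words[start] != pattern[0]:
        return None
    pos = start + 1
    for token in pattern[1:]:
        skipped = 0
        while pos < len(words) and words[pos] != token and skipped < max_insertions:
            pos += 1
            skipped += 1
        if pos >= len(words) or words[pos] != token:
            return None
        pos += 1
    return pos


def interpret(utterance, max_insertions=1):
    """Return the concepts found, skipping words that fit no pattern."""
    words = utterance.lower().split()
    results = []
    pos = 0
    while pos < len(words):
        best = None  # (concept, end index) of the longest match at pos
        for concept, pattern in PATTERNS.items():
            end = match_with_insertions(pattern, words, pos, max_insertions)
            if end is not None and (best is None or end > best[1]):
                best = (concept, end)
        if best is not None:
            results.append(best[0])
            pos = best[1]
        else:
            pos += 1  # unparseable word: keep the partial results so far
    return results


print(interpret("i have a large uh building on my left"))
```

Here the inserted word "uh" inside the first phrase does not block the match, and the leading "i have" is skipped rather than causing the whole utterance to fail.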

[Figure: system modules. Labels: ASR; Utterance interpretation; Discourse modelling; Generation; Decision making; TTS]

• Distributed modular system
• Goals:
  • A module for every task that is reasonably well-defined
  • Separation of the domain specific (XML) and the domain independent (module code)
• Incremental processing allows for:
  • Rapid feedback
  • Flexible turn-taking
  • Faster processing
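Incremental processing between modules can be sketched with Python generators, where a downstream module starts work on each word as soon as the upstream module emits it. The module functions, the one-word-per-chunk assumption, and the shared `log` are hypothetical; the sketch runs in a single process and does not model the distributed setup of the real system.

```python
def asr(audio_chunks, log):
    """Hypothetical recogniser: emits one recognised word per audio chunk."""
    for chunk in audio_chunks:
        log.append(f"asr:{chunk}")
        yield chunk


def interpreter(words, log):
    """Hypothetical interpreter: emits a result as soon as a direction word arrives."""
    for word in words:
        if word in {"left", "right"}:
            log.append(f"interp:{word}")
            yield {"direction": word}


def run_pipeline(audio_chunks):
    """Chain the modules and record the order in which they did their work."""
    log = []
    results = list(interpreter(asr(audio_chunks, log), log))
    return results, log


results, log = run_pipeline(["turn", "left", "then", "right"])
print(results)
print(log)
```

The log shows the interpreter producing its first result before the recogniser has consumed the whole utterance, which is the property that makes rapid feedback and flexible turn-taking possible.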

U1: I want to go to Boston
S1: To London...
U2: No, to Boston!

U1: How much is the big apartment?
S1: The small apartment is […]
U2a: No, the big apartment!
U2b: And the big apartment?

U1: I have a large building on my left
S1: A large building on your right
U2a: No, on my left!
U2b: And on my left
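Corrections such as "No, to Boston!" in the exchanges above can be detected late by comparing what the system echoed with what the user says next. The sketch below assumes both utterances have already been interpreted into slot-value dicts; the function names and this representation are hypothetical and not the Higgins data model.

```python
def detect_misunderstanding(system_echo: dict, user_reply: dict) -> dict:
    """Return the slots whose echoed value the user contradicts.

    The result maps slot -> (echoed value, corrected value).
    """
    return {
        slot: (system_echo[slot], value)
        for slot, value in user_reply.items()
        if slot in system_echo and system_echo[slot] != value
    }


def repair(beliefs: dict, corrections: dict) -> dict:
    """Return a copy of beliefs with the corrected values applied."""
    updated = dict(beliefs)
    for slot, (_echoed, corrected) in corrections.items():
        updated[slot] = corrected
    return updated


beliefs = {"destination": "London"}   # S1: "To London..."
reply = {"destination": "Boston"}     # U2: "No, to Boston!"
corrections = detect_misunderstanding(beliefs, reply)
print(corrections)
print(repair(beliefs, corrections))
```

A reply that repeats the echoed value, or that fills a slot the system never echoed, yields no corrections, so ordinary continuations are not mistaken for repairs.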

Misunderstanding

U1: There is a large red building
S2: What material is the large building made of?

O1: Do you see a wooden house in front of you?
U1: YES CROSSING ADDRESS NOW (I pass the wooden house now)
O2: Can you see a restaurant sign?

[Figure: experimental setup. Labels: User (listens, speaks); Operator (reads, speaks); ASR; Vocoder]

GALATEA: Discourse modelling