Spoken Language Generation: Information Presentation Johanna D. Moore University of Edinburgh Human Communication Research Centre School of Informatics

Post on 16-Dec-2015


Page 1

Spoken Language Generation: Information Presentation

Johanna D. Moore

University of Edinburgh
Human Communication Research Centre
School of Informatics

Page 2

Introduction

• Much of the research in SDS has focused on the information gathering phase of the dialogue

• Communicator evaluation indicates that task duration is negatively correlated with user satisfaction (r = -0.31, p < .001)

(Walker, Passonneau, Boland, ACL 2001)

• What contributes most to dialogue duration?

Page 3

Communicator 2001 Dataset

             System   User
Utterances   56%      44%
Time         69%      31%
Words        91%      9%

System       Req_info   Present_Info   Other
Utterances   43%        25%            32%
Time         31%        54%            15%
Words        28%        50%            22%

Page 4

Typical information presentation

SYS: I found 9 round trips from Hartford to Orlando on 31st and returning Hartford on November 6th. For option 1, I have an outbound flight with US Air departing at 7 AM and arriving at 9 49 AM non-stop. A return flight is with U S Air departing at 10 30 AM and arriving at 12 28 AM with 1 stopover. Ticket price is 165 dollars. Please say “next option”, or, “flight details”, or, “I’ll take it”.

USER: NEXT OPTION

SYS: For option 2, I have an outbound flight with U S Air departing at 3 30 PM and arriving at 6 19 PM nonstop. A return flight is with U S Air departing at 8 25 PM and arriving at 12 28 AM with 1 stopover. Ticket price is 165 dollars. Say “next option”, or, “flight details”, or, “I’ll take it”.

USER: NEXT OPTION

SYS: . . .

Page 5

Page 6

Current spoken dialogue systems

• Present each option and its selected attributes individually
    No comparisons

• Rely on users to remember trade-offs between multiple options
    Places a large memory burden on users

• Potentially large payoff for improvements in information presentation

Page 7

Improving Information Presentation

• Apply techniques from text generation
    Allows customization to user(s) and discourse history
    Improves quality of synthesis by using the NL generator to provide info about both the meaning and the linguistic structure of the utterance

• Overview of talk:
    Case study: FLIGHTS system
    Statistical approaches to generation

See also Computer Speech and Language (2002) 16. Special Issue on Spoken Language Generation

Page 8

What NLG Can Do For You…

User: I want to travel from Edinburgh to Brussels, arriving by 5 pm.

For a starving student:

System: There’s a direct flight on BMI with a good price. It arrives at four ten p.m. and costs one hundred and twelve pounds. The cheapest flight is on Ryanair. It arrives at two p.m. and it costs just fifty pounds, but you’d need to connect in Dublin.

For a business traveller:

System: You can fly business class on British Airways, arriving at four twenty p.m., but you’d need to connect in London Heathrow. There is a direct flight on BMI, arriving at four ten p.m., but there’s no availability in business class.

Page 9

[Figure: NLG pipeline. Comm Goals -> Text Planner (Content Selection, Discourse Planning) -> Text Plan -> Sentence Planner (Aggregation, Referring Expression Gen, Lexical Choice) -> Sentence Plan(s) -> Sentence Realizer -> English.
Knowledge Sources: Dialogue History, Domain Model, User Model.
Linguistic Knowledge Sources: Discourse Strategies, Aggregation Rules, Referring Expression Generation Algorithm, Lexicon, Grammar.]

Page 10

FLIGHTS architecture

[Figure: FLIGHTS system architecture. User Input -> ASR (HTK) -> Text String -> Natural Language Understander (Word spotting) -> Semantic Interpretation -> Dialogue Manager (DIPPER) -> Comm Goals -> Response Generator -> Text String w/ APML Markup -> TTS (Festival) -> System Response.
Response Generator components: Content Selection, Text Planning (O-Plan), Sentence Planner (XSLT), Realizer (OpenCCG).
Also shown: User Model, Flight DB.]

Page 11

Customization happens everywhere

• Content selection: what flights and attributes to present to user

• Discourse planning: ordering of content, discourse relations

• Referring Expression Generation: e.g., The cheapest flight, the five-fifteen, a KLM flight

• Aggregation: grouping propositions into clauses and sentences, e.g.,

There’s a KLM flight arriving Brussels at ten to five, but business class is not available and you’d need to connect in Amsterdam

• Discourse cues: e.g., Although, because, but

• Scalar Adjectives: e.g., good price, just fifty pounds

Page 12

Content Selection

• Need a domain (or genre) specific method for determining what to say

• In FLIGHTS:
    Rank options based on predicted utility for the user
    Select all options whose value is over a threshold
    Select attributes that contribute most to value of selected options

(Moore, Foster, Lemon & White, FLAIRS 2004; Carenini & Moore, AI Journal, 2006)
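The three selection steps can be sketched as follows. This is a minimal illustration only: it assumes an additive weighted utility over attribute scores in [0, 1], and the weights, scores, threshold and function names are all hypothetical rather than taken from the FLIGHTS implementation.

```python
from typing import Dict, List, Tuple

Scores = Dict[str, float]

# Hypothetical user model: attribute -> importance weight.
STUDENT_WEIGHTS: Scores = {"price": 0.6, "direct": 0.3, "airline": 0.1}

# Hypothetical options, each attribute already scored in [0, 1] for this user.
FLIGHT_SCORES: Dict[str, Scores] = {
    "Ryanair": {"price": 1.0, "direct": 0.0, "airline": 0.5},
    "BMI": {"price": 0.6, "direct": 1.0, "airline": 0.5},
    "British Airways": {"price": 0.1, "direct": 0.0, "airline": 0.8},
}


def utility(scores: Scores, weights: Scores) -> float:
    """Additive utility: weighted sum of attribute scores (an assumption)."""
    return sum(w * scores[attr] for attr, w in weights.items())


def select_content(
    options: Dict[str, Scores],
    weights: Scores,
    threshold: float,
    n_attrs: int = 2,
) -> Tuple[List[str], List[str]]:
    # Step 1: rank options by predicted utility for this user.
    ranked = sorted(options, key=lambda name: utility(options[name], weights), reverse=True)
    # Step 2: keep every option whose utility is over the threshold.
    selected = [name for name in ranked if utility(options[name], weights) > threshold]
    # Step 3: keep the attributes contributing most to the value of the selected options.
    contribution = {
        attr: sum(w * options[name][attr] for name in selected)
        for attr, w in weights.items()
    }
    top_attrs = sorted(contribution, key=contribution.get, reverse=True)[:n_attrs]
    return selected, top_attrs


selected, attrs = select_content(FLIGHT_SCORES, STUDENT_WEIGHTS, threshold=0.5)
print(selected, attrs)
```

With these made-up numbers the student profile keeps the BMI and Ryanair flights and talks about price and directness, which mirrors the kind of output shown in the earlier student example.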

Page 13

Discourse Planning

• Uses discourse strategies for producing user-adapted recommendations, comparisons

• Produces text plans consisting of basic dialogue acts and rhetorical relations

• Orders presentation of options

• Groups attributes into positive and negative lists for contrasts

• Selects attributes to identify flights
    a direct flight, the cheapest flight, the KLM flight

• Marks items as theme/rheme for information structure
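As a rough sketch of the grouping step, the snippet below splits an option's attributes into positive and negative lists and records a contrast relation when both are non-empty. The data layout, the neutral point of 0.5 and the relation names are assumptions made for illustration, not the FLIGHTS text-plan format.

```python
from typing import Dict, List


def plan_option(name: str, scores: Dict[str, float], neutral: float = 0.5) -> dict:
    """Build one hypothetical text-plan node for a single flight option."""
    positive = [attr for attr, s in scores.items() if s > neutral]
    negative = [attr for attr, s in scores.items() if s < neutral]
    # A contrast ("..., but ...") only makes sense if both lists are non-empty.
    relation = "contrast" if positive and negative else "joint"
    return {
        "act": "describe",
        "option": name,
        "relation": relation,
        "positive": positive,
        "negative": negative,
    }


def plan_text(options: Dict[str, Dict[str, float]], order: List[str]) -> List[dict]:
    """Order the options as given and plan each one."""
    return [plan_option(name, options[name]) for name in order]


klm = plan_option("KLM", {"arrival_time": 0.9, "fare_class": 0.1, "direct": 0.0})
print(klm)
```

A node like this would correspond to a sentence such as the KLM aggregation example: the positive attribute is stated first and the negative ones follow after "but".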

Page 14

Information Structure

• Theme/Rheme
    Theme: part of utterance that connects it to prior discourse
    Rheme: part of utterance that advances the discussion by contributing novel information
    Theme and rheme phrases marked by distinctive combinations of pitch accents and boundary tones

• Focus/Background
    Focus: words whose interpretations contribute to distinguishing the theme or rheme from other contextually available alternatives; marked by pitch accents
    Background: the unmarked parts of themes and rhemes

(Steedman 1991-2002)

Page 15

Examples

• Ex 1: I know when the Ryanair flight LEAVES, but when does it ARRIVE?

    (The Ryanair flight ARRIVES[focus])theme  (at FIVE[focus])rheme
                        L+H*           LH%        H*          LL%

• Ex 2: I know the KLM flight arrives at FOUR, but which flight arrives at FIVE?

    (The RYANAIR[focus] flight)rheme  (arrives at FIVE[focus])theme
         H*                    LL%                L+H*        LH%
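The tune assignment in these two examples follows a simple pattern: focused words in a theme carry L+H* and the theme ends in LH%, while focused words in a rheme carry H* and the rheme ends in LL%. The sketch below encodes just that pattern; the function name and the input format are hypothetical, and real systems handle many more cases.

```python
from typing import List, Set, Tuple

# Pitch accent and boundary tone per information-structure unit,
# as used in the two examples above.
TONES = {"theme": ("L+H*", "LH%"), "rheme": ("H*", "LL%")}

Phrase = Tuple[str, str, Set[str]]  # (text, "theme" or "rheme", focused words)


def annotate(phrases: List[Phrase]) -> str:
    """Attach an accent to each focused word and a boundary tone to each phrase."""
    out = []
    for text, kind, focus in phrases:
        accent, boundary = TONES[kind]
        words = [f"{w}({accent})" if w in focus else w for w in text.split()]
        out.append(" ".join(words) + " " + boundary)
    return " ".join(out)


ex1 = annotate([
    ("The Ryanair flight ARRIVES", "theme", {"ARRIVES"}),
    ("at FIVE", "rheme", {"FIVE"}),
])
print(ex1)
```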

Page 16

Assigning theme/rheme in FLIGHTS

• First option all rheme

• Subsequent items:
    Identifying, contrastive information is theme
    Implements notion of an implicit Question Under Discussion, e.g.,

After presenting a flight that’s not direct, there’s an implicit question: Are there any direct flights?

You can fly business class on British Airways, arriving at four twenty p.m., but you’d need to connect in Manchester. [There’s a DIRECT flight]theme on BMI, arriving at four ten p.m., but there’s no availability in business class.

Page 17

Controlling Intonation with OpenCCG

• OpenCCG realizer adapts previous work on chart realization to CCG, enabling CCG’s unique accounts of coordination and intonation to be employed in NLG systems

• Uses information structure to determine types and locations of pitch accents and boundary tones

• Measures similarity of realizations to n-gram language model

• Treats agenda as priority queue ordered by n-gram scores

• Yields best-first anytime algorithm: returns best scoring realization at “any time”, for interactive applications

(White and Baldridge, EWNLG9 2003; White, INLG 2004; White, RLaC 2006)
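The agenda idea can be illustrated with a toy search. The sketch below is not OpenCCG's chart realizer: it simply searches over orderings of a bag of words, keeps partial results in a priority queue ordered by a made-up bigram score, and returns the best complete ordering found so far if its step budget runs out. All names and probabilities are hypothetical.

```python
import heapq
from typing import Dict, List, Optional, Tuple

# Toy bigram log-probabilities (invented); unseen bigrams get a flat penalty.
BIGRAM_LOGP: Dict[Tuple[str, str], float] = {
    ("<s>", "the"): -0.5, ("the", "cheapest"): -0.7, ("cheapest", "flight"): -0.3,
    ("flight", "is"): -0.6, ("is", "on"): -0.8, ("on", "Ryanair"): -0.9,
}
UNSEEN = -5.0


def score(words: Tuple[str, ...]) -> float:
    """Sum of bigram log-probabilities, starting from the sentence marker."""
    total, prev = 0.0, "<s>"
    for w in words:
        total += BIGRAM_LOGP.get((prev, w), UNSEEN)
        prev = w
    return total


def realize(bag: List[str], max_steps: int = 10_000) -> Optional[List[str]]:
    # Agenda entries: (negated score, words so far, words still to place).
    agenda = [(0.0, (), tuple(bag))]
    best, best_score = None, float("-inf")
    for _ in range(max_steps):
        if not agenda:
            break
        _, words, remaining = heapq.heappop(agenda)
        if not remaining:
            # Scores never increase as words are added, so the first complete
            # item popped is the highest-scoring ordering.
            return list(words)
        for i, w in enumerate(remaining):
            new_words = words + (w,)
            new_remaining = remaining[:i] + remaining[i + 1:]
            s = score(new_words)
            if not new_remaining and s > best_score:
                best, best_score = list(new_words), s  # anytime fallback
            heapq.heappush(agenda, (-s, new_words, new_remaining))
    return best  # budget exhausted: best complete ordering seen so far, if any


result = realize(["flight", "the", "is", "cheapest", "on", "Ryanair"])
print(" ".join(result))
```

The step budget is what makes the search "anytime": an interactive caller can cap `max_steps` and still get the best complete candidate generated up to that point.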

Page 18

Examples

The cheapest(L+H*) flight(LH) is on Ryanair(H* LH). It arrives at two p.m.(H* LH) and it costs just fifty(H*) pounds(H* LH), but you’d need to connect(H*) in Dublin(H* LL).

[Audio samples: unit selection; limited domain with APML markup]

Even though the first(L+H*) flight is not on BMI(L+H* LH), it is the cheapest(H*) one available(LH).

[Audio samples: unit selection; limited domain with APML markup]

Page 19

Examples

Q: I'd like a cheap flight from Frankfurt to Geneva, please. And I'd prefer to fly direct.

A: There's a direct flight on Lufthansa with a good price, arriving in Geneva at ten thirty nine am and it costs two hundred and fifty five pounds. The cheapest flight is on Air France arriving at one twenty five pm and it costs only one hundred and five pounds, but it requires a connection in Paris Charles de Gaulle.

[Audio samples: limited domain; limited domain with APML markup]

Page 20

Is Tailoring Effective?

Evaluation in MATCH Project:

• Restaurant recommendation system built using same user modeling techniques

• Subjects heard dialogues where recommendations and comparisons were based either on their own user model or on another, randomly chosen user model

• Subjects judged tailored responses significantly higher on:
    Information quality: System’s response is easy to understand and provides exactly the information I am interested in when choosing a restaurant.
    Ranking confidence: Recommended restaurant is somewhere I would like to go.

(Walker, Whittaker, Stent, Maloor, Moore, Johnston, Vasireddy, Cognitive Science 28, 2004)

Page 21

Does Intonation Matter?

• Affects meaning
    “She only ATE the banana” vs. “She only ate the BANANA”

• Human judgements of output in the travel domain show that, overall, German speech produced with GToBI markup was judged better than speech with default intonation (Kruijff-Korbayova, EACL03)

• Naturalness

• (Ease of comprehension)

Page 22

Evaluation

• Compared three synthesizers
    Unit Selection Multisyn (USM)
    Limited Domain (LD)
    Limited Domain APML (LD_APML)

• Hypotheses:
    LD_APML >> USM
    LD_APML > LD

(Neide Franca Rocha, MSc, 2004)

Page 23

Results: US vs. LD_APML

[Bar chart: Number Chosen (axis 0 to 250). US: 73; LD_APML: 221.]

Page 24

Results: LD vs. LD_APML

[Bar chart: Number Chosen (axis 0 to 200). LD: 100; LD_APML: 173.]

Page 25

Using N-gram LM in Generation

[Bar chart: Number Chosen (axis 0 to 200). n-gram: 190; no n-gram: 34.]

Page 26

UM Approach to Info Pres

+ UM provides information users want to make choices with high confidence

+ Enables concise presentation of options and their tradeoffs

+ Users prefer recommendations tailored to their model

- Doesn’t scale to large number of options

- Does not provide users with an overview of options

- Users may (perceive) that they’ve missed out on options

Page 27

Summarize-and-Refine Approach

• Clusters options in database
    Based on attributes that yield smallest number of clusters

• Summarizes these clusters
    enumerate, majority, count, singleton

• User provides additional constraints

(Polifroni, Chung & Seneff, Eurospeech 2003, Chung ACL 2004)
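A minimal sketch of the two steps, under stated assumptions: the attribute that splits the current result set into the fewest clusters is picked, and one attribute is summarized with one of the four summary types named above. The thresholds (`max_enum`, `majority_share`), the function names and the restaurant records are hypothetical and are not taken from the cited system.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical result set.
RESTAURANTS: List[Dict[str, str]] = [
    {"city": "Boston", "cuisine": "seafood", "price": "$$"},
    {"city": "Boston", "cuisine": "seafood", "price": "$$$"},
    {"city": "Boston", "cuisine": "thai", "price": "$"},
    {"city": "Cambridge", "cuisine": "italian", "price": "$$"},
    {"city": "Cambridge", "cuisine": "french", "price": "$$$"},
]


def pick_attribute(options: List[Dict[str, str]], attributes: List[str]) -> Optional[str]:
    """Choose the attribute yielding the smallest number of clusters (at least 2)."""
    sizes = {a: len({o[a] for o in options}) for a in attributes}
    splitting = [a for a in attributes if sizes[a] > 1]
    return min(splitting, key=lambda a: sizes[a]) if splitting else None


def summarize(values: List[str], max_enum: int = 3, majority_share: float = 0.6) -> Tuple[str, str]:
    """Return (summary type, content) for the values of one attribute."""
    counts = Counter(values)
    if len(counts) == 1:
        return "singleton", values[0]
    if len(counts) <= max_enum:
        return "enumerate", ", ".join(sorted(counts))
    top, n = counts.most_common(1)[0]
    if n / len(values) >= majority_share:
        return "majority", top
    return "count", str(len(counts))


attr = pick_attribute(RESTAURANTS, ["city", "cuisine", "price"])
print(attr, summarize([r[attr] for r in RESTAURANTS]))
print("cuisine", summarize([r["cuisine"] for r in RESTAURANTS]))
```

In a dialogue, the "count" summary corresponds to a line like "There are 32 choices for cuisine", and the user's next constraint shrinks the set before the same two steps run again.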

Page 28

Example

User: What restaurants do you have?

Sys: I found 983 restaurants. Most of them are located in Boston and Cambridge. There are 32 choices for cuisine. I also have information about price range.

User: Okay tell me about the ones in Boston.

Sys: I found 401 restaurants in Boston. There are 29 choices for cuisine.

User: Do you have any that serve seafood?

Sys: I have found 19 seafood restaurants. They are predominantly in the Back Bay, the North End, South Boston and the South End.

User: In the South End.

Sys: I have found 3 restaurants. Grill Fish on …

Page 29

SR Approach:

+ Useful when users don’t know the structure of the database

+ Domain independent
    no fixed set of attributes
    make dynamic decisions based on analysis of data

- Suboptimal choice of attribute(s) for summarization

- Exploration of tradeoffs difficult

- May include irrelevant entities

Page 30

The Combined UM+SR Approach

• select relevant options

• structure them based on ranking of attributes in user model

• automatically determine tradeoffs

• tailor summaries

• improve overview of options space by briefly summarizing irrelevant options

(Demberg and Moore, EACL 2006)
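One way to picture "structure them based on ranking of attributes in user model" is a tree whose levels follow the user's attribute ranking. The sketch below is only an illustration of that idea: the record format, attribute values and function name are hypothetical, and the published approach also involves selecting relevant options and determining tradeoffs, which are not shown here.

```python
from typing import Dict, List, Union

Option = Dict[str, object]
Tree = Union[List[Option], Dict[tuple, "Tree"]]

# Hypothetical options with pre-discretized attribute values.
FLIGHTS: List[Option] = [
    {"id": "Ryanair 8:05", "price": "cheap", "legs": 2},
    {"id": "Ryanair 13:00", "price": "cheap", "legs": 2},
    {"id": "BMI 12:00", "price": "mid", "legs": 1},
]


def build_tree(options: List[Option], ranked_attrs: List[str]) -> Tree:
    """Group options level by level, most important attribute first."""
    if not ranked_attrs or len(options) <= 1:
        return options
    attr, rest = ranked_attrs[0], ranked_attrs[1:]
    groups: Dict[object, List[Option]] = {}
    for option in options:
        groups.setdefault(option[attr], []).append(option)
    return {(attr, value): build_tree(group, rest) for value, group in groups.items()}


# For the "student" profile, price is ranked above number of legs.
tree = build_tree(FLIGHTS, ["price", "legs"])
print(tree)
```

Walking such a tree top-down gives a natural presentation order: the cluster that is best on the top-ranked attribute first, then the alternatives.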

Page 31

Example Dialogue Turn

User: I'd like to book a flight from Edinburgh to Brussels for tomorrow.

Sys: Ryan Air offers the cheapest flights to Brussels. They cost just 49 pounds but you need to connect in Dublin. There are two flights to choose from. There's an early flight leaving Edinburgh at 8:05 am and arriving at 12:45 pm. To leave later, you can take the 1pm flight arriving Brussels at 5:30 pm.

If you want to fly direct, there's a flight on BMI that leaves Edinburgh at 12 pm. It arrives at 1:35 pm and costs 112 pounds.

All other flights are more expensive.

User Profile “student” (attribute ranking):

1  price
2  number of legs, departure time, arrival time, travel time
6  airline, fare class, layover airport

Page 32

Evaluation

Within-participants laboratory experiment
    38 subjects
    Participants read 6 dialogue pairs (UM+SR vs. SR)

Four questions after each dialogue:

1. Did the system give the information in a way that was easy to understand?

2. Did the system give X a good overview of the available options?

3. Do you think there were better options the system did not tell X about?

4. How quickly did the system allow X to find the optimal flight?

Forced-choice question after each pair: Which system would you recommend to a friend?

Page 33

Results - Forced Choice Q.

[Bar chart: System Preference (axis 0 to 120); bar values not recoverable]

p < 0.001 (two-tailed binomial test)
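For readers unfamiliar with the test, a two-tailed exact binomial test against chance (p = 0.5) can be computed as below. The counts behind this chart are not recoverable from the transcript, so the usage line borrows the US vs. LD_APML preference counts from the earlier results slide (73 vs. 221) purely as input; the source does not state that this test was applied to those counts.

```python
from math import comb


def binomial_two_tailed(k: int, n: int) -> float:
    """Exact two-tailed binomial test of k successes in n trials against p = 0.5.

    With p = 0.5 the distribution is symmetric, so the two-tailed p-value is
    twice the smaller tail, capped at 1.
    """
    k = min(k, n - k)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Illustration only: 73 choices for one system out of 73 + 221 forced choices.
p_value = binomial_two_tailed(73, 73 + 221)
print(p_value < 0.001)
```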

Page 34

Results - Likert Scale Questions

[Bar chart: Mean Likert Scale Value (axis 1.00 to 7.00) for UM+SR vs. SR on Q2: Understandability, Q3: Overview, Q4: Confidence, Q5: Quick access (1-3 scale); bar values not recoverable]

Significance levels using two-tailed paired t-test:

Q2: p = 0.97

Q3: p < 0.0001

Q4: p < 0.0001

Q5: p < 0.001

Page 35

Exp 2: Overhearer mode

[Bar chart: Mean Likert Scale Value (axis 1.00 to 7.00) on Q2: Understandability, Q3: Overview, Q4: Confidence, Q5: Quick Access. Data labels in extraction order: 5.67, 5.24, 2.42, 5.82, 5.34, 4.71, 2.31, 5.68; their assignment to questions and systems is not recoverable]

Significance levels using two-tailed paired t-test:

Q2: p = 0.24

Q3: p < 0.01

Q4: p < 0.002

Q5: p < 0.10

Page 36

Summary

Integration of UM and Clustering allows system to

• navigate through a large set of options
    structure options according to users' valuations
    present relevant options only

• automatically present tradeoffs between options

Results in

• increased overall user satisfaction

• better overview of options

• increased users' confidence in system

Page 37

Learning Content Selection Rules

• Content selection rules for biographical summaries (Duboue & McKeown, EMNLP 2003)

Uses a corpus of textual biographies and corresponding frame-based knowledge representation

Anchor-based alignment of extracted facts with sentences in text corpus

Learns whether semantic unit should be included in biography

» Recall 94%, F-score 51%

Induce rules from included material

Page 38

Learning Content Selection Rules

• Collective classification for content selection (Barzilay and Lapata, HLT/EMNLP 2005)

• Again, a binary classification task

• All candidates considered simultaneously

• Improves coherence because semantically related items often selected together

• Evaluation: Aligned newswire summaries of NFL games with database of events

» Recall 76.5%, F-score 60.15%

• Include chosen events in summary (as in extractive summarization)
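The slides report recall and F-score but not precision. If the F-score is assumed to be the balanced F1 (the harmonic mean of precision and recall), which the slides do not state, precision can be backed out as P = F·R / (2R − F). The helper below does that arithmetic; the resulting figures are derived under that assumption and are not quoted from the cited papers.

```python
def f1(precision: float, recall: float) -> float:
    """Balanced F-score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def implied_precision(recall: float, f_score: float) -> float:
    """Solve F = 2PR / (P + R) for P. Assumes a balanced F1."""
    denominator = 2 * recall - f_score
    if denominator <= 0:
        raise ValueError("no valid precision: F-score too high for this recall")
    return f_score * recall / denominator


# Biographical summaries: recall 94%, F-score 51%.
p_bio = implied_precision(0.94, 0.51)
# NFL game summaries: recall 76.5%, F-score 60.15%.
p_nfl = implied_precision(0.765, 0.6015)
print(round(p_bio, 3), round(p_nfl, 3))
```

Under this reading, both systems trade a fair amount of precision for recall, the first more heavily than the second.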

Page 39

Learning for Sentence Planning & Realization

• SPaRKy (Stent, Prasad & Walker, ACL 2004)
    Input: content plan, a set of dialogue acts and rhetorical relations among them
    Learns sentence plans from set of human-ranked training examples

• Oh & Rudnicky, CS&L, 2002
    Produces surface realizations for sentence plans based on n-gram statistics

• Achieves performance comparable to hand-crafted versions

Page 40

Credits: The FLIGHTS System

Fancy Linguistically Inspired Generation of Highly Tailored Speech

Rob Clark
Steve Conway
Mary Ellen Foster
Kallirroi Georgila
Oliver Lemon
Michael White

Thanks to: UK Engineering and Physical Sciences Research Council