Michel Galley and Lucy Vanderwende Grounded Neural Conversational Models

Post on 02-Feb-2020

Page 1: Grounded Neural Conversational Models · •Natural-sounding conversations about a shared image •Conversation topics are the events and actions that are evoked by the objects in

Michel Galley and Lucy Vanderwende

Grounded Neural Conversational Models

Page 2

Collaborators

Jiwei Li

Stanford

Nasrin Mostafazadeh

U. Rochester

Marjan Ghazvininejad

USC/ISI

Alan Ritter

Ohio State U.

Yi Luan

U. Washington

Alessandro Sordoni

Microsoft

Bill Dolan

Microsoft

Jianfeng Gao

Microsoft

Chris Quirk

Microsoft

Chris Brockett

Microsoft

Scott Yih

Microsoft

Ming-Wei Chang

Microsoft

Page 3

Goal: Learning to converse

• Seamless and natural

• Open domain

• Open ended and free form (chitchat, informational, …)

Teach machines to engage in conversations

The weather is gorgeous today!

Yes it is so sunny! It should stay that way for the rest of the weekend.

I gotta get out of the house, any recommendations?

Try Mount Rainier. People say it’s beautiful in summer.

Page 4

Fully Data-Driven Conversation

[Ritter et al., 2011; Sordoni et al., 2015; Vinyals and Le, 2015; Shang et al., 2015; etc.]

Source: conversation history

Target: response

Our best model: trained with ~140 million conversations
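
The source/target framing above can be made concrete with a small sketch. This is only an illustration of how (conversation history, response) pairs might be cut from a conversation; the function name `make_training_pairs`, the `max_history` window, and the `<eot>` turn separator are assumptions, not details given in the slides. The sample turns are the ones from the "Learning to converse" slide, in an assumed order.

```python
from typing import List, Tuple


def make_training_pairs(
    conversation: List[str], max_history: int = 2
) -> List[Tuple[str, str]]:
    """Turn one conversation into (source, target) pairs.

    Every turn after the first becomes a target (the response); the turns
    just before it, at most `max_history` of them, form the source
    (the conversation history).
    """
    pairs = []
    for i in range(1, len(conversation)):
        history = conversation[max(0, i - max_history):i]
        # "<eot>" is a hypothetical end-of-turn separator.
        source = " <eot> ".join(history)
        pairs.append((source, conversation[i]))
    return pairs


conversation = [
    "The weather is gorgeous today!",
    "Yes it is so sunny! It should stay that way for the rest of the weekend.",
    "I gotta get out of the house, any recommendations?",
    "Try Mount Rainier. People say it's beautiful in summer.",
]

pairs = make_training_pairs(conversation)
for source, target in pairs:
    print(f"SOURCE: {source}\nTARGET: {target}\n")
```

A four-turn conversation yields three pairs here; a larger `max_history` gives the model more context per pair at the cost of longer sources.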

Page 5

Dialog Systems: Two paradigms

[Diagram: two dialog system pipelines, each mapping input x to output y]

• Standard (grounded in an environment, e.g. calendar): Understanding (NLU), State tracker, Dialog policy, Generation (NLG)

• Fully data-driven (NOT grounded): input x mapped directly to output y

Page 6

A Knowledge-Grounded Neural Conversation Model

[Diagram: model architecture]

• DIALOG ENCODER reads the CONVERSATION HISTORY: “Going to Kusakabe tonight”

• WORLD “FACTS” are narrowed down to CONTEXTUALLY-RELEVANT “FACTS”: “Consistently the best omakase”, “Amazing sushi tasting […]”, “They were out of kaisui […]”

• FACTS ENCODER reads the contextually-relevant facts

• DECODER generates the RESPONSE: “Try omakase, the best in town”

[Ghazvininejad et al., 2017]
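
The step from world facts to contextually-relevant facts can be sketched as a lookup. This is a minimal sketch under stated assumptions, not the retrieval method of the cited paper: it assumes facts are stored per entity and selected by a plain substring match on the conversation history. `WORLD_FACTS`, `retrieve_facts`, and `build_model_inputs` are hypothetical names, and the second entity is a placeholder added only so the filter has something to exclude.

```python
from typing import Dict, List

# Hypothetical fact store keyed by entity name. The three Kusakabe
# snippets are the ones shown on the slide (elisions kept as-is).
WORLD_FACTS: Dict[str, List[str]] = {
    "kusakabe": [
        "Consistently the best omakase",
        "Amazing sushi tasting […]",
        "They were out of kaisui […]",
    ],
    "some other place": ["(placeholder fact about a different entity)"],
}


def retrieve_facts(history: str, world_facts: Dict[str, List[str]]) -> List[str]:
    """Return facts whose entity key is mentioned in the conversation history."""
    text = history.lower()
    relevant: List[str] = []
    for entity, facts in world_facts.items():
        if entity in text:  # assumption: simple substring match on the entity name
            relevant.extend(facts)
    return relevant


def build_model_inputs(history: str, world_facts: Dict[str, List[str]]) -> dict:
    """Pair the history (dialog encoder) with the selected facts (facts encoder)."""
    return {
        "dialog_encoder_input": history,
        "facts_encoder_input": retrieve_facts(history, world_facts),
    }


inputs = build_model_inputs("Going to Kusakabe tonight", WORLD_FACTS)
for fact in inputs["facts_encoder_input"]:
    print(fact)
```

The two entries of the returned dictionary correspond to the two encoders in the diagram; how their outputs are combined before decoding is not specified in the slides and is left out here.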

Page 7

“Infusing” non-conversational knowledge into conversations

You know any good Japanese restaurant in Seattle?

Try Kisaku, one of the best sushi restaurants in the city.

You know any good A restaurant in B?

Try C, one of the best D in the city.
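
To make the A/B/C/D abstraction concrete, the exchange above can be written as a pattern with four slots. This is purely an illustration of the slide's notation using Python's `string.Template`; the model described in the talk is neural and learned from data, not template-based, and the names `QUESTION`, `RESPONSE`, and `slots` are made up for this sketch.

```python
from string import Template

# The conversational pattern from the slide, with slots A-D.
QUESTION = Template("You know any good $A restaurant in $B?")
RESPONSE = Template("Try $C, one of the best $D in the city.")

# Slot values taken from the concrete example on the same slide.
slots = {
    "A": "Japanese",
    "B": "Seattle",
    "C": "Kisaku",
    "D": "sushi restaurants",
}

question = QUESTION.substitute(slots)
response = RESPONSE.substitute(slots)

print(question)
print(response)
```

The pattern itself is the conversational part; the values for C and D are the kind of content that would come from non-conversational knowledge.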

Page 8

Sample knowledge-grounded responses

Results w/ 23M conversations: outperforms competitive neural baseline (including on human eval)

Page 9

Data-driven conversation: toward more informational and “useful” dialogs

[Diagram: spectrum from chitchat to informational, task-completion dialogs]

• Standard dialog systems (grounded)

• Fully data-driven (previously ungrounded) [Ritter et al., 2011; Sordoni et al., 2015; Vinyals and Le, 2015; Shang et al., 2015; Li et al., 2016; …]

• GROUNDED! [Ghazvininejad et al., 2017]

Page 10

Grounded and Fully Data-Driven Models

Personalization data

(ID, social graph, ...)

Device sensors

(GPS, vision, ...)

[Li et al., 2016]

[Ghazvininejad et al., 2017]

[Luan et al., 2017]

[Mostafazadeh et al., 2017]

Page 11

Conversation

Page 12

Question Generation

• Generating Natural Questions About an Image, ACL 2016

• Nasrin Mostafazadeh, Ishan Misra, Jacob Devlin, Margaret Mitchell, Xiaodong He, Lucy Vanderwende

Page 13

Image Grounded Conversation

Did he end up winning the race?

Yes he won, he can’t believe it.

Image-Grounded Conversations: Multimodal Context for Natural Question and Response Generation

Nasrin Mostafazadeh, Chris Brockett, Bill Dolan, Michel Galley, Jianfeng Gao, Georgios P. Spithourakis, Lucy Vanderwende – arXiv

Page 14

Image Grounded Conversation Datasets

• Twitter Dataset
  • 250K conversations (3 turns)
  • Photo + tweet; question; response

• IGC crowd dataset
  • 4,222 conversations (avg 4 turns)
  • 5 additional questions and first responses per conversation, for evaluation
  • Sourced using CrowdChip
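
The three-part structure of a Twitter example (photo + tweet; question; response) maps naturally onto a small record type. The sketch below is an assumption about how such data might be held in memory, not the released dataset format: `IGCExample`, its field names, the file name, and the placeholder textual context are all hypothetical. The question and response are the ones from the "Image Grounded Conversation" slide.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class IGCExample:
    image_path: str       # hypothetical: location of the photo
    textual_context: str  # the tweet (or first utterance) accompanying the photo
    question: str         # question asked about the image and context
    response: str         # reply to the question


def question_generation_pair(ex: IGCExample) -> Tuple[Tuple[str, str], str]:
    """Inputs and target for generating the question."""
    return (ex.image_path, ex.textual_context), ex.question


def response_generation_pair(ex: IGCExample) -> Tuple[Tuple[str, str, str], str]:
    """Inputs and target for generating the response."""
    return (ex.image_path, ex.textual_context, ex.question), ex.response


example = IGCExample(
    image_path="photo.jpg",  # placeholder file name
    textual_context="(textual context not shown in the slides)",
    question="Did he end up winning the race?",
    response="Yes he won, he can’t believe it.",
)

q_inputs, q_target = question_generation_pair(example)
r_inputs, r_target = response_generation_pair(example)
print(q_target)
print(r_target)
```

Keeping the image and the textual context as separate fields makes it easy to feed either one alone or both together when generating the question.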

Page 17

Key characteristics of IGC dialogue

• Natural-sounding conversations about a shared image

• Conversation topics are the events and actions that are evoked by the objects in the image

• Both image and textual context are informative when generating the question

Page 18

FrameNet analysis of dialogue

• 32% of questions are linked to the image frame

• 47% of questions are linked to the textual context frame

• In 14% of cases, the image frame and the textual context frame are the same

Page 19

Key characteristics of dialogue

• Natural-sounding conversations about a shared image

• Conversation topics are the events and actions that are evoked by the objects in the image

• Both image and textual context are informative when generating the question

• Complex temporal and causal relations are observed across multiple turns in the conversation, as one would expect from natural conversation

Page 20

Temporal and causal relations across turns

• Of 20 conversations analyzed, multiple types of relations: 15 cause, 11 enable, 9 overlaps, 8 before and 3 prevent

• 2/3 of conversations mention an abstract event entity, e.g. race or remodel

Page 21

Sample output

Page 22

Sample output

Page 23

Sample output

Page 24

Sample output

Page 25

Thank you

Joint work with: Chris Brockett, Ming-Wei Chang, Bill Dolan, Jianfeng Gao, Marjan Ghazvininejad, Jiwei Li, Yi Luan, Nasrin Mostafazadeh, Chris Quirk, Alan Ritter, Alessandro Sordoni, Scott Yih