grounded neural conversational models · •natural-sounding conversations about a shared image...

Michel Galley and Lucy Vanderwende

Grounded Neural Conversational Models

Collaborators

Jiwei Li

Stanford

Nasrin Mostafazadeh

U. Rochester

Marjan Ghazvininejad

USC/ISI

Alan Ritter

Ohio State U.

Yi Luan

U. Washington

Alessandro Sordoni

Microsoft

Bill Dolan

Microsoft

Jianfeng Gao

Microsoft

Chris Quirk

Microsoft

Chris Brockett

Microsoft

Scott Yih

Microsoft

Ming-Wei Chang

Microsoft

Goal: Learning to converse

• Seamless and natural

• Open domain

• Open ended and free form (chitchat, informational, …)

Teach machines to engage in conversations

I gotta get out of the house, any recommendations?

Yes it is so sunny! It should stay that way for the rest of the weekend.

Try Mount Rainier. People say it’s beautiful in summer.

The weather is gorgeous today!

Fully Data-Driven Conversation

[Ritter et al., 2011; Sordoni et al., 2015; Vinyals and Le, 2015; Shang et al., 2015; etc.]

Source:

conversation history

Target:

response

Our best model trained with

~140 million conversations

NOT grounded
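
The source/target framing above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function name `make_training_pairs`, the `<eot>` separator, the `max_history` window, and the ordering of the example turns are all assumptions made for the sketch.

```python
from typing import List, Tuple


def make_training_pairs(turns: List[str], max_history: int = 2) -> List[Tuple[str, str]]:
    """Turn one conversation into (source, target) pairs.

    source = up to `max_history` preceding turns joined by a separator token,
    target = the turn that follows them (the response to predict).
    """
    pairs = []
    for i in range(1, len(turns)):
        history = turns[max(0, i - max_history):i]
        # "<eot>" is a hypothetical end-of-turn marker, not taken from the slides.
        pairs.append((" <eot> ".join(history), turns[i]))
    return pairs


# Example turns from the slides; their order here is assumed.
conversation = [
    "The weather is gorgeous today!",
    "Yes it is so sunny! It should stay that way for the rest of the weekend.",
    "I gotta get out of the house, any recommendations?",
    "Try Mount Rainier. People say it's beautiful in summer.",
]
pairs = make_training_pairs(conversation)
```

Each pair uses only the conversation itself as input, which is why a model trained this way is not grounded in anything outside the dialog.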

Dialog Systems: Two paradigms

[Figure: two dialog-system paradigms. Standard: input x, Understanding (NLU), State tracker, Dialog policy, Generation (NLG), output y; grounded (calendar). Fully data-driven: input x, output y; Environment.]

A Knowledge-Grounded Neural Conversation Model

[Ghazvininejad et al., 2017]

[Figure: model architecture. CONVERSATION HISTORY ("Going to Kusakabe tonight") goes to the DIALOG ENCODER. WORLD "FACTS" are narrowed to CONTEXTUALLY-RELEVANT "FACTS" ("Consistently the best omakase", "Amazing sushi tasting […]", "They were out of kaisui […]"), which go to the FACTS ENCODER. The DECODER produces the RESPONSE ("Try omakase, the best in town").]
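
The step from world facts to contextually relevant facts can be sketched as a lookup keyed on entities mentioned in the conversation history. The slides do not say how facts are selected, so the word-match rule, the `WORLD_FACTS` store, and the `relevant_facts` helper below are assumptions for illustration only. The sketch also stops before the neural part: encoding the selected facts and combining them with the dialog encoder state is not shown.

```python
from typing import Dict, List

# Hypothetical fact store: entity name -> free-text facts about that entity.
# The Kusakabe facts are the ones shown on the slide (elisions kept as-is).
WORLD_FACTS: Dict[str, List[str]] = {
    "kusakabe": [
        "Consistently the best omakase",
        "Amazing sushi tasting […]",
        "They were out of kaisui […]",
    ],
    "example_cafe": ["Hypothetical fact about an unrelated place"],
}


def relevant_facts(history: str, facts: Dict[str, List[str]]) -> List[str]:
    """Return the facts whose entity key appears as a word in the history."""
    words = {w.strip(".,!?").lower() for w in history.split()}
    selected: List[str] = []
    for entity, entity_facts in facts.items():
        if entity in words:
            selected.extend(entity_facts)
    return selected


facts = relevant_facts("Going to Kusakabe tonight", WORLD_FACTS)
```

A history that mentions no known entity yields an empty list, in which case a model of this kind would fall back to the conversation history alone.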

“Infusing” non-conversational knowledge into conversations

You know any good Japanese restaurant in Seattle?

Try Kisaku, one of the best sushi restaurants in the city.

You know any good A restaurant in B?

Try C, one of the best D in the city.

Sample knowledge-grounded responses

Results w/ 23M conversations: outperforms competitive neural baseline (including on human eval)

Data-driven conversation: toward more informational and “useful” dialogs

[Figure: dialog systems placed between chitchat and informational, task-completion dialogs.]

Standard dialog systems (grounded)

Fully data-driven (previously ungrounded) [Ritter et al., 2011; Sordoni et al., 2015; Vinyals and Le, 2015; Shang et al., 2015; Li et al., 2016; …]

[Ghazvininejad et al., 2017]: GROUNDED!

Grounded and Fully Data-Driven Models

Personalization data

(ID, social graph, ...)

Device sensors

(GPS, vision, ...)

[Li et al., 2016]

[Ghazvininejad et al., 2017]

[Luan et al., 2017]

[Mostafazadeh et al., 2017]

Conversation

Question Generation

• Generating Natural Questions About an Image, ACL 2016

• Nasrin Mostafazadeh, Ishan Misra, Jacob Devlin, Margaret Mitchell, Xiaodong He, Lucy Vanderwende

Image Grounded Conversation

Did he end up winning the race?

Yes he won, he can’t believe it.

Image-Grounded Conversations: Multimodal Context for Natural Question and Response Generation

Nasrin Mostafazadeh, Chris Brockett, Bill Dolan, Michel Galley, Jianfeng Gao, Georgios P. Spithourakis, Lucy Vanderwende – arXiv

https://arxiv.org/pdf/1701.08251.pdf


Image Grounded Conversation Datasets

• Twitter dataset

  • 250K conversations (3 turns)

  • Photo + tweet; question; response

• IGC crowd dataset

  • 4,222 conversations (avg 4 turns)

• 5 additional questions and first responses per conversation for evaluation

• Sourced using CrowdChip
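
One way to picture a single three-turn record (image plus textual context, then question, then response) is as a small data structure. This is a sketch under assumptions: the class name, the field names, and the placeholder image path and context string are hypothetical and do not reflect the released data format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ImageGroundedExample:
    image_path: str        # hypothetical field: where the shared image is stored
    textual_context: str   # first turn (the tweet, in the Twitter data)
    question: str          # second turn: a question about the image and context
    response: str          # third turn: the reply to the question

    def turns(self) -> List[str]:
        """Return the textual turns in conversation order."""
        return [self.textual_context, self.question, self.response]


example = ImageGroundedExample(
    image_path="images/placeholder.jpg",            # hypothetical path
    textual_context="(textual context goes here)",  # placeholder, not from the data
    question="Did he end up winning the race?",
    response="Yes he won, he can’t believe it.",
)
```

Keeping the image alongside the text makes explicit that the question is conditioned on both.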

Key characteristics of IGC dialogue

• Natural-sounding conversations about a shared image

• Conversation topics are the events and actions that are evoked by the objects in the image

• Both image and textual context are informative when generating the question

FrameNet analysis of dialogue

• 32% of questions are linked to the image frame

• 47% of questions are linked to the textual context frame

• In 14% of cases, the image frame and the textual context frame are the same

Key characteristics of dialogue

• Natural-sounding conversations about a shared image

• Conversation topics are the events and actions that are evoked by the objects in the image

• Both image and textual context are informative when generating the question

• Complex temporal and causal relations are observed across multiple turns in the conversation, as one would expect from natural conversation

Temporal and causal relations across turns

• Of 20 conversations analyzed, multiple types of relations: 15 cause, 11 enable, 9 overlaps, 8 before and 3 prevent

• 2/3 of conversations mention an abstract event entity, e.g. race or remodel

Sample output

Thank you

Joint work with: Chris Brockett, Ming-Wei Chang, Bill Dolan, Jianfeng Gao, Marjan Ghazvininejad, Jiwei Li, Yi Luan, Nasrin Mostafazadeh, Chris Quirk, Alan Ritter, Alessandro Sordoni, Scott Yih
