Michel Galley and Lucy Vanderwende
Grounded Neural Conversational Models
Collaborators
Jiwei Li
Stanford
Nasrin Mostafazadeh
U. Rochester
Marjan Ghazvininejad
USC/ISI
Alan Ritter
Ohio State U.
Yi Luan
U. Washington
Alessandro Sordoni
Microsoft
Bill Dolan
Microsoft
Jianfeng Gao
Microsoft
Chris Quirk
Microsoft
Chris Brockett
Microsoft
Scott Yih
Microsoft
Ming-Wei Chang
Microsoft
Goal: Learning to converse
• Seamless and natural
• Open domain
• Open-ended and free-form (chitchat, informational, …)
Teach machines to engage in conversations
The weather is gorgeous today!
Yes it is so sunny! It should stay that way for the rest of the weekend.
I gotta get out of the house, any recommendations?
Try Mount Rainier. People say it’s beautiful in summer.
Fully Data-Driven Conversation
[Ritter et al., 2011; Sordoni et al., 2015; Vinyals and Le, 2015; Shang et al., 2015; etc.]
Source: conversation history
Target: response
Our best model trained with ~140 million conversations
NOT grounded
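The source/target framing above is a simple data-preparation step. The sketch below is minimal and assumes that every turn after the first is a target, with the preceding turns as its source. The `EOS` separator token, the `max_history` truncation, and the name `to_training_pairs` are illustrative choices. The talk does not give these details.

```python
from typing import List, Optional, Tuple


def to_training_pairs(turns: List[str],
                      max_history: Optional[int] = None,
                      sep: str = " EOS ") -> List[Tuple[str, str]]:
    """Build (source, target) pairs from one conversation.

    Every turn after the first becomes a target. The turns before it, joined
    with a (hypothetical) separator token, form the source. max_history
    optionally keeps only the most recent turns of the history.
    """
    pairs = []
    for i in range(1, len(turns)):
        start = 0 if max_history is None else max(0, i - max_history)
        pairs.append((sep.join(turns[start:i]), turns[i]))
    return pairs


conversation = [
    "The weather is gorgeous today!",
    "Yes it is so sunny! It should stay that way for the rest of the weekend.",
    "I gotta get out of the house, any recommendations?",
    "Try Mount Rainier. People say it's beautiful in summer.",
]

for source, target in to_training_pairs(conversation, max_history=2):
    print(f"SOURCE: {source}\nTARGET: {target}\n")
```

None of these pairs carries any information beyond the text itself, which is what "NOT grounded" refers to.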
Dialog Systems: Two paradigms
• Standard (grounded in an environment, e.g., a calendar): input x → Understanding (NLU) → State tracker → Dialog policy → Generation (NLG) → output y
• Fully data-driven: input x → output y
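The toy sketch below makes the contrast concrete for the standard pipeline. Everything in it is hypothetical: the keyword-based `nlu`, the single `day` slot, and the `CALENDAR` dictionary that stands in for the environment. It only shows the shape of the standard paradigm: four hand-designed stages, with the policy consulting the environment. A fully data-driven system replaces all four stages with one learned mapping from input x to output y.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical environment the standard pipeline is grounded in: day -> events.
CALENDAR: Dict[str, List[str]] = {"friday": ["dentist at 3pm"], "saturday": []}


@dataclass
class DialogState:
    slots: Dict[str, str] = field(default_factory=dict)


def nlu(x: str) -> Dict[str, Optional[str]]:
    """Understanding (NLU): map the utterance to slot values with keyword rules."""
    days = [day for day in CALENDAR if day in x.lower()]
    return {"day": days[0] if days else None}


def track(state: DialogState, frame: Dict[str, Optional[str]]) -> DialogState:
    """State tracker: merge newly observed slot values into the dialog state."""
    state.slots.update({k: v for k, v in frame.items() if v is not None})
    return state


def policy(state: DialogState, calendar: Dict[str, List[str]]) -> Tuple:
    """Dialog policy: pick the next action by consulting the environment."""
    day = state.slots.get("day")
    if day is None:
        return ("request", "day")
    return ("inform", day, calendar.get(day, []))


def nlg(action: Tuple) -> str:
    """Generation (NLG): realize the chosen action as text."""
    if action[0] == "request":
        return "Which day do you mean?"
    _, day, events = action
    if not events:
        return f"You have nothing scheduled on {day.capitalize()}."
    return f"On {day.capitalize()} you have: {', '.join(events)}."


def standard_pipeline(x: str, state: DialogState) -> str:
    """input x -> NLU -> state tracker -> dialog policy -> NLG -> output y."""
    return nlg(policy(track(state, nlu(x)), CALENDAR))


state = DialogState()
print(standard_pipeline("Am I free?", state))
print(standard_pipeline("What about Friday?", state))
```

The dialog state carries across turns. A follow-up that names no day still gets an answer about Friday once Friday has been tracked.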
A Knowledge-Grounded Neural Conversation Model
• Conversation history ("Going to Kusakabe tonight") → Dialog encoder
• World "facts" → contextually relevant "facts" ("Consistently the best omakase", "Amazing sushi tasting […]", "They were out of kaisui […]") → Facts encoder
• The dialog and facts encodings are combined and passed to the Decoder, which produces the response ("Try omakase, the best in town")
[Ghazvininejad et al., 2017]
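The diagram's data flow can be sketched in a few lines. This is a minimal, non-authoritative sketch that makes several assumptions:
• Relevant facts are selected by matching an entity name (here, `Kusakabe`) in the conversation history.
• The facts encoder is a memory-network-style attention read, and its output is added to the dialog encoding before decoding.
• `encode` is a toy hashed bag of words that stands in for learned encoders.

The names `retrieve_facts`, `ground`, and `encode` are hypothetical, the world facts are toy data, and the decoder is omitted.

```python
import math
import zlib
from typing import Dict, List, Tuple

DIM = 16  # toy embedding size


def encode(text: str, dim: int = DIM) -> List[float]:
    """Toy stand-in for a learned encoder: a hashed, L2-normalized bag of words."""
    vec = [0.0] * dim
    for tok in text.lower().split():
        vec[zlib.crc32(tok.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def retrieve_facts(history: str, world_facts: Dict[str, List[str]]) -> List[str]:
    """Keep only facts keyed by an entity that is mentioned in the history."""
    text = history.lower()
    return [fact
            for entity, facts in world_facts.items() if entity.lower() in text
            for fact in facts]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def ground(history: str, facts: List[str]) -> Tuple[List[float], List[float]]:
    """Memory-network-style read over the facts.

    The history encoding u is the query, and each fact's encoding serves as
    both key and value. Returns the attention weights and u + o, where o is
    the weighted sum of fact encodings; u + o would initialize the decoder.
    """
    u = encode(history)
    if not facts:
        return [], u
    mem = [encode(f) for f in facts]
    weights = softmax([sum(a * b for a, b in zip(u, m)) for m in mem])
    o = [sum(w * m[d] for w, m in zip(weights, mem)) for d in range(len(u))]
    return weights, [a + b for a, b in zip(u, o)]


world = {  # toy world "facts"
    "Kusakabe": ["Consistently the best omakase", "Amazing sushi tasting",
                 "They were out of kaisui"],
    "Kisaku": ["One of the best sushi restaurants in the city"],
}
history = "Going to Kusakabe tonight"
facts = retrieve_facts(history, world)
weights, decoder_init = ground(history, facts)
print(facts)
print([round(w, 3) for w in weights])
```

With no matching entity, `ground` falls back to the plain dialog encoding. This mirrors an ungrounded model.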
“Infusing” non-conversational knowledge into conversations
You know any good Japanese restaurant in Seattle?
Try Kisaku, one of the best sushi restaurants in the city.
You know any good A restaurant in B?
Try C, one of the best D in the city.
Sample knowledge-grounded responses
Results with 23M conversations: outperforms a competitive neural baseline (including in human evaluation)
Data-driven conversation: toward more informational and “useful” dialogs
Spectrum: chitchat → informational → task-completion
• Standard dialog systems (grounded)
• Fully data-driven (previously ungrounded) [Ritter et al., 2011; Sordoni et al., 2015; Vinyals and Le, 2015; Shang et al., 2015; Li et al., 2016; …]
• [Ghazvininejad et al., 2017]: GROUNDED!
Grounded and Fully Data-Driven Models
• Conversation
• Personalization data (ID, social graph, ...)
• Device sensors (GPS, vision, ...)
[Li et al., 2016]; [Ghazvininejad et al., 2017]; [Luan et al., 2017]; [Mostafazadeh et al., 2017]
Question Generation
• Generating Natural Questions About an Image, ACL 2016
• Nasrin Mostafazadeh, Ishan Misra, Jacob Devlin, Margaret Mitchell, Xiaodong He, Lucy Vanderwende
Image Grounded Conversation
Did he end up winning the race?
Yes he won, he can’t believe it.
Image-Grounded Conversations: Multimodal Context for Natural Question and Response Generation
Nasrin Mostafazadeh, Chris Brockett, Bill Dolan, Michel Galley, Jianfeng Gao, Georgios P. Spithourakis, Lucy Vanderwende – arXiv
Image Grounded Conversation Datasets
• Twitter dataset
  • 250K conversations (3 turns each)
  • Photo + tweet; question; response
• IGC crowd dataset
  • 4,222 conversations (avg. 4 turns)
  • 5 additional questions and first responses per conversation, for evaluation
  • Sourced using CrowdChip
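Each Twitter item pairs a photo and a tweet with a question and a response. This maps onto the two tasks named in the paper title: question generation and response generation. The sketch below is minimal and rests on two assumptions. First, the record layout is hypothetical: `IGCExample` and its field names are illustrative and are not the dataset's actual schema. Second, it assumes response generation also conditions on the question. The image file name and tweet text are placeholders.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class IGCExample:
    """One Twitter IGC conversation: photo + tweet, a question, a response.

    Field names are hypothetical, not the released dataset's schema.
    """
    image: str             # e.g. a local path or URL for the photo
    textual_context: str   # the tweet posted with the photo
    question: str
    response: str


def to_generation_tasks(ex: IGCExample) -> List[Tuple[str, Dict[str, str], str]]:
    """Split one example into (task, inputs, target) triples:
    question generation from image + textual context, and response
    generation from image + textual context + question."""
    qg_inputs = {"image": ex.image, "textual_context": ex.textual_context}
    rg_inputs = {**qg_inputs, "question": ex.question}
    return [("question_generation", qg_inputs, ex.question),
            ("response_generation", rg_inputs, ex.response)]


ex = IGCExample(
    image="photo_0001.jpg",          # hypothetical file name
    textual_context="<tweet text>",  # placeholder, not real data
    question="Did he end up winning the race?",
    response="Yes he won, he can't believe it.",
)
for task, inputs, target in to_generation_tasks(ex):
    print(task, sorted(inputs), "->", target)
```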
Key characteristics of IGC dialogue
• Natural-sounding conversations about a shared image
• Conversation topics are the events and actions evoked by the objects in the image
• Both the image and the textual context are informative when generating the question
FrameNet analysis of dialogue
• 32% of questions are linked to the image frame
• 47% of questions are linked to the textual-context frame
• In 14% of cases, the image frame and the textual-context frame are the same
Key characteristics of dialogue
• Complex temporal and causal relations are observed across multiple turns in the conversation, as one would expect from natural conversation
Temporal and causal relations across turns
• Of 20 conversations analyzed, multiple relation types occur: 15 cause, 11 enable, 9 overlaps, 8 before, and 3 prevent
• Two-thirds of the conversations mention an abstract event entity, e.g., race or remodel
Sample output
Thank you
Joint work with: Chris Brockett, Ming-Wei Chang, Bill Dolan, Jianfeng Gao, Marjan Ghazvininejad, Jiwei Li, Yi Luan, Nasrin Mostafazadeh, Chris Quirk, Alan Ritter, Alessandro Sordoni, Scott Yih