Knowledge-Driven Question Answering and Semantic Parsing
Duyu Tang
Natural Language Computing Group
Microsoft Research Asia
Joint work with Nan Duan, Daya Guo, Yibo Sun, Shangwen Lv, Jingjing Xu, Ming Zhou
Pretrain Model-based Paradigm in NLP
(Figure) input → Pre-trained Model (BERT/XLNet/RoBERTa) → output, instantiated for three tasks:
• Text Classification: text → category
• Question Answering: question → answer
• Generation: sequence → sequence
Language Models as Knowledge Bases?
• LAMA (LAnguage Model Analysis)
• Factual Knowledge
• Commonsense Knowledge
• Question Answering
Fabio Petroni, Tim Rocktäschel, Sebastian Riedel, Patrick Lewis, Anton Bakhtin, Yuxiang Wu, Alexander Miller. Language Models as Knowledge Bases? EMNLP, 2019.
Factual Knowledge in LAMA
• Datasets
  • Google-RE: covers the “place of birth”, “date of birth” and “place of death” relations
  • T-REx: considers 41 Wikidata relations, subsampling at most 1,000 facts per relation
Commonsense Knowledge in LAMA
• Dataset
  • ConceptNet: facts from the English part of ConceptNet that have single-token objects, covering 16 relations
Question Answering in LAMA
• Dataset
  • SQuAD: a subset of 305 context-insensitive questions from the SQuAD development set with single-token answers
  • Cloze-style questions are manually created from the original questions
Original Question: Who developed the theory of relativity?
Cloze-Style Question: The theory of relativity was developed by ______
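The conversion from KB facts to cloze queries can be sketched as follows; the relation templates here are illustrative stand-ins, not the exact templates used by LAMA.

```python
# Sketch of LAMA-style probing data construction.
# The templates below are hypothetical examples, not the paper's own.
TEMPLATES = {
    "place_of_birth": "[X] was born in [MASK].",
    "date_of_birth": "[X] was born on [MASK].",
    "developed_by": "[X] was developed by [MASK].",
}

def triple_to_cloze(subject, relation, obj):
    """Turn a KB triple into a (cloze query, gold answer) pair."""
    template = TEMPLATES[relation]
    return template.replace("[X]", subject), obj

query, answer = triple_to_cloze(
    "The theory of relativity", "developed_by", "Einstein")
# query == "The theory of relativity was developed by [MASK]."
```

A masked language model is then asked to fill the [MASK] slot, and its top-ranked token is compared against the gold object.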
Results on LAMA
• Pretrained case-sensitive LMs
  • Fs: fairseq-fconv
  • Txl: Transformer-XL large
  • Eb: ELMo original
  • E5B: ELMo 5.5B
  • Bb: BERT-base
  • Bl: BERT-large
• Baselines
  • Freq: ranks words by how frequently they appear as objects for the given relation in the test data
  • DrQA: retrieves Wikipedia documents with TF-IDF, then extracts answers from the top-k articles with neural machine reading comprehension
  • REn: extracts relation triples from sentences known to express the test facts with the pretrained Relation Extraction (RE) model of Sorokin and Gurevych (2017); entity linking via exact string matching
  • REo: additionally uses an oracle for entity linking
BERT is Not a Knowledge Base (Yet)
• LAMA-UHN (UnHelpful Names)
  • Filter 1 (string match filter): deletes all KB triples where the correct answer (e.g., Apple) is a case-insensitive substring of the subject entity name (e.g., Apple Watch); this alone deletes up to 81% of triples from individual relations
  • Filter 2 (person name filter): queries the LM with “[X] is a common name in the following language: [MASK].” for both the first and the last name — e.g., [X] = Jean and [X] = Marais for the triple (Jean Marais, native-language, French); depending on the relation, “language” is replaced with “city”/“country” in the template. If the correct answer is among the top-3 predictions for either query, the triple is deleted.
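Filter 1 is simple enough to sketch directly; the triples below are the paper's own examples, while the function itself is a minimal reimplementation.

```python
def string_match_filter(triples):
    """LAMA-UHN Filter 1 (sketch): drop triples whose correct answer is a
    case-insensitive substring of the subject entity's name."""
    kept = []
    for subj, rel, answer in triples:
        if answer.lower() in subj.lower():
            continue  # e.g. answer "Apple" appears inside subject "Apple Watch"
        kept.append((subj, rel, answer))
    return kept

triples = [("Apple Watch", "developer", "Apple"),
           ("Jean Marais", "native-language", "French")]
# Only the second triple survives Filter 1.
```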
Nina Poerner, Ulli Waltinger, Hinrich Schütze. BERT is Not a Knowledge Base (Yet): Factual Knowledge vs. Name-Based Reasoning in Unsupervised QA. arXiv:1911.03681, 2019.
Results with Sequentially Applied Filters
• BERT, ERNIE
• E-BERT
  • Uses Wikipedia2vec, which embeds words and Wikipedia pages in a common space F
  • Learns a mapping W by minimizing the squared distance between transformed Wikipedia2vec word vectors and BERT subword vectors
  • Each entity is embedded by W ∘ F, while other tokens continue to be embedded by the BERT embedding E_B
• AVG: ensembles BERT and E-BERT by mean-pooling their outputs
• CONCAT: ensembles BERT and E-BERT by concatenating the entity and its name with a slash symbol
Negated LAMA: Birds cannot fly
• Created negated versions of Google-RE, T-REx and SQuAD by manually inserting a negation element into each template or statement.
Nora Kassner, Hinrich Schütze. Negated LAMA: Birds cannot fly. arXiv:1911.03343, 2019.
Original Question: Who developed the theory of relativity?
Cloze-Style Question: The theory of relativity was developed by ______
Negated Cloze-Style Question: The theory of relativity was not developed by ______
Negated LAMA: Birds cannot fly
• Measures: Spearman rank correlation and the overlap in rank-1 predictions between the original and negated datasets
BERT mostly did not learn the meaning of negation
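Both comparison measures are easy to sketch; the rank-1 overlap below matches the description above, the Spearman implementation uses the standard no-ties formula, and the example predictions are invented.

```python
def rank1_overlap(orig_preds, neg_preds):
    """Fraction of queries whose top-1 prediction is identical for the
    original and the negated cloze query (high overlap = negation ignored)."""
    same = sum(o == n for o, n in zip(orig_preds, neg_preds))
    return same / len(orig_preds)

def spearman_rho(xs, ys):
    """Spearman rank correlation without ties:
    rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

print(rank1_overlap(["Einstein", "Paris"], ["Einstein", "London"]))  # 0.5
```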
Negated LAMA: Birds cannot fly
Agenda
• External Evidence Knowledge
• Grammar Knowledge
• Conversational Context Knowledge
• Data Knowledge
SQuAD
Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, Percy Liang. "SQuAD: 100,000+ Questions for Machine Comprehension of Text." EMNLP-2016.
(Figure) A passage with one QA pair.
DrQA
Danqi Chen, Adam Fisch, Jason Weston, Antoine Bordes. "Reading Wikipedia to Answer Open-Domain Questions." ACL-2017.
HotpotQA
• Multi-hop Reasoning across Multiple Documents
• Evaluation settings
  • Distractor: 2 gold paragraphs + 8 from information retrieval (fixed for all models)
  • Fullwiki: entire Wikipedia as context
Zhilin Yang*, Peng Qi*, Saizheng Zhang*, Yoshua Bengio, William W. Cohen, Ruslan Salakhutdinov, Christopher D. Manning. "HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering." EMNLP-2018.
CogQA (An Iterative Framework)
• Cognitive graph for multi-hop QA.
Ming Ding, Chang Zhou, Qibin Chen, Hongxia Yang, and Jie Tang. Cognitive graph for multi-hop reading comprehension at scale. In ACL-2019.
CogQA (An Iterative Framework)
• System 1
  • Extracts question-relevant entities and answer candidates from paragraphs to build the cognitive graph
  • Generates a semantic vector for each node
• System 2
  • Computes hidden vectors over the graph
  • Computes clues to guide System 1 in extracting next-hop entities
• Clues: raw sentences of x's predecessor nodes that mention x
Iterative Retrieve-and-Read
• For many multi-hop questions, not all the relevant context can be obtained in a single retrieval step
Peng Qi, Xiaowen Lin, Leo Mehr, Zijian Wang, Christopher D. Manning. "Answering Complex Open-domain Questions Through Iterative Query Generation." EMNLP-2019.
Iterative Retrieve-and-Read
• At each step the model also uses IR results from previous hops of reasoning to generate a new natural language query and retrieve new evidence to answer the original question
Query Generation as QA
• Main idea
  • For each reasoning step, generate the search query given the original question q and some context of already-retrieved documents (initially empty)
  • The target is a search query that helps retrieve the supporting document needed for the next reasoning step
• Use DrQA to extract text spans from the context as search queries, rather than a model that generates free-form text
• Fix the number of retrieval steps at 2
• Add the top 5 retrieved documents to the retrieval context at each step
• Index an English Wikipedia dump of introductory paragraphs
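The overall loop can be sketched as follows; the toy index, document strings, and query-generation rule are all invented stand-ins for the paper's Wikipedia IR system and DrQA-style span extractor.

```python
# Sketch of a two-step iterative retrieve-and-read loop.
# retrieve() and generate_query() are toy stand-ins for illustration.
def retrieve(query, k=5):
    toy_index = {
        "bridge entity query": ["doc mentioning the bridge entity"],
        "answer entity query": ["doc containing the answer"],
    }
    return toy_index.get(query, [])[:k]

def generate_query(question, context):
    # Stand-in for extracting a query span from question + retrieved context.
    return "bridge entity query" if not context else "answer entity query"

def iterative_retrieve(question, steps=2):
    context = []
    for _ in range(steps):
        query = generate_query(question, context)
        context.extend(retrieve(query))
    return context  # evidence handed to the reader
```

Each round conditions query generation on what has already been retrieved, which is exactly what single-shot retrieval cannot do for multi-hop questions.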
Learning to Retrieve Reasoning Paths
• Graph-based recurrent retrieval approach that learns to retrieve reasoning paths over the Wikipedia graph to answer multi-hop open-domain questions
Akari Asai, Kazuma Hashimoto, Hannaneh Hajishirzi, Richard Socher, Caiming Xiong. "Learning to Retrieve Reasoning Paths over Wikipedia Graph for Question Answering." arXiv:1911.10470.
Graph Construction
• Each node of the Wikipedia graph represents a single paragraph
• Use Wikipedia hyperlinks to construct the direct edges.
• Also consider symmetric within-document links, allowing a paragraph to hop to other paragraphs in the same article.
• This graph is constructed offline and is reused throughout training and inference for any question.
Approach Overview
Retriever Model
• Sequentially retrieves each evidence document, given the history of previously retrieved documents, to form several reasoning paths in a graph of entities
• A Recurrent Neural Network (RNN) models the reasoning paths for the question q: at the t-th time step (t ≥ 1), the model selects a paragraph p_i among candidate paragraphs C_t given the current hidden state h_t of the RNN
• The next candidate set C_{t+1} is constructed to include the paragraphs linked from the selected paragraph p_i in the graph
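A greedy version of this candidate expansion can be sketched as below; the link graph, paragraph names, and relevance scores are all hypothetical, and a fixed score table stands in for the RNN state h_t.

```python
# Sketch of sequential path retrieval over a paragraph link graph.
# LINKS and the score table are invented for illustration.
LINKS = {  # hypothetical link graph: paragraph -> linked paragraphs
    "p_question_entity": ["p_bridge", "p_distractor"],
    "p_bridge": ["p_answer"],
}

def score(paragraph, history):
    # Stand-in for the RNN-based score of a paragraph given hidden state h_t.
    relevance = {"p_question_entity": 0.8, "p_bridge": 0.9,
                 "p_distractor": 0.2, "p_answer": 0.95}
    return relevance.get(paragraph, 0.0)

def retrieve_path(seeds, max_hops=3):
    path, candidates = [], list(seeds)
    for _ in range(max_hops):
        if not candidates:
            break
        best = max(candidates, key=lambda p: score(p, path))
        path.append(best)
        candidates = LINKS.get(best, [])  # next candidate set C_{t+1}
    return path
```

In the paper this greedy choice is replaced by beam search, and termination is signaled by an end-of-evidence symbol rather than an empty candidate set.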
Reader Model
• Model the reader as multi-task learning of:
  1) reading comprehension, which extracts an answer span from a reasoning path E using a standard approach
  2) reasoning path re-ranking, which re-ranks the retrieved reasoning paths by computing the probability that the path includes the answer
• Share the same model for reading comprehension and re-ranking
• At inference time, select the best evidence E_best by P(E|q), and output the answer span by S_read in E_best
CommonsenseQA
Alon Talmor, Jonathan Herzig, Nicholas Lourie, Jonathan Berant. CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge. NAACL-2019.
Commonsense Auto-Generated Explanation (CAGE)
Nazneen Fatema Rajani, Bryan McCann, Caiming Xiong and Richard Socher. Explain Yourself! Leveraging Language Models for Commonsense Reasoning. ACL-2019.
Common Sense Explanations (CoS-E)
CAGE for CommonsenseQA
(Figure) CAGE pipeline: for explanation generation, GPT conditions on the question Q, the answer choices A1, A2, A3, and the explanation tokens generated so far (E1 … E_{i−1}) to produce the next token E_i; for prediction, BERT consumes the question, the choices, and the LM-generated explanation to select an answer.
KagNet
• Schema Graph Construction
  • Tokenization / lemmatization
  • Match the ConceptNet vocabulary
  • Find paths between each QA-concept pair
  • Prune paths by length (≤ 5 nodes) and an embedding-based metric
Bill Yuchen Lin, Xinyue Chen, Jamin Chen and Xiang Ren. KagNet: Knowledge-Aware Graph Networks for Commonsense Reasoning. EMNLP-2019.
KagNet
Retrieve Knowledge From ConceptNet and Wikipedia
Question: What do people typically do while playing guitar?
A. cry  B. hear sounds  C. singing (√)  D. arthritis  E. making music
Evidence from ConceptNet (figure): a subgraph connecting “playing guitar” to the candidate concepts via relations such as HasA, IsA, Requires, and RelatedTo (nodes include people, eyes, cry, sound, singing).
Evidence from Wikipedia:
• A. cry: “Don’t Cry for me” features Greek mandolin with heavy metal guitar. / What is to cry and to weep?
• C. singing: She also performed them, playing guitar and singing. / Jakszyk led the band, playing guitar and singing.
• E. making music: He began making music when he started guitar lessons. / I like making music and playing guitar with other people.
Shangwen Lv, Daya Guo, Jingjing Xu, Duyu Tang, Nan Duan, Ming Gong, Linjun Shou, Daxin Jiang, Guihong Cao, Songlin Hu. "Graph-Based Reasoning over Heterogeneous External Knowledge for Commonsense Question Answering." AAAI-2020, full paper.
XLNet-based Baseline
(Figure) Each pair "question [SEP] choice i" (i = 1 … 5) is encoded by XLNet; a softmax over the five scores gives a distribution over answer candidates.
XLNet + Graph Reasoning
(Figure) For each "question [SEP] choice" pair: knowledge retrieval and graph construction produce evidence graphs; XLNet (with graph distance) yields word vectors and a question-choice (QC) vector; graph-based representation learning yields graph vectors; graph-based reasoning over both produces a confidence score.
CommonsenseQA Leaderboard
Model Analysis
Agenda
• External Evidence Knowledge
• Grammar Knowledge
• Conversational Context Knowledge
• Data Knowledge
Knowledge-based QA (KBQA)
• Answer natural language questions based on given knowledge bases
Where was Obama born?
λx. people_person_placeOfBirth(Obama, x)
Honolulu
What are the names of his daughters?
λx. parent(Obama, x) ∧ gender(x, Female)
Natasha Obama, Malia Ann Obama
Table-based QA (TBQA)
Year | City | Country | Nations
1896 | Athens | Greece | 14
1900 | Paris | France | 24
… | … | … | …
2004 | Athens | Greece | 201
2008 | Beijing | China | 204
2012 | London | UK | 204
SELECT City WHERE Year = 2008
SELECT Nations WHERE Year = 2008
SELECT Nations WHERE Year = 2004
Image-based QA
Semantic Parsing
• Map NL questions into machine executable logical forms based on a knowledge graph/web table
Table-Based
  Question: How many CFL teams are from York College?
  SQL (semantic parsing): SELECT COUNT(CFL Team) WHERE College = "York"
  Answer (execution): 2
  Table:
    CFL Team | College
    Hamilton Tiger-Cats | Wilfrid Laurier
    Calgary Stampeders | York
    Toronto Argonauts | York
Knowledge Graph-Based
  Question: Where was Donald Trump given birth?
  LF (semantic parsing): λx. people.person.place_of_birth(Donald Trump, x)
  Answer (execution): Queens, New York City
  (Figure) Knowledge graph fragment linking Donald Trump to United States and Queens, New York City.
Table-Based Semantic Parsing
• WikiSQL Dataset
• 87,726 human annotated question-SQL pairs distributed across 26,375 tables from Wikipedia
Zhong, Victor, Caiming Xiong, and Richard Socher. "Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning." arXiv:1709.00103 (2017).
# of <Q, SQL, T, A> tuples
Train set 61,297
Dev set 9,145
Test set 17,284
Question: How many CFL teams are from York College?
SQL: SELECT COUNT(CFL Team) WHERE College = "York"
Answer: 2
Table:
Pick # | CFL Team | Player | Position | College
27 | Hamilton Tiger-Cats | Connor Healy | DB | Wilfrid Laurier
28 | Calgary Stampeders | Anthony Forgone | OL | York
29 | Toronto Argonauts | Frank Hoffman | DL | York
Two evaluation metrics
• SQL Accuracy
• Execution Accuracy
Method | Execution Accuracy | Uses SQL sketch | Comments
Seq2Seq (Dong 2016) | 35.9% | N | Sequence-to-sequence
Seq2SQL (Zhong 2017) | 59.4% | N | Seq2Seq + PointerNet + SELECT column and agg
Wang 2017 | 66.8% | N | Seq2Seq + type decoder
Huang 2018 | 68.0% | N | Seq2Seq + type decoder + meta-learning
MAPO (Liang 2018) | 72.6% | N | Denotation + fine-grained actions + improved RL
Our End2End approach (Sun 2018) | 74.4% | N | MSRA NLC @ACL-2018
MQAN (McCann et al. 2018) | 81.4% | N | Natural Language Decathlon (multi-task)
SQLNet (Xu 2017) | 68.0% | Y | Predict WHERE column, then op and value
Guo 2018 | 69.0% | Y | SQLNet + char emb + bi-attention
Our on-going work | 72.8% | Y | Word-token dictionary + iterative back-translation
Coarse2Fine (Dong 2018) | 78.5% | Y | First decode SQL sketch, then tokens
TypeSQL (Yu 2018) | 82.6% | Y | Predict fine-grained input types w/ rules + Freebase
IncSQL (Shi 2018) | 83.7% | Y | Seq2Action + execution-oriented column modeling
Coarse2Fine + EG Decoding (Wang 2018) | 83.8% | Y | Use partially generated output to guide decoding
Our SF approach | 85.5% | Y | MSRA NLC slot-filling based model
IncSQL + EG Decoding (Shi 2018) | 87.1% | Y | IncSQL + execution-guided decoding
Seq-to-Seq with Pointer Network
(Figure) Encoder-decoder with attention: the encoder reads the question and the column names (Pick #, CFL Team, Player, Position, College); at each step the decoder chooses from SQL keywords (SELECT, WHERE, COUNT, MIN, MAX, AND, >, <, =), column names, and question tokens via a pointer network, producing "SELECT COUNT CFL Team WHERE College = 'York'".
Seq-to-Seq with Structural Decoding
(Figure) The encoder reads the question together with the table (the CFL table above, including its cells); at each decoding step (t = 0, 2, 6, …) the decoder chooses among three channels — SQL keywords (SELECT, WHERE, COUNT, MIN, MAX, AND, >, <, =), column names, and cell values (York, Wilfrid Laurier, …) — to generate "SELECT COUNT CFL Team WHERE College = 'York'".
Yibo Sun, Duyu Tang, Nan Duan, Jianshu Ji, Guihong Cao, Xiaocheng Feng, Bing Qin, Ting Liu and Ming Zhou. "Semantic Parsing with Syntax- and Table-Aware SQL Generation." ACL-2018 full paper.
Episode # | Country | City | Martial Art/Style | Masters | Original Airdate
1.1 | China | Dengfeng | Kung Fu (Wushu; Sanda) | Shi De Yang, Shi De Cheng | 28-Dec-07
1.2 | Philippines | Manila | Kali | Leo T. Gaje Jr., Cristino Vasquez | 4-Jan-08
1.3 | Japan | Tokyo | Kyokushin Karate | Yuzo Goda, Isamu Fukuda | 11-Jan-08
1.4 | Mexico | Mexico City | Boxing | Ignacio "Nacho" Beristáin, Tiburcio Garcia | 18-Jan-08
1.5 | Indonesia | Bandung | Pencak Silat | Rita Suwanda, Dadang Gunawan | 25-Jan-08
1.7 | South Korea | Seoul | Hapkido | Kim Nam Je, Bae Sung Book, Ju Soong Weo | 8-Feb-08
1.8 | Brazil | Rio de Janeiro | Brazilian Jiu-Jitsu | Breno Sivak, Renato Barreto, Royler Gracie | 15-Feb-08
1.9 | Israel | Netanya | Krav Maga | Ran Nakash, Avivit Oftek Cohen | 22-Feb-08

Question #1: how many masters fought using a boxing style?
Aug.PntNet: select count masters from table where style = boxing
STAMP: select count masters from table where martial art/style = boxing
Question #2: when did the episode featuring a master using brazilian jiu-jitsu air?
Aug.PntNet: select original airdate from table where masters = brazilian jiu-jitsu
STAMP: select original airdate from table where martial art/style = brazilian jiu-jitsu
Slot-Filling Approach
SQL sketch: SELECT $select-aggregator $select-column WHERE $where-column1 $where-operator1 $where-value1 AND … AND $where-columnN $where-operatorN $where-valueN
(Figure) The sketch is filled in three steps:
• Step 1 ($where-value): detect value spans in the question (e.g. York) using the preceding and following context
• Step 2 ($where-column / $where-operator): for each value, predict its column (College) and operator (= among >, <, =)
• Step 3 ($select-column / $select-aggregator): predict the SELECT column (CFL Team) and aggregator (COUNT)
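Once the three steps have filled every slot, assembling the final SQL is mechanical; a minimal sketch (the slot predictions are supplied by hand here, whereas the model produces each via a dedicated classifier):

```python
# Sketch of assembling a WikiSQL query from predicted slot values.
def fill_sketch(agg, sel_col, conds):
    """agg: aggregator or "" ; sel_col: SELECT column;
    conds: list of (column, operator, value) WHERE conditions."""
    where = " AND ".join(f"{c} {op} {v}" for c, op, v in conds)
    select = f"{agg}({sel_col})" if agg else sel_col
    return f"SELECT {select} WHERE {where}" if where else f"SELECT {select}"

sql = fill_sketch("COUNT", "CFL Team", [("College", "=", "York")])
# -> SELECT COUNT(CFL Team) WHERE College = York
```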
Yibo Sun, Duyu Tang, Nan Duan, Yeyun Gong, Xiaocheng Feng, Bing Qin, Daxin Jiang. Neural Semantic Parsing in Low-Resource Settings with Back-Translation and Meta-Learning. AAAI-2020
Table-Based Semantic Parsing
Q = "Which super heroes came from Earth?", A* = {Dragonwing, Harmonia}
➢ Which super heroes came from Earth and first appeared after 2009?
SELECT Character WHERE {Home World = Earth} ∧ {First Appeared > 2009}
(Figure) Search tree over parser states: s0 branches to s1, s2, s3; conditioning on Home World expands s1 into s1′, s1′′ and s2 into s2′, s2′′.
Mohit Iyyer, Wen-tau Yih, Ming-Wei Chang. Search-based Neural Structured Learning for Sequential Question Answering. ACL-2017.
Action and Module
• The goodness of a state: V(s_t) = V(s_{t−1}) + π(s_{t−1}, a_t), with V(s_0) = 0
• The value of π(s, a) is determined by a neural-network model
• Actions of the same type (e.g., select-column) share the same neural-network module
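The recurrence above unrolls into a simple sum of per-action scores along a path; a minimal sketch, with a toy score table standing in for the neural modules:

```python
# Sketch of V(s_t) = V(s_{t-1}) + pi(s_{t-1}, a_t), V(s_0) = 0.
# PI is a toy lookup table standing in for the neural-network modules.
PI = {("s0", "a1"): 0.6, ("s1", "a2"): 0.3, ("s2", "a3"): 0.4}

def path_value(states, actions):
    v = 0.0  # V(s_0) = 0
    for s, a in zip(states, actions):
        v += PI[(s, a)]  # accumulate pi(s_{t-1}, a_t)
    return v

# V(s3) for the path s0 -a1-> s1 -a2-> s2 -a3-> s3 is approximately 1.3
v = path_value(["s0", "s1", "s2"], ["a1", "a2", "a3"])
```

Search then amounts to expanding the actions with the highest accumulated V at each step.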
(Figure) For "Which super heroes came from Earth?", the state sequence s0 → s1 → s2 → s3 applies actions a1 (Cond on Home World), a2 (Value = Earth), and a3, scored by π(s0, a1), π(s1, a2), π(s2, a3).
Table-Based Semantic Parsing
Question: Which city hosted Summer Olympic in 2008?
Action inventory:
A1: SELECT | A2: WHERE-Col | A3: WHERE-Op | A4: WHERE-Val
(Figure) A controller first decodes a sketch as an action sequence from the start state S0 (e.g., S0 → A1 for SELECT, or S0 → A1 → A2 → A3 → A4 for SELECT+WHERE); dedicated modules then fill each argument: column prediction for A1/A2, operator prediction (>, <, =) for A3, and value prediction for A4.
Yibo Sun, Duyu Tang, Nan Duan, Jingjing Xu, Xiaocheng Feng, Bing Qin. "Knowledge-Aware Conversational Semantic Parsing Over Web Tables." NLPCC, 2019
Coarse-to-Fine Decoding
Li Dong, Mirella Lapata. Coarse-to-Fine Decoding for Neural Semantic Parsing. ACL-2018.
KBQA with Semantic Parsing (single-turn)
Where was the president of the United States born?
Grammar actions used in the derivation:
A1: S → set
A4: set → find(set, r1)
A4: set → find(set, r2)
A15: set → {e}
A16: e → United States
A17: r2 → isPresidentOf
A17: r1 → placeOfBirth
(Figure) The derivation tree expands S into find(find({United States}, isPresidentOf), placeOfBirth); the decoder emits the corresponding action sequence, instantiating the entity e_US and choosing each relation among candidates (e.g., r_pres vs. r_grad).
Daya Guo, Duyu Tang, Nan Duan, Ming Zhou and Jian Yin. "Dialog-to-Action: Conversational Question Answering Over a Large-Scale Knowledge Base." NeurIPS-2018 full paper.
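The derivation above can be replayed programmatically; this sketch uses a small illustrative subset of the grammar and plain leftmost string rewriting in place of the paper's tree-structured decoder.

```python
# Sketch of building a logical form by applying grammar actions top-down,
# always expanding the leftmost nonterminal. Illustrative subset only.
def apply_actions(actions):
    RULES = {
        "A1": ("S", "set"),
        "A4": ("set", "find(set, r?)"),
        "A15": ("set", "{e?}"),
    }
    lf = "S"
    for action, arg in actions:
        if action in RULES:
            lhs, rhs = RULES[action]
            lf = lf.replace(lhs, rhs, 1)
        elif action == "A16":        # instantiate the leftmost entity slot
            lf = lf.replace("e?", arg, 1)
        elif action == "A17":        # instantiate the leftmost relation slot
            lf = lf.replace("r?", arg, 1)
    return lf

lf = apply_actions([("A1", None), ("A4", None), ("A4", None),
                    ("A15", None), ("A16", "United States"),
                    ("A17", "isPresidentOf"), ("A17", "placeOfBirth")])
# -> find(find({United States}, isPresidentOf), placeOfBirth)
```

Constraining decoding to valid grammar actions guarantees every output is a well-formed, executable logical form.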
Image-Based Semantic Parser
Chenfei Wu, Yanzhao Zhou, Gen Li, Nan Duan, Duyu Tang, Xiaojie Wang. Deep Reason: A Strong Baseline for Real-World Visual Reasoning. arXiv:1905.10226, 2019.
Semantic Parsing Result on GQA Questions
Method | LF Exact Match
Seq2Seq | 77.4%
Seq2Action | 85.6%
Agenda
• External Evidence Knowledge
• Grammar Knowledge
• Conversational Context Knowledge
• Data Knowledge
Coreference and Ellipsis Phenomena
Question Subsequent Coreference
▪ Q1: Where was the president of the United States born?
▪ A1: New York City
▪ Q2: Where did he graduate from?
Answer Entity Coreference
▪ Q1: Who is the president of the United States?
▪ A1: Donald Trump
▪ Q2: How many children does he have?
Question Entity Coreference
▪ Q1: Who is the president of the United States?
▪ A1: Donald Trump
▪ Q2: What is its population?
Entity Ellipsis
▪ Q1: What movie did Leonardo DiCaprio win an Oscar for?
▪ A1: The Revenant
▪ Q2: Who is the director?
Predicate Ellipsis
▪ Q1: Who is the president of the United States?
▪ A1: Donald Trump
▪ Q2: And also tell me about China?
KBQA with Semantic Parsing (multi-turn)
Dialog Memory
• Entities: {United States, tag=utterance}, {New York City, tag=answer}
• Predicates: {isPresidentOf}, {placeOfBirth}
• Action subsequences, with and without instantiation, e.g.:
  set → A4 A15 (e_US, r_pres) and set → A4 A15
  set → A4 A4 A15 (e_US, r_pres, r_bth) and set → A4 A4 A15
(Figure) Given the previous question "Where was president of the United States born?" (answer: New York City) and the current question "Where did he graduate from?", the parser copies a replicated action subsequence with its instantiation from the dialog memory (A4 A15 with e_US and isPresidentOf) and extends it with the new relation r_grad = graduateFrom.
Training Data Collection
• Input: <question, answer> pairs and a knowledge graph (e.g. Freebase/Satori)
• Output: <question, LFs, answer> triples
(Figure) An action sampler generates candidate logical forms for each <question, answer> pair; each candidate is executed on the KB, and only candidates yielding the correct answer are kept as training data.
Evaluation on CSQA Dataset (IBM Research, 2018)
(Figure) Accuracy of S2S, D2A w/o dialog memory (DM), and full D2A on the Ellipsis, Coreference, and Overall subsets; D2A performs best in all three (bar values in chart order: 9.95, 7.26, 9.38, 13.67, 39.4, 41.52, 81.98, 69.83, 62.88).
CSQA Dataset Statistics
• Dialogs: 200,000
• Turns: 1.6M
• Entities in KB: 12.8M
• Unique relations: 330
• KB tuples: 21.2M
• Entity types: 642
Table-Based Semantic Parser
Previous question: Which city hosted Summer Olympic in 2008?
Current question: How many nations participate in that year?
Previous parse, decomposed into copyable units:
• SELECT: SELECT City
• WHERE: WHERE Year = 2008
• SELECT+WHERE: SELECT City WHERE Year = 2008
Action inventory:
A1: SELECT | A2: WHERE-Col | A3: WHERE-Op | A4: WHERE-Val | A5: Copy SELECT | A6: Copy WHERE | A7: Copy SELECT+WHERE
(Figure) From the start state S0, the controller decodes a sketch over these actions (e.g., S0 → A1 → A2 → A3 → A4, or paths through the copy actions A5/A6/A7); for the current question it predicts A1 (arg = Nations, via column prediction over Year/City/Country/Nations) followed by A6 (arg = previous WHERE clause, via CopyWhere).
Agenda
• External Evidence Knowledge
• Grammar Knowledge
• Conversational Context Knowledge
• Data Knowledge
Retrieval-based Semantic Parsing
(Figure) Pipeline: retrieve a similar training example, then edit it into the prediction.
Retrieve-and-Edit
• Retrieve-and-edit
• Task-dependent similarity: two inputs x and x′ should be considered similar only if the editor has a high likelihood of editing y′ into y
• Model: p_model(y | x) = Σ_{(x′, y′)} p_edit(y | x, x′, y′) · p_ret((x′, y′) | x)
• Training objective: maximize L(p_edit, p_ret) = E[log p_model(y | x)]
Tatsunori Hashimoto, Kelvin Guu, Yonatan Oren, Percy Liang. "A retrieve-and-edit framework for predicting structured outputs." NeurIPS-2018.
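The factorization p_model(y|x) = Σ p_edit · p_ret is just a weighted sum over retrieved pairs; a minimal sketch with toy probabilities (the retrieved pairs and editor scores are invented):

```python
# Sketch of the retrieve-and-edit marginal likelihood:
# p_model(y|x) = sum over (x', y') of p_edit(y|x, x', y') * p_ret((x', y')|x).
def p_model(y, x, retrievals, p_edit):
    """retrievals: list of ((x', y'), p_ret) pairs; p_edit: editor likelihood."""
    return sum(p_edit(y, x, xp, yp) * p_ret
               for (xp, yp), p_ret in retrievals)

retrievals = [(("x1", "y1"), 0.7), (("x2", "y2"), 0.3)]
p_edit = lambda y, x, xp, yp: 0.9 if yp == "y1" else 0.1
# p_model = 0.9 * 0.7 + 0.1 * 0.3, i.e. approximately 0.66
```

Maximizing E[log p_model] jointly trains the editor and pushes the retriever toward examples the editor can actually edit into y.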
• Considers retrieved datapoints as a pseudo task for fast adaptation
(Figure) From the training dataset D, a context-aware retriever R constructs a task for each example d: the retrieved meta-train data S_d, with the example itself as meta-test data.
Step 1: θ′ = θ − α ∇_θ L(M_θ) w.r.t. S_d
Step 2: θ ← θ − β ∇_θ L(M_θ′) w.r.t. d
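The two updates can be sketched on a 1-D least-squares learner; this is the first-order variant (it does not differentiate through θ′), and α, β, and the data are illustrative.

```python
# Sketch of the two MAML-style updates above on a 1-D least-squares model
# y ≈ theta * x. First-order approximation; values are toy examples.
def grad(theta, data):
    # d/dtheta of 0.5 * sum((theta * x - y)^2)
    return sum((theta * x - y) * x for x, y in data)

def maml_step(theta, support, query, alpha=0.1, beta=0.01):
    theta_prime = theta - alpha * grad(theta, support)  # Step 1: adapt on S_d
    return theta - beta * grad(theta_prime, query)      # Step 2: meta-update on d

theta = maml_step(0.0, support=[(1.0, 2.0)], query=[(1.0, 2.0)])
```

At test time, only Step 1 is run: the model adapts to the retrieved neighbors of the input before predicting.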
Context-Aware Retrieval Model
(Figure) Context-Aware Retrieval Model: encoders map the natural language utterance ("implement this vector in place") to h_x, and the context environment — variables (double[] vecElements, double[] weights) and methods (void add, float dotProduct) — to h_c (with method and variable summaries h_cm, h_cv); von Mises-Fisher latent variables z_x ~ vMF(z_x | μ_x, κ) and z_c ~ vMF(z_c | μ_c, κ) condition an LSTM decoder that generates the target code.
Evaluation: Tasks
Task I: Conversational Question Answering over KB
▪ Q1: Where was the president of the United States born?
▪ A1: New York City
▪ Q2: Where did he graduate from?
Task II: Code Generation
Evaluation: Results
• Retrieve datapoints for each example to construct a *pseudo* task for fast adaptation
• New state-of-the-art on both tasks.

Task I: Conversational Question Answering over KB
Method | Simple Question | Logical Reasoning | Quantitative Reasoning | Comparative Reasoning
Sequence-to-Sequence | 13.64 | 8.33 | 0.96 | 2.96
Dialog2Action (MSRA-NLC @NeurIPS-2018) | 92.01 | 42.00 | 45.37 | 41.51
Dialog2Action + MAML (ours) | 92.66 | 44.34 | 50.30 | 48.13

Task II: Code Generation
Method | Exact | BLEU
Sequence-to-Sequence | 3.20 | 23.51
Yin+ @ACL-2017 | 6.65 | 21.29
Iyer+ @EMNLP-2018 | 8.60 | 22.21
Dialog2Action (MSRA-NLC @NeurIPS-2018) | 9.15 | 23.24
Dialog2Action + MAML (ours) | 10.50 | 24.40
Daya Guo, Duyu Tang, Nan Duan, Ming Zhou and Jian Yin. "Coupling Retrieval and Meta-Learning for Context-Dependent Semantic Parsing." ACL-2019 full paper.
More Results on Code Generation
Retrieve-and-edit
Retrieve-and-MAML
Model Analysis
Retrieved Examples
Input:
  Class environment: Map<Point, RailwayNode> _nodeMap;
  NL: Check if a node at a specific position exists.
  Code:
    boolean function(Point arg0) {
        return _nodeMap.containsKey(arg0); }
Context-Aware Retriever:
  Class environment: Node root; Node get(Node x, String key, int d);
  NL: Does the set contain the given key
  Code:
    boolean function(String arg0) {
        Node loc0 = get(root, arg0, 0);
        if (loc0 == null) return false;
        return loc0.isString; }
Context-Independent Retriever:
  Class environment: HashMap<lalr_item, lalr_item> _all;
  NL: Does the set contain a particular item
  Code:
    boolean function(lalr_item arg0) {
        return _all.containsKey(arg0); }

Input:
  Q1: who is the dad of jorgen ottesenbrahe?  A1: otte brahe
  Q2: who is the spouse of that one?
Context-Aware Retriever:
  Q1: whose child are gio batta bellotti?  A1: matteo bellotti, paola cresipi guzzo
  Q2: which person is married to that one?
Context-Independent Retriever:
  Q1: which abstract beings have marge simpson as an offspring?  A1: clancy bouvier, jacqueline bouvier
  Q2: who is the spouse of that one?