Question Answering using Enhanced Lexical Semantic Models
Wen-tau Yih, Ming-Wei Chang, Christopher Meek, and Andrzej Pastusiak
ACL 2013
October 23rd, 2013
U.Mich. NLP Reading Group
Presented by V.G.Vinod Vydiswaran (vgvinodv@umich.edu)
Friday, October 25, 13
Who won the best actor Oscar in 1973?
A1: Jack Lemmon won the Academy Award for Best Actor for Save the Tiger (1973)
A2: Oscar winner Kevin Spacey said that Jack Lemmon is remembered as always making time for other people.
Answer selection is a key step to address QA
Conceptually, a semantic matching problem
Semantic structure matching is one approach
Latent word-alignment view
Task is to classify a question/sentence pair
Words are “aligned” based on similarity
Multiple functions can be used for this purpose
What is the fastest car in the world?
The Jaguar XJ220 is the dearest, fastest, and most sought after car on the planet.
Lexical Semantic Models
Synonymy and antonymy: PILSA model (Yih et al., 2012)
Hypernymy and hyponymy: Probase (Wu et al., 2012)
Semantic word similarity: based on three vector space models
1. Synonymy and Antonymy
PILSA: Polarity-Inducing Latent Semantic Analysis
Signed d-by-n co-occurrence matrix
d: number of word groups, n: vocabulary size
each element: tf-idf of corresponding word in the group
Antonyms given a negative value
Low-rank approximation derived by singular-value decomposition
Synonym/Antonym: cosine score between column vectors
Learnt over Encarta thesaurus + discriminative projection matrix training method
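The PILSA construction above can be sketched in a few lines of numpy: a toy hand-built thesaurus, raw ±1 counts instead of tf-idf, and a plain SVD (no discriminative projection step). This is an illustration of the idea, not the paper's Encarta-trained model.

```python
import numpy as np

# Toy thesaurus: each group lists synonyms (positive) and antonyms (negative).
groups = [
    {"syn": ["happy", "glad", "joyful"], "ant": ["sad", "unhappy"]},
    {"syn": ["fast", "quick", "rapid"], "ant": ["slow"]},
]

vocab = sorted({w for g in groups for w in g["syn"] + g["ant"]})
idx = {w: i for i, w in enumerate(vocab)}

# Signed d-by-n matrix: positive weight for synonyms, negative for antonyms
# (the paper uses tf-idf magnitudes; plain 1/-1 keeps the sketch minimal).
M = np.zeros((len(groups), len(vocab)))
for r, g in enumerate(groups):
    for w in g["syn"]:
        M[r, idx[w]] = 1.0
    for w in g["ant"]:
        M[r, idx[w]] = -1.0

# Low-rank approximation via singular-value decomposition.
U, s, Vt = np.linalg.svd(M, full_matrices=False)
k = 2
emb = (np.diag(s[:k]) @ Vt[:k]).T  # one k-dim vector per word (column space)

def cosine(a, b):
    u, v = emb[idx[a]], emb[idx[b]]
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

print(cosine("happy", "glad"))  # high positive: synonyms
print(cosine("happy", "sad"))   # negative: antonyms
```

The signed entries are what let one embedding space separate synonyms (high cosine) from antonyms (negative cosine), which a standard unsigned co-occurrence matrix cannot do.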
2. Hypernymy and Hyponymy
Limitations of WordNet
Probase: automatically extracted connections between 2.7M concepts by applying Hearst patterns to 1.68B webpages
Probabilistic value for each relation based on co-occurrence
Q: What color is Saturn?
S: Saturn is a giant gas planet with brown and beige clouds.
Q: Who wrote the Moonlight Sonata?
S: Ludwig van Beethoven composed the Moonlight Sonata in 1801.
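Matching “color” in the question against “brown” in the sentence needs an isa link of the kind Probase supplies. A minimal regex sketch of Hearst-pattern extraction (two toy patterns, single-word terms only; a simplified illustration, not Probase's actual pipeline):

```python
import re

# Two toy Hearst patterns; named groups mark hypernym vs. hyponym.
PATTERNS = [
    re.compile(r"(?P<hyper>\w+) such as (?P<hypo>\w+)"),
    re.compile(r"(?P<hypo>\w+) and other (?P<hyper>\w+)"),
]

def extract_isa(sentence):
    """Return (hyponym, hypernym) pairs found in the sentence."""
    pairs = []
    for pat in PATTERNS:
        for m in pat.finditer(sentence.lower()):
            pairs.append((m.group("hypo"), m.group("hyper")))
    return pairs

print(extract_isa("We saw colors such as brown in the clouds."))
# [('brown', 'colors')]
```

Probase aggregates such matches over billions of pages and attaches a co-occurrence-based probability to each (hyponym, hypernym) pair, which smooths over the noise a single pattern match inevitably has.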
3. Semantic word similarity
Vector space models based on distributional similarity
Three vector space models:
Wikipedia contexts (Yih and Qazvinian, 2012)
Recurrent neural network language models (RNNLM) (Mikolov et al., 2012)
640-dim RNNLM vectors trained on Broadcast news corpus
Concept projection over click-through data (Gao et al., 2011)
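One simple way to use several vector space models together is to average their per-model cosine scores. A toy sketch with made-up 2-d vectors (the real models are high-dimensional, e.g. the 640-dim RNNLM vectors; the model names and numbers below are illustrative assumptions):

```python
import numpy as np

# Two stand-in vector space models with tiny hand-set word vectors.
models = {
    "wiki_contexts": {"car": np.array([0.9, 0.1]),
                      "automobile": np.array([0.8, 0.2])},
    "rnnlm":         {"car": np.array([0.7, 0.3]),
                      "automobile": np.array([0.6, 0.4])},
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def word_similarity(w1, w2):
    """Average cosine over every model that covers both words."""
    scores = [cosine(vs[w1], vs[w2])
              for vs in models.values() if w1 in vs and w2 in vs]
    return sum(scores) / len(scores) if scores else 0.0

print(word_similarity("car", "automobile"))  # high: near-synonyms
```

Averaging is just one combination strategy; the paper instead feeds each model's score to the classifier as a separate feature, letting the learner weight the models.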
Matching models

Given x = (q, s), with V_q = {w_{q_1}, w_{q_2}, ..., w_{q_m}} and V_s = {w_{s_1}, w_{s_2}, ..., w_{s_n}}:

Bag-of-words model:

\phi_j^{avg}(q, s) = \frac{1}{mn} \sum_{w_q \in V_q,\, w_s \in V_s} \phi_j(w_q, w_s)

\phi_j^{max}(q, s) = \max_{w_q \in V_q,\, w_s \in V_s} \phi_j(w_q, w_s)

Learning latent structures (LCLR; Chang et al., 2010): score with \arg\max_h \theta^T \Phi(x, h) over latent alignments h instead of the bag-of-words \theta^T \Phi(x), trained by

\min_\theta \frac{1}{2}\|\theta\|^2 + C \sum_i \xi_i^2 \quad \text{s.t.} \quad \xi_i \ge 1 - y_i \max_h \theta^T \Phi(x_i, h)
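The two bag-of-words matching functions can be sketched directly; here `sim` stands in for any of the lexical semantic scorers φ_j (exact match in this toy version):

```python
def avg_match(q_words, s_words, sim):
    """phi_avg: mean pairwise similarity over all (wq, ws) pairs."""
    total = sum(sim(wq, ws) for wq in q_words for ws in s_words)
    return total / (len(q_words) * len(s_words))

def max_match(q_words, s_words, sim):
    """phi_max: best pairwise similarity over all (wq, ws) pairs."""
    return max(sim(wq, ws) for wq in q_words for ws in s_words)

# Toy scorer: exact match scores 1, else 0 (a stand-in for PILSA,
# Probase, or word-vector cosine similarity).
sim = lambda a, b: 1.0 if a == b else 0.0

q = ["fastest", "car", "world"]
s = ["fastest", "car", "planet"]
print(avg_match(q, s, sim))  # 2 matching pairs out of 9
print(max_match(q, s, sim))  # 1.0
```

Each similarity function φ_j yields one avg and one max feature for the classifier; LCLR replaces this fixed pooling with a learned latent alignment h between question and sentence words.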
Evaluation Setup
Derived from TREC-QA by Wang et al. (2007)
~33 candidate sentences per question
Training: 5,919 question/sentence pairs from TREC 8-12, manually labeled
Dev / Test: 1,374 / 1,866 pairs from TREC-13
Candidate sentences over 40 words removed
Evaluation measures: MAP and MRR
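The two evaluation measures are standard; a minimal sketch of how they are computed per question (assumes each question's candidates are already sorted by system score, with 0/1 relevance labels):

```python
def average_precision(labels):
    """AP for one question: 0/1 labels over ranked candidate sentences."""
    hits, precisions = 0, []
    for rank, y in enumerate(labels, start=1):
        if y:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

def reciprocal_rank(labels):
    """1 / rank of the first correct candidate (0 if none is correct)."""
    for rank, y in enumerate(labels, start=1):
        if y:
            return 1.0 / rank
    return 0.0

def evaluate(ranked_label_lists):
    """MAP and MRR averaged over questions."""
    n = len(ranked_label_lists)
    mean_ap = sum(average_precision(ls) for ls in ranked_label_lists) / n
    mrr = sum(reciprocal_rank(ls) for ls in ranked_label_lists) / n
    return mean_ap, mrr

# Two toy questions with ranked relevance labels.
print(evaluate([[1, 0, 1], [0, 1, 0]]))
```

MAP rewards ranking all correct sentences high, while MRR only cares about the first correct one, which is why the two can diverge in the result tables below.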
Baselines
Random: give a random score to each candidate sentence
Word Count: word overlap excluding stopwords
Weighted Word Count: overlapping words weighted by the idf of the question word
3 existing methods, primarily based on tree structures:
Syntax-driven dependency-tree matching (Wang et al., 2007)
Quasi-synchronous grammar with tree-edit CRF model (Wang and Manning, 2010)
Tree-kernel function between dependency trees (Heilman and Smith, 2010)
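The Weighted Word Count baseline is simple enough to sketch in full (the idf values and stopword list here are hypothetical; a real table would be estimated from the corpus):

```python
def weighted_word_count(question, sentence, idf, stopwords=frozenset()):
    """Sum the idf of every non-stopword question word that also
    appears in the candidate sentence."""
    q_words = {w for w in question.lower().split() if w not in stopwords}
    s_words = set(sentence.lower().split())
    return sum(idf.get(w, 0.0) for w in q_words & s_words)

# Hypothetical idf weights and stopwords, for illustration only.
idf = {"fastest": 3.2, "car": 2.1, "world": 1.4}
stop = frozenset({"what", "is", "the", "in", "on"})

score = weighted_word_count(
    "What is the fastest car in the world",
    "The Jaguar XJ220 is the fastest car on the planet",
    idf, stop)
print(round(score, 2))  # fastest + car matched; "world"/"planet" missed
```

Note that “world” vs. “planet” contributes nothing here: exactly the gap the lexical semantic models are meant to close.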
Performance of baseline systems
Best MAP 0.609, best MRR 0.695

System                     MAP    MRR
Wang et al. (2007)         0.603  0.685
Wang and Manning (2010)    0.595  0.695
Heilman and Smith (2010)   0.609  0.692
Simple baseline results
Baseline systems: MAP 0.609, MRR 0.695

                       Dev            Test
Baseline               MAP    MRR    MAP    MRR
Random                 0.524  0.582  0.471  0.529
Word count             0.652  0.722  0.626  0.682
Weighted word count    0.711  0.788  0.653  0.707
Adding lexical semantics
I: identical word matching (+weights)
L: lemma matching (+weights)
WN: WordNet synonyms, antonyms, hyper/hyponyms (+weights)
LS: enhanced lexical semantics (+weights)
NE: whether the word is part of a comparable named-entity string (+weights)
QW: whether the question word and named entity are compatible
Models evaluated
Unstructured, bag-of-words setting:
Logistic regression (LR)
Boosted decision trees (BDT)
Structured output setting:
LCLR with all question words covered
Lexical semantic features help
+8 to +12% from I to All
+11% MAP, +12% MRR of LCLR with All over baseline (I with LR)
+25.6% relative MAP, +18.8% relative MRR over published baseline systems

                 LR            BDT           LCLR
Feature set      MAP    MRR    MAP    MRR    MAP    MRR
I                0.653  0.707  0.632  0.690  0.663  0.728
I+L              0.674  0.722  0.650  0.692  0.682  0.727
I+L+WN           0.704  0.771  0.680  0.745  0.732  0.792
I+L+WN+LS        0.734  0.811  0.752  0.846  0.763  0.823
All              0.737  0.817  0.750  0.845  0.765  0.826

Baseline systems: MAP 0.609; MRR 0.695
Limitation of just word matching
Main sources of error:
missing/erroneous entity relationships
lack of robust question analysis
lack of semantic inference
Q: In what film is Gordon Gekko the main character?
S: He received a best actor Oscar in 1987 for his role as Gordon Gekko in “Wall Street”.
Takeaways & Discussion
Looks at a specific step of a QA pipeline: answer sentence selection
Systematic analysis of the addition of (improved) lexical semantic models
Characteristic of the dataset?