[Slide 1]
Improved Relation Extraction with Feature-Rich Compositional
Embedding Models
EMNLP, September 21, 2015
1
Mo Yu* Matt Gormley*
Mark Dredze
*Co-first authors
[Slide 2]
FCM or: How I Learned to Stop Worrying (about Deep Learning)
and Love Features
EMNLP, September 21, 2015
2
Mo Yu* Matt Gormley*
Mark Dredze
*Co-first authors
[Slide 3]
Handcrafted Features
3
[Figure: constituency and dependency parses of "Egypt-born Proyas directed", with POS tags (NNP : VBN NNP VBD), entity types PER (Proyas) and LOC (Egypt), and constituents S, NP, VP, ADJP]

p(y = born-in | x) ∝ exp(Θ_y ⋅ f(x))
[Slide 4]
Where do features come from?
4
Feature Engineering
Feature Learning
hand-crafted features
Sun et al., 2011
Zhou et al., 2005
First word before M1
Second word before M1
Bag-of-words in M1
Head word of M1
Other word in between
First word after M2
Second word after M2
Bag-of-words in M2
Head word of M2
Bigrams in between
Words on dependency path
Country name list
Personal relative triggers
Personal title list
WordNet Tags
Heads of chunks in between
Path of phrase labels
Combination of entity types
[Slide 5]
Where do features come from?
5
Feature Engineering
Feature Learning
hand-crafted features
Sun et al., 2011
Zhou et al., 2005
word embeddings
Mikolov et al., 2013
CBOW model in Mikolov et al. (2013)
input (context words)
embedding
missing word
Look-up table Classifier
dog: 0.13 .26 … -.52
cat: 0.11 .23 … -.45
similar words, similar embeddings
unsupervised learning
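The CBOW model sketched on this slide can be written in a few lines of numpy: look up the context-word embeddings, average them, and classify which word is missing. The vocabulary, dimensions, and random weights below are toy assumptions for illustration, not Mikolov et al.'s actual setup.

```python
import numpy as np

# Toy CBOW sketch. Vocabulary, sizes, and weights are illustrative
# assumptions only.
rng = np.random.default_rng(0)
vocab = ["the", "dog", "cat", "barked", "meowed"]
V, d = len(vocab), 4

E = rng.normal(size=(V, d))   # look-up table: one embedding per word
W = rng.normal(size=(d, V))   # classifier weights over the vocabulary

def cbow_predict(context):
    """Return a distribution over the missing word given its context."""
    h = E[[vocab.index(w) for w in context]].mean(axis=0)  # average context
    s = h @ W
    p = np.exp(s - s.max())
    return p / p.sum()        # softmax over the vocabulary

p = cbow_predict(["the", "barked"])
```

Because the context embeddings are averaged, CBOW is order-insensitive, which is why "similar words, similar embeddings" falls out of training: words appearing in similar contexts get pushed toward similar rows of E.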
[Slide 6]
Where do features come from?
6
Feature Engineering
Feature Learning
hand-crafted features
Sun et al., 2011
Zhou et al., 2005
word embeddings
Mikolov et al., 2013
string embeddings
Collobert & Weston, 2008
Socher, 2011
Convolutional Neural Networks (Collobert and Weston 2008)
The [movie] showed [wars]
pooling
CNN
Recursive Auto Encoder (Socher 2011)
The [movie] showed [wars]
RAE
[Slide 7]
Where do features come from?
7
Feature Engineering
Feature Learning
hand-crafted features
Sun et al., 2011
Zhou et al., 2005
word embeddings
Mikolov et al., 2013
tree embeddings
Socher et al., 2013
Hermann & Blunsom, 2013
string embeddings
Collobert & Weston, 2008
Socher, 2011
[Figure: tree embedding of "The [movie] showed [wars]", composed bottom-up with weight matrices W_{DT,NN}, W_{V,NN}, and W_{NP,VP} over the parse (S (NP) (VP))]
[Slide 8]
Where do features come from?
8
word embeddings
tree embeddings
hand-crafted features
string embeddings
Feature Engineering
Feature Learning
Sun et al., 2011
Zhou et al., 2005
Mikolov et al., 2013
Collobert & Weston, 2008
Socher, 2011
Socher et al., 2013
Hermann & Blunsom, 2013
Hermann et al., 2014
word embedding features
Turian et al. 2010
Koo et al. 2008
[Slide 9]
Where do features come from?
9
word embeddings
tree embeddings
word embedding features
hand-crafted features
Our model (FCM)
string embeddings
Feature Engineering
Feature Learning
Sun et al., 2011
Zhou et al., 2005
Mikolov et al., 2013
Collobert & Weston, 2008
Socher, 2011
Socher et al., 2013
Turian et al. 2010
Koo et al. 2008
Hermann et al., 2014
Hermann & Blunsom, 2013
[Slide 10]
Feature-rich Compositional Embedding Model (FCM)
Goals for our Model:
1. Incorporate semantic/syntactic structural information
2. Incorporate word meaning
3. Bridge the gap between feature engineering and feature learning – but remain as simple as possible
10
[Slide 11]
Feature-rich Compositional Embedding Model (FCM)
11
The [movie]M1 I watched depicted [hope]M2

(word classes in the figure: nil, noun-other, noun-person, noun-other, verb-percep., verb-comm.)

Per-word Features (one column f_i per word):

                   The   movie   I    watched  depicted  hope
                   f1    f2      f3   f4       f5        f6
on-path(wi)        0     1       0    0        1         1
is-between(wi)     0     0       1    1        1         0
head-of-M1(wi)     0     1       0    0        0         0
head-of-M2(wi)     0     0       0    0        0         1
before-M1(wi)      1     0       0    0        0         0
before-M2(wi)      0     0       0    0        1         0
…
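The per-word feature functions in this figure are simple predicates over token positions, and can be sketched directly in code. The token indices and the dependency path below are assumptions for this example sentence, not output of an actual parser.

```python
# Per-word binary features for "The [movie]M1 I watched depicted [hope]M2".
# Token indices: 0=The, 1=movie, 2=I, 3=watched, 4=depicted, 5=hope.
# The dependency path {movie, depicted, hope} is an assumed parse.
words = ["The", "movie", "I", "watched", "depicted", "hope"]
m1, m2 = 1, 5
dep_path = {1, 4, 5}

def word_features(i):
    """Binary feature vector f_i for the word at position i."""
    return {
        "on-path":    int(i in dep_path),
        "is-between": int(m1 < i < m2),
        "head-of-M1": int(i == m1),
        "head-of-M2": int(i == m2),
        "before-M1":  int(i == m1 - 1),
        "before-M2":  int(i == m2 - 1),
    }

f5 = word_features(4)  # the word "depicted"
```

With these definitions, "depicted" fires on-path, is-between, and before-M2, matching column f5 in the figure.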
[Slide 12]
Feature-rich Compositional Embedding Model (FCM)
12
The [movie]M1 I watched depicted [hope]M2

Per-word Features, column f5 (the word "depicted"):
on-path(wi) = 1
is-between(wi) = 1
head-of-M1(wi) = 0
head-of-M2(wi) = 0
before-M1(wi) = 0
before-M2(wi) = 1
…
[Slide 13]
Feature-rich Compositional Embedding Model (FCM)
13
The [movie]M1 I watched depicted [hope]M2

Per-word Features (with conjunction), column f5:
on-path(wi) & wi = "depicted" → 1
is-between(wi) & wi = "depicted" → 1
head-of-M1(wi) & wi = "depicted" → 0
head-of-M2(wi) & wi = "depicted" → 0
before-M1(wi) & wi = "depicted" → 0
before-M2(wi) & wi = "depicted" → 1
…
[Slide 14]
Feature-rich Compositional Embedding Model (FCM)
14
The [movie]M1 I watched depicted [hope]M2

Per-word Features (with soft conjunction), column f5:
f5 = [on-path(wi), is-between(wi), head-of-M1(wi), head-of-M2(wi), before-M1(wi), before-M2(wi), …]
   = [1, 1, 0, 0, 0, 1, …]
e_depicted = [-.3, .9, .1, -1]
Outer product: f5 ⊗ e_depicted
[Slide 15]
Feature-rich Compositional Embedding Model (FCM)
15
The [movie]M1 I watched depicted [hope]M2

Per-word Features (with soft conjunction), column f5:
f5 = [1, 1, 0, 0, 0, 1, …]
e_depicted = [-.3, .9, .1, -1]

f5 ⊗ e_depicted =
[ -.3  .9  .1  -1 ]   (on-path)
[ -.3  .9  .1  -1 ]   (is-between)
[  0    0   0   0 ]   (head-of-M1)
[  0    0   0   0 ]   (head-of-M2)
[  0    0   0   0 ]   (before-M1)
[ -.3  .9  .1  -1 ]   (before-M2)
[  …    …   …   … ]
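The soft conjunction on this slide is literally an outer product: every active feature row becomes a copy of the word's embedding, every inactive row is zeros. A minimal numpy check, using the feature and embedding values from the slide:

```python
import numpy as np

# f5: binary features for "depicted"; e_depicted: its 4-dim embedding
# (values taken from the slide)
f5 = np.array([1, 1, 0, 0, 0, 1])
e_depicted = np.array([-0.3, 0.9, 0.1, -1.0])

# Soft conjunction as an outer product: h5[j] = f5[j] * e_depicted
h5 = np.outer(f5, e_depicted)
```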
[Slide 16]
Feature-rich Compositional Embedding Model (FCM)
16
p(y|x) ∝ exp( Σ_{i=1}^{n} T_y ⋅ (f_i ⊗ e_{w_i}) )

Our full model sums over each word in the sentence, takes the dot product of each outer product f_i ⊗ e_{w_i} with a parameter tensor T_y, and finally exponentiates and renormalizes.
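The whole FCM scoring rule fits in a single einsum. The sizes and random parameters below are toy assumptions for illustration; only the contraction pattern is the point.

```python
import numpy as np

# Toy FCM scoring: p(y|x) ∝ exp( sum_i  T_y · (f_i ⊗ e_{w_i}) ).
# All sizes and values are illustrative assumptions.
rng = np.random.default_rng(0)
n, k, d, L = 6, 6, 4, 3   # words, features, embedding dim, labels

F = rng.integers(0, 2, size=(n, k)).astype(float)  # f_i: per-word binary features
E = rng.normal(size=(n, d))                        # e_{w_i}: per-word embeddings
T = rng.normal(size=(L, k, d))                     # parameter tensor T_y

# score_y = sum over words i and cells (f, d) of T[y,f,d] * F[i,f] * E[i,d]
scores = np.einsum("yfd,if,id->y", T, F, E)
p = np.exp(scores - scores.max())
p /= p.sum()
```

Note the model never materializes the n outer products: contracting T against F and E directly gives the same scores, which is one reason the model stays cheap.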
[Slide 17]
Features for FCM
• Let M1 and M2 denote the left and right entity mentions
• Our per-word binary features:
– head of M1
– head of M2
– in-between M1 and M2
– within -2, -1, +1, or +2 of M1
– within -2, -1, +1, or +2 of M2
– on the dependency path between M1 and M2
• Optionally: Add the entity type of M1, M2, or both
17
[Slide 18]
FCM as a Neural Network
18
[Figure: FCM as a neural network — for each word w_i, binary features f_i and embedding e_{w_i} combine via outer product into h_i; the h_1 … h_n are summed (Σ) and scored against the parameter tensor T to yield p(y|x)]
• Embeddings are (optionally) treated as model parameters
• A log-bilinear model
• We initialize, then fine-tune the embeddings
[Slide 19]
Baseline Model
19
Y_{i,j}

[Figure: constituency and dependency parses of "Egypt-born Proyas directed", with POS tags (NNP : VBN NNP VBD) and entity types PER and LOC]

p(y = born-in | x) ∝ exp(Θ_y ⋅ f(x))
• Multinomial logistic regression (standard approach)
• Bring in all the usual binary NLP features (Sun et al., 2011):
– type of the left entity mention
– dependency path between mentions
– bag of words in right mention
– …
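The baseline is plain multinomial logistic regression over binary features. A toy sketch of the scoring rule p(y|x) ∝ exp(Θ_y ⋅ f(x)), with assumed sizes and random weights:

```python
import numpy as np

# Toy multinomial logistic regression. Feature vector and weights are
# illustrative assumptions, not the paper's actual feature set.
rng = np.random.default_rng(0)
k, L = 8, 3                                     # features, labels

f_x = rng.integers(0, 2, size=k).astype(float)  # binary features f(x)
Theta = rng.normal(size=(L, k))                 # one weight row Theta_y per label

scores = Theta @ f_x                            # Theta_y · f(x) for every y
p_baseline = np.exp(scores - scores.max())
p_baseline /= p_baseline.sum()
```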
[Slide 20]
Hybrid Model: Baseline + FCM
20
Y_{i,j}

[Figure: the hybrid model pairs the FCM network (binary features f_1 … f_n, embeddings e_{w_1} … e_{w_n}, sum Σ, tensor T) with the baseline log-linear model p(y|x) ∝ exp(Θ_y ⋅ f(x)) over the parsed sentence "Egypt-born Proyas directed"]

Product of Experts:

p(y|x) = (1 / Z(x)) ⋅ p_Baseline(y|x) ⋅ p_FCM(y|x)
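The product of experts is just a pointwise product of the two distributions followed by renormalization by Z(x). A toy example with made-up numbers:

```python
import numpy as np

# Two experts' distributions over the same 3-label set (numbers made up).
p_base = np.array([0.5, 0.3, 0.2])
p_fcm = np.array([0.2, 0.6, 0.2])

prod = p_base * p_fcm    # multiply the experts pointwise
p = prod / prod.sum()    # renormalize by Z(x)
```

A label must score well under both experts to survive the product, which is why the hybrid can beat either model alone.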
[Slide 21]
Experimental Setup
ACE 2005
• Data: 6 domains
– Newswire (nw)
– Broadcast Conversation (bc)
– Broadcast News (bn)
– Telephone Speech (cts)
– Usenet Newsgroups (un)
– Weblogs (wl)
• Train: bn+nw (~3600 relations)
Dev: ½ of bc
Test: ½ of bc, cts, wl
• Metric: Micro F1 (given entity mentions)
SemEval-2010 Task 8
• Data: Web text
• Train / Dev / Test: standard split from the shared task
• Metric: Macro F1 (given entity boundaries)

21
[Slide 22]
ACE 2005 Results
22
[Figure: Micro F1 on the ACE 2005 test sets (Broadcast Conversation, Conversational Telephone Speech, Weblogs), y-axis 45%–65%, comparing Baseline, FCM, and Baseline+FCM]
[Slide 23]
SemEval-2010 Results
Source  Classifier  F1
Socher et al. (2012) RNN 74.8
Socher et al. (2012) MVRNN 79.1
Hashimoto et al. (2015) RelEmb 81.8
Rink and Harabagiu (2010) SVM 82.2
Zeng et al. (2014) CNN 82.7
Santos et al. (2015) CR-CNN (log-loss) 82.7
Liu et al. (2015) DepNN 82.8
Hashimoto et al. (2015) RelEmb (task-spec-emb) 82.8
Best in the SemEval-2010 Shared Task: Rink and Harabagiu (2010)
[Slide 24]
SemEval-2010 Results
Source  Classifier  F1
Socher et al. (2012) RNN 74.8
Socher et al. (2012) MVRNN 79.1
Hashimoto et al. (2015) RelEmb 81.8
Rink and Harabagiu (2010) SVM 82.2
Zeng et al. (2014) CNN 82.7
Santos et al. (2015) CR-CNN (log-loss) 82.7
Liu et al. (2015) DepNN 82.8
Hashimoto et al. (2015) RelEmb (task-spec-emb) 82.8
FCM (log-linear) 81.4
FCM (log-bilinear) 83.0
Best in the SemEval-2010 Shared Task: Rink and Harabagiu (2010)
[Slide 25]
SemEval-2010 Results
Source  Classifier  F1
Socher et al. (2012) RNN 74.8
Socher et al. (2012) MVRNN 79.1
Hashimoto et al. (2015) RelEmb 81.8
Rink and Harabagiu (2010) SVM 82.2
Zeng et al. (2014) CNN 82.7
Santos et al. (2015) CR-CNN (log-loss) 82.7
Liu et al. (2015) DepNN 82.8
Hashimoto et al. (2015) RelEmb (task-spec-emb) 82.8
FCM (log-linear) 81.4
FCM (log-bilinear) 83.0
[Slide 26]
SemEval-2010 Results
Source  Classifier  F1
Socher et al. (2012) RNN 74.8
Socher et al. (2012) MVRNN 79.1
Hashimoto et al. (2015) RelEmb 81.8
Rink and Harabagiu (2010) SVM 82.2
Xu et al. (2015) SDP-LSTM 82.4
Zeng et al. (2014) CNN 82.7
Santos et al. (2015) CR-CNN (log-loss) 82.7
Liu et al. (2015) DepNN 82.8
Hashimoto et al. (2015) RelEmb (task-spec-emb) 82.8
Xu et al. (2015) SDP-LSTM (full) 83.7
FCM (log-linear) 81.4
FCM (log-bilinear) 83.0
[Slide 27]
SemEval-2010 Results
Source  Classifier  F1
Socher et al. (2012) RNN 74.8
Socher et al. (2012) MVRNN 79.1
Hashimoto et al. (2015) RelEmb 81.8
Rink and Harabagiu (2010) SVM 82.2
Xu et al. (2015) SDP-LSTM 82.4
Zeng et al. (2014) CNN 82.7
Santos et al. (2015) CR-CNN (log-loss) 82.7
Liu et al. (2015) DepNN 82.8
Hashimoto et al. (2015) RelEmb (task-spec-emb) 82.8
Xu et al. (2015) SDP-LSTM (full) 83.7
FCM (log-linear) 81.4
FCM (log-bilinear) 83.0
FCM (log-bilinear) (task-spec-emb) 83.7
[Slide 28]
SemEval-2010 Results
Source  Classifier  F1
Socher et al. (2012) RNN 74.8
Socher et al. (2012) MVRNN 79.1
Hashimoto et al. (2015) RelEmb 81.8
Rink and Harabagiu (2010) SVM 82.2
Xu et al. (2015) SDP-LSTM 82.4
Zeng et al. (2014) CNN 82.7
Santos et al. (2015) CR-CNN (log-loss) 82.7
Liu et al. (2015) DepNN 82.8
Hashimoto et al. (2015) RelEmb (task-spec-emb) 82.8
Xu et al. (2015) SDP-LSTM (full) 83.7
Santos et al. (2015) CR-CNN (ranking-loss) 84.1
FCM (log-linear) 81.4
FCM (log-bilinear) 83.0
FCM (log-bilinear) (task-spec-emb) 83.7
[Slide 29]
Takeaways
FCM bridges the gap between feature engineering and feature learning
If you are allergic to deep learning:
– Try the FCM for your task: it is simple, easy to implement, and was shown to be effective on two relation extraction benchmarks
If you are a deep learning expert:
– Inject the FCM (i.e. the outer product of features and embeddings) into your fancy deep network
29
[Slide 30]
Questions?
Two open source implementations:
– Java (within the Pacaya framework): https://github.com/mgormley/pacaya
– C++ (from our NAACL 2015 paper on LRFCM): https://github.com/Gorov/ERE_RE