Coreference Resolution
DESCRIPTION
Coreference Resolution presentation by Shumin Wu and Nicolas Nicolov of J.D. Power and Associates
TRANSCRIPT
© 2010 J.D. Power and Associates, The McGraw-Hill Companies, Inc. All Rights Reserved.
Coreference Resolution: A Machine Learning Approach
Shumin Wu
Ph.D. Candidate in Computer Science
University of Colorado at Boulder
The Center for Spoken Language Research
1777 Exposition Drive, Boulder, Colorado 80301, U.S.A.

Nicolas Nicolov
Senior Director, Science
J.D. Power and Associates, McGraw-Hill
Web Intelligence Division
4888 Pearl East Circle, Boulder, CO 80301, U.S.A.
[email protected] @colorado.edu
• What is Coreference?
• Coreference Performance Measures
  – MUC6 (Message Understanding Conference) F-measure
  – B3 (Bagga, Baldwin, Biermann)
  – CEAF (Constrained Entity-Alignment F-Measure)
• ICWSM JDPA Corpus
• Approaches
  – Heuristic
  – Machine Learning
Outline
Audi is an automaker that makes luxury cars and SUVs. The company was born in Germany. It was established by August Horch in 1910. Horch had previously founded another company and his models were quite popular. Audi started with four cylinder models. By 1914, Horch's new cars were racing and winning. August Horch left the Audi company in 1920 to take a position as an industry representative for the German motor vehicle industry federation. Currently Audi is a subsidiary of the Volkswagen group and produces cars of outstanding quality.
Coreference
• Sentiment Analysis (SA)
  Use coreference resolution to find sentiment elements of "Audi" the company vs. the German auto industry.
• Search/Question Answering (QA)
  Query for bio of "Jim Martin" the computer scientist vs. "Jim Martin" the politician.
• Machine Translation (MT)
  Chinese zero anaphora resolution:
  看了很多相机后, [ 我 ] 买了个松下, 因为 [ 它 ] 镜头好。
  After looking at many cameras, I bought a Panasonic, because it has a good lens.
Coreference Applications
Reference: {a1, a2, a3, a4}, {b1, b2}, {c1, c2, c3}
System output: {a1, a2, a3}, {a4, b1, b2}, {c1}, {c2, c3}
Count the number of corresponding links between mentions
Precision = 4/5
Recall = 4/6
F-measure = 2* Precision * Recall/( Precision + Recall ) = 0.727
MUC6 F-measure
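The link counting above is easy to verify in code. Below is a small sketch of the MUC scorer (Vilain et al., 1995); the function names and data layout are ours:

```python
from fractions import Fraction

def muc_score(keys, responses):
    """MUC link-based score of `responses` against `keys` (Vilain et al., 1995):
    each key entity contributes |entity| - 1 links; one link is lost for every
    extra partition the response splits the entity into."""
    def num_partitions(entity, clustering):
        parts = {i for m in entity for i, c in enumerate(clustering) if m in c}
        missing = sum(1 for m in entity if not any(m in c for c in clustering))
        return len(parts) + missing  # unclustered mentions act as singletons
    num = sum(len(k) - num_partitions(k, responses) for k in keys)
    den = sum(len(k) - 1 for k in keys)
    return Fraction(num, den)

ref = [{"a1", "a2", "a3", "a4"}, {"b1", "b2"}, {"c1", "c2", "c3"}]
sys_out = [{"a1", "a2", "a3"}, {"a4", "b1", "b2"}, {"c1"}, {"c2", "c3"}]
recall = muc_score(ref, sys_out)       # 4/6
precision = muc_score(sys_out, ref)    # 4/5 (precision swaps the two roles)
f = 2 * precision * recall / (precision + recall)
print(precision, recall, float(f))     # 4/5 2/3 0.7272727272727273
```

This reproduces the slide's 4/5, 4/6, and 0.727 exactly.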
Reference: {a1, a2, a3, a4}, {b1, b2}, {c1, c2, c3}
System output: {a1}, {a2}, {a3}, {a4}, {b1}, {b2}, {c1}, {c2}, {c3}

Precision = N/A
Recall = 0
F-measure = N/A

Discounts single mention entities.

All mentions form individual singleton entities.
MUC6 F-measure Degenerate Case
Reference: {a1, a2, a3, a4}, {b1, b2}, {c1, c2, c3}
System output: {a1, a2, a3, a4, b1, b2, c1, c2, c3}

Precision = 6/8
Recall = 1
F-measure = 0.857 (!)

Does not adequately penalize dense links.

All mentions form one big entity.
MUC6 F-measure Degenerate Case 2
Roadmap
• What is Coreference?
• Coreference Performance Measures
  – MUC6 (Message Understanding Conference) F-measure
  – B3 (Bagga, Baldwin, Biermann)
  – CEAF (Constrained Entity-Alignment F-Measure)
• ICWSM JDPA Corpus
• Approaches
  – Heuristic
  – Machine Learning (Robust Risk Minimization)
Reference: {a1, a2, a3, a4}, {b1, b2}, {c1, c2, c3}
System output: {a1, a2, a3}, {a4, b1, b2}, {c1}, {c2, c3}

For each mention, compute the proportion of corresponding mentions between reference and system entity.

Precision = 1/9 × (3/3 + 3/3 + 3/3 + 1/3 + 2/3 + 2/3 + 1/1 + 2/2 + 2/2) = 0.852
Recall = 1/9 × (3/4 + 3/4 + 3/4 + 1/4 + 2/2 + 2/2 + 1/3 + 2/3 + 2/3) = 0.685
F-measure = 0.760
B3 F-measure
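The per-mention overlap computation can be sketched in a few lines (a toy implementation; the helper names are ours):

```python
def b_cubed(reference, system):
    """B-cubed (Bagga & Baldwin, 1998): for each mention, score the overlap
    between the system entity and the reference entity containing it."""
    mentions = [m for e in reference for m in e]

    def entity_of(m, clustering):
        return next(e for e in clustering if m in e)

    def avg_overlap(src, dst):
        # fraction of the src entity's mentions that also share the dst entity
        return sum(len(entity_of(m, src) & entity_of(m, dst)) / len(entity_of(m, src))
                   for m in mentions) / len(mentions)

    precision = avg_overlap(system, reference)
    recall = avg_overlap(reference, system)
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f

ref = [{"a1", "a2", "a3", "a4"}, {"b1", "b2"}, {"c1", "c2", "c3"}]
sys_out = [{"a1", "a2", "a3"}, {"a4", "b1", "b2"}, {"c1"}, {"c2", "c3"}]
p, r, f = b_cubed(ref, sys_out)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.852 0.685 0.759
```

Note the exact F from the unrounded precision and recall is 0.759; the slide reports 0.760, a small rounding difference.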
Reference: {a1, a2, a3, a4}, {b1, b2}, {c1, c2, c3}
System output: {a1}, {a2}, {a3}, {a4}, {b1}, {b2}, {c1}, {c2}, {c3}

Precision = 1
Recall = 1/3
F-measure = 0.5

All mentions form individual singleton entities.
B3 F-measure Degenerate Case
Reference: {a1, a2, a3, a4}, {b1, b2}, {c1, c2, c3}
System output: {a1, a2, a3, a4, b1, b2, c1, c2, c3}

Precision = 1/9 × (4/9 + 4/9 + 4/9 + 4/9 + 2/9 + 2/9 + 3/9 + 3/9 + 3/9) = 0.358
Recall = 1
F-measure = 0.527

Which system entity maps to which reference entity?

All mentions form one big entity.
B3 F-measure Degenerate Case 2
Roadmap
• What is Coreference?
• Coreference Performance Measures
  – MUC6 (Message Understanding Conference) F-measure
  – B3 (Bagga, Baldwin, Biermann)
  – CEAF (Constrained Entity-Alignment F-Measure)
• ICWSM JDPA Corpus
• Approaches
  – Heuristic
  – Machine Learning (Robust Risk Minimization)
Reference: {a1, a2, a3, a4}, {b1, b2}, {c1, c2, c3}
System output: {a1, a2, a3}, {a4, b1, b2}, {c1}, {c2, c3}

Find the one-to-one entity mapping between reference (R) and system (S) that maximizes a similarity measure φ.
Let Φ(R, S) be the total similarity of the best one-to-one alignment of reference entities R_i to system entities S_j. Then:

Precision = Φ(R, S) / Σ_i φ(S_i, S_i)
Recall = Φ(R, S) / Σ_i φ(R_i, R_i)
F = 2 Φ(R, S) / (Σ_i φ(R_i, R_i) + Σ_i φ(S_i, S_i))

With φ(R, S) = |R ∩ S|: F = 0.778
With φ(R, S) = 2|R ∩ S| / (|R| + |S|): F = 0.702
CEAF
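For entity sets this small, the optimal alignment can be found by brute force over permutations. A sketch (function and lambda names are ours), using the slide's two similarity measures:

```python
from itertools import permutations

def ceaf_f(reference, system, phi):
    """CEAF F: best one-to-one entity alignment under similarity phi,
    found by brute force (fine for toy-sized examples)."""
    n = max(len(reference), len(system))
    ref_p = reference + [set()] * (n - len(reference))   # pad the shorter side
    sys_p = system + [set()] * (n - len(system))         # so alignment is 1-to-1
    best = max(sum(phi(r, s) for r, s in zip(ref_p, perm))
               for perm in permutations(sys_p))
    self_sim = sum(phi(r, r) for r in reference) + sum(phi(s, s) for s in system)
    return 2 * best / self_sim

# the slide's two similarity measures
phi3 = lambda r, s: len(r & s)
phi4 = lambda r, s: 2 * len(r & s) / (len(r) + len(s)) if (r or s) else 0.0

ref = [{"a1", "a2", "a3", "a4"}, {"b1", "b2"}, {"c1", "c2", "c3"}]
sys_out = [{"a1", "a2", "a3"}, {"a4", "b1", "b2"}, {"c1"}, {"c2", "c3"}]
print(round(ceaf_f(ref, sys_out, phi3), 3))  # 0.778
print(round(ceaf_f(ref, sys_out, phi4), 3))  # 0.702
```

Real scorers use the Kuhn-Munkres (Hungarian) algorithm instead of brute force, since the number of permutations grows factorially.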
Reference: {a1, a2, a3, a4}, {b1, b2}, {c1, c2, c3}
System output: {a1}, {a2}, {a3}, {a4}, {b1}, {b2}, {c1}, {c2}, {c3}

With φ(R, S) = |R ∩ S|: F = 0.333
With φ(R, S) = 2|R ∩ S| / (|R| + |S|): F = 0.261
CEAF Degenerate Case
All mentions form individual singleton entities.
Reference: {a1, a2, a3, a4}, {b1, b2}, {c1, c2, c3}
System output: {a1, a2, a3, a4, b1, b2, c1, c2, c3}

With φ(R, S) = |R ∩ S|: F = 0.444
With φ(R, S) = 2|R ∩ S| / (|R| + |S|): F = 0.333
CEAF Degenerate Case 2
All mentions form one big entity.
Performance Measures Summary
• MUC6 F-measure
  – Ignores single mention entities.
  – Potentially biased toward large clusters.
  – No one-to-one entity mapping guarantee.
• B3
  – Set view of mentions in an entity.
  – Based on the number of corresponding mentions between entities, averaged over the total number of mentions.
  – Does not provide one-to-one entity mapping.
• CEAF
  – One-to-one entity mapping.
  – Optimal mapping can be tuned to a different similarity measure.
Roadmap
• What is Coreference?
• Coreference Performance Measures
  – MUC6
  – B3
  – CEAF
• ICWSM JDPA Corpus
• Approaches
  – Heuristic
  – Machine Learning (Robust Risk Minimization)
ICWSM JDPA Coreference Corpus
• The JDPA Corpus consists of user-generated content (blog posts) containing opinions about automobiles and digital cameras. The posts have been manually annotated for named, nominal, and pronominal mentions of entities. Entities are marked with the aggregate sentiment expressed toward them in the document. Mentions of each entity are marked as co-referential. Mentions are assigned semantic types consisting of the Automatic Content Extraction (ACE) mention types and additional domain-specific types. Meronymy (part-of and feature-of) and instance relations are also annotated. Expressions which convey sentiment toward an entity are annotated with the polarity of their prior and contextual sentiments as well as the mentions they target. The following modifiers are annotated; these may target other modifiers or sentiment expressions:
– negators (expressions which invert the polarity of a sentiment expression or modifier)
– neutralizers (expressions that do not commit the speaker to the truth of the target sentiment expression or modifier)
– committers (expressions which shift the commitment of the speaker toward the truth of a sentiment expression or modifier)
– intensifiers (expressions which shift the intensity of a sentiment expression or modifier)
Additionally, we have annotated when the opinion holder of a sentiment expression is someone other than the author of the blog by linking the expression to the holder. We also annotate when two entities are compared on a particular dimension.
ICWSM JDPA Corpus: Mention Types
• Person, Organization, Location
• GeoPolitical (Countries, USStates, Nationalities, City)
• Time (Year, Month, Date, Duration, Days, OClock)
• Units (Money, Age)
• Vehicles (Cars, SUVs, Trucks)
• CarPart, CarFeature
• Camera (Part, Feature, Accessory)
• Meal, Food, Beverage, FoodFeature, FoodPart, CookingMethod, Marketing, CookingTool
• Descriptor
ICWSM JDPA Corpus Statistics
• Mentions: 100,648
• Entities: 67,038
[Chart: histogram of the number of entities having x mentions; y-axis 0–60,000 entities, x-axis 1–64 mentions.]

Mention type      Entities*
Person            13,505
Organization       8,206
Location             245
City                 360
US State             127
Country              316
Nationality          195
Facility           1,052
Vehicles          13,977
CarPart           15,478
CarFeature         6,119
Camera             5,197
CameraFeature      9,575
CameraPart         3,240
CameraAccessory      352
Descriptor        13,087
…                 …
*Entity x DocId. “Audi” in doc5 and in doc7 are considered different entities.
Roadmap
• What is Coreference?
• Coreference Performance Measures
  – MUC6
  – B3
  – CEAF
• ICWSM JDPA Corpus
• Approaches
  – Heuristic
  – Machine Learning (Robust Risk Minimization)
Coreference Approaches
• Build mention-pair model
  – Select mention-pair features: lexical match, distance between mentions, syntactic features
  – Heuristics
  – Machine learning: classifiers such as MaxEnt, RRM, SVM
• Cluster compatible entities and mentions
  – Greedy clustering: forward or reverse direction, sentence to document
  – Bell tree
Mention Pair Features Considered
• Lexical
  – String match: exact, left or right substring
  – Acronym (GM & General Motors)
  – Edit distance (Toyota & Toyoda)
  – Lemma: words of the entire mention, the head noun, and the determiner (if present)
  – Capitalization: whole word or only first letter
  – Number: whether the NP is a number or starts with a number
• Distance-based
  – Word distance: number of in-between words
  – Sentence distance: number of in-between sentences
  – Mention distance: number of in-between mentions
• Syntax-based
  – Part-of-speech tag of the mention head
• Pronoun
  – Gender: masculine, feminine, neuter
  – Number: singular, plural
  – Person: first, second, third
  – Possessive
  – Reflexive
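A few of the features above can be sketched as binary indicators (a minimal illustration; the feature names and the distance bucketing are our choices):

```python
def mention_pair_features(m1, m2, words_between, sentences_between):
    """A handful of the lexical and distance features, as binary indicators
    (the slides note that scalar values are quantized to binary)."""
    a, b = m1.lower(), m2.lower()
    return {
        "exact_match": a == b,
        "left_substring": a.startswith(b) or b.startswith(a),
        "right_substring": a.endswith(b) or b.endswith(a),
        "both_capitalized": m1[:1].isupper() and m2[:1].isupper(),
        # distances quantized into coarse binary buckets
        "word_dist_le_5": words_between <= 5,
        "same_sentence": sentences_between == 0,
    }

feats = mention_pair_features("Audi", "audi", 3, 0)
# exact match holds after lowercasing; the capitalization feature does not
```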
Roadmap
• What is Coreference?
• Coreference Performance Measures
  – MUC6
  – B3
  – CEAF
• ICWSM JDPA Corpus
• Approaches
  – Heuristic
  – Machine Learning (Robust Risk Minimization)
Coreference Heuristics
• Compatible mentions:
  – Exact string match of capitalized mentions: "Audi" & "Audi"
  – Exact string match of mentions within a sentence: "car" & "car"
  – Acronyms: "GM" & "General Motors"
  – First person pronouns: "I" & "me"
  – Second person pronouns: "you" & "yours"
  – Third person pronouns with gender and number agreement in the same sentence: "he" & "him", "she" & "her"
  – Non-numeral mentions with edit distance < 15% of the length of the mention: "engine" & "eengine", "Toyota" & "Toyoda"
• Incompatible mentions:
  – Different acronyms: "GM" & "BMW"
  – Person, gender, or number disagreement: "I" & "you", "he" & "she", "car" & "cars"
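The edit-distance heuristic can be sketched as follows. The slide does not spell out how the 15% threshold is normalized; applying it to the combined length of the two mentions (our assumption) reproduces the examples given:

```python
def edit_distance(a, b):
    """Levenshtein distance via the classic dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def compatible_by_edit_distance(m1, m2, ratio=0.15):
    """Treat non-numeral mentions as coreferent when their edit distance is
    small relative to mention length. The 15% is applied to the combined
    length of both mentions (an assumption; see lead-in)."""
    if m1[:1].isdigit() or m2[:1].isdigit():
        return False
    return edit_distance(m1.lower(), m2.lower()) <= ratio * (len(m1) + len(m2))

print(compatible_by_edit_distance("Toyota", "Toyoda"))   # True  (distance 1)
print(compatible_by_edit_distance("engine", "eengine"))  # True  (distance 1)
print(compatible_by_edit_distance("GM", "BMW"))          # False (distance 2)
```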
Configuration                        MUC-F  MUC-P  MUC-R  B3-F  B3-P  B3-R
Unlinked entities                     --     --      0    71.8  100   56.0
Single entity (w/ all mentions)       61.4   44.3  100     8.6    4.5 100
w/ edit distance                      66.2   64.0   68.5  78.6   76.8  80.4
edit distance at sentence level       64.8   63.7   66.0  78.7   77.8  79.4
w/o edit distance                     70.0   75.8   64.9  83.0   86.8  79.5
edit distance + cardinality match     70.4   78.7   63.7  83.7   89.2  78.8
Order of clustering (local (mentions within sentence) to global, forward, and reverse direction) did not alter our results.
Heuristic System: Results
Roadmap
• What is Coreference?
• Coreference Performance Measures
  – MUC6
  – B3
  – CEAF
• ICWSM JDPA Corpus
• Approaches
  – Heuristic
  – Machine Learning (Robust Risk Minimization)
Machine Learning Approach
• Feature values
  – Converted to binary (quantize scalar values)
• Training sample selection
  – Positive samples formed with pairs of consecutive coreferent mentions; negative samples formed using any mentions between consecutive coreferent mentions
  – All mention pairs within a small window (a few sentences)
• Robust Risk Minimization (Generalized Winnow)
  – Linear classifier
  – Multiplicative weight update (quickly discounts irrelevant features after a few iterations)
  – Class margin (m) can be converted to a probability:
    p = 0            if m ≤ −1
    p = (1 + m) / 2  if −1 < m < 1
    p = 1            if m ≥ 1
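The piecewise mapping is straightforward to implement:

```python
def margin_to_probability(m):
    """Map an RRM class margin to a probability: 0 below -1, 1 above +1,
    and linear, p = (1 + m) / 2, in between."""
    if m <= -1:
        return 0.0
    if m >= 1:
        return 1.0
    return (1.0 + m) / 2.0

print(margin_to_probability(-2), margin_to_probability(0), margin_to_probability(0.5))
# 0.0 0.5 0.75
```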
[Slide shows the generalized Winnow training procedure: input is training data (x_1, y_1), …, (x_n, y_n); output is a weight vector w and threshold θ. The weights are kept as separate positive and negative parts (w = w⁺ − w⁻, θ = θ⁺ − θ⁻), initialized to μ, and over K passes through the n examples both parts are updated multiplicatively, w_j ← w_j · exp(η Δ).]

Separate positive and negative weights.
Multiplicative weight update.
Robust Risk Minimization (RRM)
RRM was proposed by Tong Zhang; best CoNLL’03 chunker.
Input sequence: a1, b1, b2, a2, …

[a1]
[a1,b1]   [a1][b1]
[a1,b1,b2]   [a1,b1][b2]   [a1,b2][b1]   [a1][b1,b2]   [a1][b1][b2]
[a1,b1,b2,a2]   [a1,b1,b2][a2]   [a1,a2][b1,b2]   [a1][b1,b2,a2]   [a1][b1,b2][a2]   [a1,b1][b2][a2]   [a1,b1,a2][b2]   [a1,b1][b2,a2]   [a1,b2,a2][b1]   [a1,b2][b1,a2]   [a1,b2][b1][a2]   [a1,a2][b1][b2]   [a1][b1,a2][b2]   [a1][b1][b2,a2]   [a1][b1][b2][a2]
Bell Tree
Lots of states in the search space to explore!
Bell Tree Coreference Model
Given an input sequence of mentions m1, m2..., mk, ...mn, and:
et: entity
Ek: set of partial entities containing mentions m1…mk
Ak: index of entity which the next mention should merge with
L: binary (1=link to existing entity, 0=create entity)
Define
Link model (link mention to an existing entity):

  P(L=1 | E_k, m_k, A_k = t) ≈ P(L=1 | e_t, m_k) ≈ max over m ∈ e_t of P(L=1 | m, m_k)

Creation model (start a new entity):

  P(L=0 | E_k, m_k) ≈ α · (1 − max over t of P(L=1 | E_k, m_k, A_k = t))
Entities assumed to be independent
Link probability derived from the most probable mention pair
Tunable to encourage or penalize entity creation
Input sequence: a1, b1, b2, a2, c1, …

Coreference (mention-pair) probabilities:

     a1    b1    b2    a2    c1
a1   1
b1   0.4   1
b2   0.2   0.9   1
a2   0.8   0.1   0.3   1
c1   0.4   0.3   0.4   0.2   1

[a1] p = 1
[a1,b1] p = 0.4   [a1][b1] p = 0.6
[a1,b1,b2] p = 0.36   [a1,b1][b2] p = 0.04   [a1,b2][b1] p = 0.12   [a1][b1,b2] p = 0.54   [a1][b1][b2] p = 0.06
[a1,b1,b2,a2] p = 0.288   [a1,b1,b2][a2] p = 0.072   [a1,a2][b1,b2] p = 0.432   [a1][b1,b2,a2] p = 0.162   [a1][b1,b2][a2] p = 0.108
[a1,b1,b2,a2,c1] p = 0.1152   [a1,b1,b2,a2][c1] p = 0.1728   [a1,a2,c1][b1,b2] p = 0.1728   [a1,a2][b1,b2,c1] p = 0.1728   [a1,a2][b1,b2][c1] p = 0.2592
Bell Tree in Action
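The worked example can be reproduced by expanding the Bell tree with the link and creation models (an unpruned sketch with α = 1; real systems prune the search space, and the function names are ours):

```python
# pairwise coreference probabilities from the slide's matrix
P = {("a1", "b1"): 0.4, ("a1", "b2"): 0.2, ("b1", "b2"): 0.9,
     ("a1", "a2"): 0.8, ("b1", "a2"): 0.1, ("b2", "a2"): 0.3,
     ("a1", "c1"): 0.4, ("b1", "c1"): 0.3, ("b2", "c1"): 0.4, ("a2", "c1"): 0.2}

def pair_prob(m1, m2):
    return P.get((m1, m2), P.get((m2, m1), 0.0))

def expand(states, mention, alpha=1.0):
    """Grow every partial partition by one mention: link it to each existing
    entity (link model, scored by the most probable mention pair) or start a
    new entity (creation model)."""
    new_states = []
    for entities, prob in states:
        link_scores = [max(pair_prob(m, mention) for m in e) for e in entities]
        for i, lp in enumerate(link_scores):
            merged = [e | {mention} if j == i else e for j, e in enumerate(entities)]
            new_states.append((merged, prob * lp))
        # creation model: alpha * (1 - best link probability)
        new_states.append((entities + [{mention}],
                           prob * alpha * (1 - max(link_scores))))
    return new_states

states = [([{"a1"}], 1.0)]
for m in ["b1", "b2", "a2", "c1"]:
    states = expand(states, m)

entities, prob = max(states, key=lambda s: s[1])
print([sorted(e) for e in entities], round(prob, 4))
# [['a1', 'a2'], ['b1', 'b2'], ['c1']] 0.2592
```

This recovers the slide's winning partition [a1,a2][b1,b2][c1] with probability 0.2592.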
Discussion
• How can coreference scoring measures be evaluated?
  – Consistency: Does a better score correspond to better human judgment of the output? Do all measures score one set of output higher than another?
  – Application specific: Does a better score translate to better application performance (sentiment analysis, machine translation)?
• Techniques for picking mention-pair training samples
  – Cluster mention pairs and pick minority class samples within each cluster.
Future Work
• Coreference model
  – Features:
    Entity class type
    Dependency and/or semantic role features for sentence-level mentions: parse tree path, predicate, arguments
  – Classification:
    Training sample selection: select mention pairs with discriminatory features
    Multi-class: classify between mentions with strong compatibility indicators and mentions with weak compatibility indicators
    Algorithms: SVM, Random Forest
• Clustering
  – Algorithms: SVMcluster, soft CSP
  – Different similarity metrics
Acknowledgements
• Dr. Xiaoqiang Luo, IBM T.J. Watson Research Center
• Prof. Martha Palmer, Univ. of Colorado
• Prof. James Martin, Univ. of Colorado
• Jason Kessler, J.D. Power and Associates
• Dr. Miriam Eckert, J.D. Power and Associates
References
Amit Bagga & Breck Baldwin. 1998. Algorithms for Scoring Coreference Chains. 1st International Conference on Language Resources and Evaluation Workshop on Linguistics Coreference, pp. 563–566.
Dan Cristea & Oana Postolache. 2005. How to Deal with Wicked Anaphora. Anaphora Processing: Linguistic, Cognitive and Computational Modelling, ed. by A. Branco, T. McEnery & R. Mitkov, pp. 17-46. John Benjamins: Amsterdam & Philadelphia.
Thomas Finley & Thorsten Joachims. 2005. Supervised Clustering with Support Vector Machines. 22nd International Conference on Machine Learning (ICML’05), pp. 217–224, New York, N.Y., U.S.A. ACM.
Xiaoqiang Luo, Abe Ittycheriah, Hongyan Jing, Nanda Kambhatla & Salim Roukos. 2004. A Mention-Synchronous Coreference Resolution Algorithm Based on the Bell Tree. 42nd Annual Meeting on Association for Computational Linguistics (ACL’04), page 135, Morristown, N.J., U.S.A. ACL.
Xiaoqiang Luo. 2005. On Coreference Resolution Performance Metrics. Human Language Technology and Empirical Methods in Natural Language Processing (HLT’05), pp. 25–32, Morristown, N.J., U.S.A. ACL.
Jason S. Kessler & Nicolas Nicolov. 2009. Targeting Sentiment Expressions through Supervised Ranking of Linguistic Configurations. 3rd International AAAI Conference on Weblogs and Social Media (ICWSM’09), San Jose, California, U.S.A.
Nicolas Nicolov. 2003. Book Review: "Anaphora Resolution" (R. Mitkov). IEEE Computational Intelligence Bulletin, Vol. 2, No. 1, pp. 31–32, June 2003.
Oana Postolache & Corina Forascu. 2004. A Coreference Model on Excerpts from a Novel. European Summer School in Logic Language and Information – ESSLLI'04, pp. 202-213. Nancy, France.
Marc Vilain, John Burger, John Aberdeen, Dennis Connolly & Lynette Hirschman. 1995. A Model-Theoretic Coreference Scoring Scheme. 6th conference on Message understanding (MUC6 ’95), pp. 45–52, Morristown, N.J., U.S.A. ACL.
Tong Zhang, Fred Damerau & David Johnson. 2002. Text Chunking Based on a Generalization of Winnow. Journal of Machine Learning Research, 2:615–637.
Thank you!