Coreference Resolution
DESCRIPTION
Coreference Resolution presentation by Shumin Wu and Nicolas Nicolov of J.D. Power and Associates
TRANSCRIPT
© 2010 J.D. Power and Associates, The McGraw-Hill Companies, Inc. All Rights Reserved.
Coreference Resolution: A Machine Learning Approach
Shumin Wu
Ph.D. Candidate in Computer Science
University of Colorado at Boulder
The Center for Spoken Language Research
1777 Exposition Drive, Boulder, Colorado 80301, U.S.A.

Nicolas Nicolov
Senior Director, Science
J.D. Power and Associates, McGraw-Hill
Web Intelligence Division
4888 Pearl East Circle, Boulder, CO 80301, U.S.A.
[email protected] @colorado.edu
• What is Coreference?
• Coreference Performance Measures
  – MUC6 (Message Understanding Conference) F-measure
  – B3 (Bagga, Baldwin, Biermann)
  – CEAF (Constrained Entity-Alignment F-Measure)
• ICWSM JDPA Corpus
• Approaches
  – Heuristic
  – Machine Learning
Outline
Audi is an automaker that makes luxury cars and SUVs. The company was born in Germany. It was established by August Horch in 1910. Horch had previously founded another company and his models were quite popular. Audi started with four cylinder models. By 1914, Horch's new cars were racing and winning. August Horch left the Audi company in 1920 to take a position as an industry representative for the German motor vehicle industry federation. Currently Audi is a subsidiary of the Volkswagen group and produces cars of outstanding quality.
Coreference
• Sentiment Analysis (SA)
  Use coreference resolution to find sentiment elements of "Audi" the company vs. the German auto industry.
• Search/Question Answering (QA)
  Query for bio of "Jim Martin" the computer scientist vs. "Jim Martin" the politician.
• Machine Translation (MT)
  Chinese zero anaphora resolution:
  看了很多相机后, [ 我 ] 买了个松下, 因为 [ 它 ] 镜头好。
  After looking at many cameras, I bought a Panasonic, because it has a good lens.
Coreference Applications
Reference: {a1, a2, a3, a4}, {b1, b2}, {c1, c2, c3}
System output: {a1, a2, a3}, {a4, b1, b2}, {c1}, {c2, c3}
Count the number of corresponding links between mentions
Precision = 4/5
Recall = 4/6
F-measure = 2* Precision * Recall/( Precision + Recall ) = 0.727
MUC6 F-measure
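The link counting above is easy to verify in code. Below is a small sketch of the MUC scorer (Vilain et al., 1995); the function names and data layout are ours:

```python
from fractions import Fraction

def muc_score(keys, responses):
    """MUC link-based score of `responses` against `keys` (Vilain et al., 1995):
    each key entity contributes |entity| - 1 links; one link is lost for every
    extra partition the response splits the entity into."""
    def num_partitions(entity, clustering):
        parts = {i for m in entity for i, c in enumerate(clustering) if m in c}
        missing = sum(1 for m in entity if not any(m in c for c in clustering))
        return len(parts) + missing  # unclustered mentions act as singletons
    num = sum(len(k) - num_partitions(k, responses) for k in keys)
    den = sum(len(k) - 1 for k in keys)
    return Fraction(num, den)

ref = [{"a1", "a2", "a3", "a4"}, {"b1", "b2"}, {"c1", "c2", "c3"}]
sys_out = [{"a1", "a2", "a3"}, {"a4", "b1", "b2"}, {"c1"}, {"c2", "c3"}]
recall = muc_score(ref, sys_out)       # 4/6
precision = muc_score(sys_out, ref)    # 4/5 (precision swaps the two roles)
f = 2 * precision * recall / (precision + recall)
print(precision, recall, float(f))     # 4/5 2/3 0.7272727272727273
```

This reproduces the slide's 4/5, 4/6, and 0.727 exactly.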
Reference: {a1, a2, a3, a4}, {b1, b2}, {c1, c2, c3}
System output: {a1}, {a2}, {a3}, {a4}, {b1}, {b2}, {c1}, {c2}, {c3}

Precision = N/A
Recall = 0
F-measure = N/A

Discounts single mention entities.

All mentions form individual singleton entities.
MUC6 F-measure Degenerate Case
Reference: {a1, a2, a3, a4}, {b1, b2}, {c1, c2, c3}
System output: {a1, a2, a3, a4, b1, b2, c1, c2, c3}

Precision = 6/8
Recall = 1
F-measure = 0.857 (!)

Does not adequately penalize dense links.

All mentions form one big entity.
MUC6 F-measure Degenerate Case 2
Roadmap
• What is Coreference?
• Coreference Performance Measures
  – MUC6 (Message Understanding Conference) F-measure
  – B3 (Bagga, Baldwin, Biermann)
  – CEAF (Constrained Entity-Alignment F-Measure)
• ICWSM JDPA Corpus
• Approaches
  – Heuristic
  – Machine Learning (Robust Risk Minimization)
Reference: {a1, a2, a3, a4}, {b1, b2}, {c1, c2, c3}
System output: {a1, a2, a3}, {a4, b1, b2}, {c1}, {c2, c3}

For each mention, compute the proportion of corresponding mentions between reference and system entity.

Precision = 1/9 × (3/3 + 3/3 + 3/3 + 1/3 + 2/3 + 2/3 + 1/1 + 2/2 + 2/2) = 0.852
Recall = 1/9 × (3/4 + 3/4 + 3/4 + 1/4 + 2/2 + 2/2 + 1/3 + 2/3 + 2/3) = 0.685
F-measure = 0.760
B3 F-measure
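The per-mention overlap computation can be sketched in a few lines (a toy implementation; the helper names are ours):

```python
def b_cubed(reference, system):
    """B-cubed (Bagga & Baldwin, 1998): for each mention, score the overlap
    between the system entity and the reference entity containing it."""
    mentions = [m for e in reference for m in e]

    def entity_of(m, clustering):
        return next(e for e in clustering if m in e)

    def avg_overlap(src, dst):
        # fraction of the src entity's mentions that also share the dst entity
        return sum(len(entity_of(m, src) & entity_of(m, dst)) / len(entity_of(m, src))
                   for m in mentions) / len(mentions)

    precision = avg_overlap(system, reference)
    recall = avg_overlap(reference, system)
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f

ref = [{"a1", "a2", "a3", "a4"}, {"b1", "b2"}, {"c1", "c2", "c3"}]
sys_out = [{"a1", "a2", "a3"}, {"a4", "b1", "b2"}, {"c1"}, {"c2", "c3"}]
p, r, f = b_cubed(ref, sys_out)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.852 0.685 0.759
```

Note the exact F from the unrounded precision and recall is 0.759; the slide reports 0.760, a small rounding difference.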
Reference: {a1, a2, a3, a4}, {b1, b2}, {c1, c2, c3}
System output: {a1}, {a2}, {a3}, {a4}, {b1}, {b2}, {c1}, {c2}, {c3}

Precision = 1
Recall = 1/3
F-measure = 0.5

All mentions form individual singleton entities.
B3 F-measure Degenerate Case
Reference: {a1, a2, a3, a4}, {b1, b2}, {c1, c2, c3}
System output: {a1, a2, a3, a4, b1, b2, c1, c2, c3}

Precision = 1/9 × (4/9 + 4/9 + 4/9 + 4/9 + 2/9 + 2/9 + 3/9 + 3/9 + 3/9) = 0.358
Recall = 1
F-measure = 0.527

Which system entity maps to which reference entity?

All mentions form one big entity.
B3 F-measure Degenerate Case 2
Roadmap
• What is Coreference?
• Coreference Performance Measures
  – MUC6 (Message Understanding Conference) F-measure
  – B3 (Bagga, Baldwin, Biermann)
  – CEAF (Constrained Entity-Alignment F-Measure)
• ICWSM JDPA Corpus
• Approaches
  – Heuristic
  – Machine Learning (Robust Risk Minimization)
Reference: {a1, a2, a3, a4}, {b1, b2}, {c1, c2, c3}
System output: {a1, a2, a3}, {a4, b1, b2}, {c1}, {c2, c3}

Find the one-to-one entity mapping between reference (R) and system (S) that maximizes a similarity measure φ.
Let Φ(R, S) be the total similarity of the best one-to-one alignment of reference entities R_i to system entities S_j. Then:

Precision = Φ(R, S) / Σ_i φ(S_i, S_i)
Recall = Φ(R, S) / Σ_i φ(R_i, R_i)
F = 2 Φ(R, S) / (Σ_i φ(R_i, R_i) + Σ_i φ(S_i, S_i))

With φ(R, S) = |R ∩ S|: F = 0.778
With φ(R, S) = 2|R ∩ S| / (|R| + |S|): F = 0.702
CEAF
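For entity sets this small, the optimal alignment can be found by brute force over permutations. A sketch (function and lambda names are ours), using the slide's two similarity measures:

```python
from itertools import permutations

def ceaf_f(reference, system, phi):
    """CEAF F: best one-to-one entity alignment under similarity phi,
    found by brute force (fine for toy-sized examples)."""
    n = max(len(reference), len(system))
    ref_p = reference + [set()] * (n - len(reference))   # pad the shorter side
    sys_p = system + [set()] * (n - len(system))         # so alignment is 1-to-1
    best = max(sum(phi(r, s) for r, s in zip(ref_p, perm))
               for perm in permutations(sys_p))
    self_sim = sum(phi(r, r) for r in reference) + sum(phi(s, s) for s in system)
    return 2 * best / self_sim

# the slide's two similarity measures
phi3 = lambda r, s: len(r & s)
phi4 = lambda r, s: 2 * len(r & s) / (len(r) + len(s)) if (r or s) else 0.0

ref = [{"a1", "a2", "a3", "a4"}, {"b1", "b2"}, {"c1", "c2", "c3"}]
sys_out = [{"a1", "a2", "a3"}, {"a4", "b1", "b2"}, {"c1"}, {"c2", "c3"}]
print(round(ceaf_f(ref, sys_out, phi3), 3))  # 0.778
print(round(ceaf_f(ref, sys_out, phi4), 3))  # 0.702
```

Real scorers use the Kuhn-Munkres (Hungarian) algorithm instead of brute force, since the number of permutations grows factorially.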
Reference: {a1, a2, a3, a4}, {b1, b2}, {c1, c2, c3}
System output: {a1}, {a2}, {a3}, {a4}, {b1}, {b2}, {c1}, {c2}, {c3}

With φ(R, S) = |R ∩ S|: F = 0.333
With φ(R, S) = 2|R ∩ S| / (|R| + |S|): F = 0.261
CEAF Degenerate Case
All mentions form individual singleton entities.
Reference: {a1, a2, a3, a4}, {b1, b2}, {c1, c2, c3}
System output: {a1, a2, a3, a4, b1, b2, c1, c2, c3}

With φ(R, S) = |R ∩ S|: F = 0.444
With φ(R, S) = 2|R ∩ S| / (|R| + |S|): F = 0.333
CEAF Degenerate Case 2
All mentions form one big entity.
Performance Measures Summary
• MUC6 F-measure
  – Ignores single mention entities.
  – Potentially biased toward large clusters.
  – No one-to-one entity mapping guarantee.
• B3
  – Set view of mentions in an entity.
  – Based on the number of corresponding mentions between entities, averaged over the total number of mentions.
  – Does not provide one-to-one entity mapping.
• CEAF
  – One-to-one entity mapping.
  – Optimal mapping can be tuned to a different similarity measure.
Roadmap
• What is Coreference?
• Coreference Performance Measures
  – MUC6
  – B3
  – CEAF
• ICWSM JDPA Corpus
• Approaches
  – Heuristic
  – Machine Learning (Robust Risk Minimization)
ICWSM JDPA Coreference Corpus
• The JDPA Corpus consists of user-generated content (blog posts) containing opinions about automobiles and digital cameras. The posts have been manually annotated for named, nominal, and pronominal mentions of entities. Entities are marked with the aggregate sentiment expressed toward them in the document. Mentions of each entity are marked as co-referential. Mentions are assigned semantic types consisting of the Automatic Content Extraction (ACE) mention types and additional domain-specific types. Meronymy (part-of and feature-of) and instance relations are also annotated. Expressions which convey sentiment toward an entity are annotated with the polarity of their prior and contextual sentiments as well as the mentions they target. The following modifiers are annotated; these may target other modifiers or sentiment expressions:
– negators (expressions which invert the polarity of a sentiment expression or modifier)
– neutralizers (expressions that do not commit the speaker to the truth of the target sentiment expression or modifier)
– committers (expressions which shift the commitment of the speaker toward the truth of a sentiment expression or modifier)
– intensifiers (expressions which shift the intensity of a sentiment expression or modifier)
Additionally, we have annotated when the opinion holder of a sentiment expression is someone other than the author of the blog by linking the expression to the holder. We also annotate when two entities are compared on a particular dimension.
ICWSM JDPA Corpus: Mention Types
• Person, Organization, Location
• GeoPolitical (Countries, USStates, Nationalities, City)
• Time (Year, Month, Date, Duration, Days, OClock)
• Units (Money, Age)
• Vehicles (Cars, SUVs, Trucks)
• CarPart, CarFeature
• Camera (Part, Feature, Accessory)
• Meal, Food, Beverage, FoodFeature, FoodPart, CookingMethod, Marketing, CookingTool
• Descriptor
ICWSM JDPA Corpus Statistics
• Mentions: 100,648
• Entities: 67,038
[Chart: histogram of the number of entities having x mentions; y-axis 0–60,000 entities, x-axis 1–64 mentions.]

Mention type      Entities*
Person            13,505
Organization       8,206
Location             245
City                 360
US State             127
Country              316
Nationality          195
Facility           1,052
Vehicles          13,977
CarPart           15,478
CarFeature         6,119
Camera             5,197
CameraFeature      9,575
CameraPart         3,240
CameraAccessory      352
Descriptor        13,087
…                 …
*Entity x DocId. “Audi” in doc5 and in doc7 are considered different entities.
Roadmap
• What is Coreference?
• Coreference Performance Measures
  – MUC6
  – B3
  – CEAF
• ICWSM JDPA Corpus
• Approaches
  – Heuristic
  – Machine Learning (Robust Risk Minimization)
Coreference Approaches
• Build mention-pair model
  – Select mention-pair features: lexical match, distance between mentions, syntactic features
  – Heuristics
  – Machine learning: classifiers such as MaxEnt, RRM, SVM
• Cluster compatible entities and mentions
  – Greedy clustering: forward or reverse direction, sentence to document
  – Bell tree
Mention Pair Features Considered
• Lexical
  – String match: exact, left or right substring
  – Acronym (GM & General Motors)
  – Edit distance (Toyota & Toyoda)
  – Lemma: words of the entire mention, the head noun, and the determiner (if present)
  – Capitalization: whole word or only first letter
  – Number: whether the NP is a number or starts with a number
• Distance-based
  – Word distance: number of in-between words
  – Sentence distance: number of in-between sentences
  – Mention distance: number of in-between mentions
• Syntax-based
  – Part-of-speech tag of the mention head
• Pronoun
  – Gender: masculine, feminine, neuter
  – Number: singular, plural
  – Person: first, second, third
  – Possessive
  – Reflexive
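A few of the features above can be sketched as binary indicators (a minimal illustration; the feature names and the distance bucketing are our choices):

```python
def mention_pair_features(m1, m2, words_between, sentences_between):
    """A handful of the lexical and distance features, as binary indicators
    (the slides note that scalar values are quantized to binary)."""
    a, b = m1.lower(), m2.lower()
    return {
        "exact_match": a == b,
        "left_substring": a.startswith(b) or b.startswith(a),
        "right_substring": a.endswith(b) or b.endswith(a),
        "both_capitalized": m1[:1].isupper() and m2[:1].isupper(),
        # distances quantized into coarse binary buckets
        "word_dist_le_5": words_between <= 5,
        "same_sentence": sentences_between == 0,
    }

feats = mention_pair_features("Audi", "audi", 3, 0)
# exact match holds after lowercasing; the capitalization feature does not
```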
Roadmap
• What is Coreference?
• Coreference Performance Measures
  – MUC6
  – B3
  – CEAF
• ICWSM JDPA Corpus
• Approaches
  – Heuristic
  – Machine Learning (Robust Risk Minimization)
Coreference Heuristics
• Compatible mentions:
  – Exact string match of capitalized mentions: "Audi" & "Audi"
  – Exact string match of mentions within a sentence: "car" & "car"
  – Acronyms: "GM" & "General Motors"
  – First person pronouns: "I" & "me"
  – Second person pronouns: "you" & "yours"
  – Third person pronouns with gender and number agreement in the same sentence: "he" & "him", "she" & "her"
  – Non-numeral mentions with edit distance < 15% of the length of the mention: "engine" & "eengine", "Toyota" & "Toyoda"
• Incompatible mentions:
  – Different acronyms: "GM" & "BMW"
  – Person, gender, or number disagreement: "I" & "you", "he" & "she", "car" & "cars"
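The edit-distance heuristic can be sketched as follows. The slide does not spell out how the 15% threshold is normalized; applying it to the combined length of the two mentions (our assumption) reproduces the examples given:

```python
def edit_distance(a, b):
    """Levenshtein distance via the classic dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def compatible_by_edit_distance(m1, m2, ratio=0.15):
    """Treat non-numeral mentions as coreferent when their edit distance is
    small relative to mention length. The 15% is applied to the combined
    length of both mentions (an assumption; see lead-in)."""
    if m1[:1].isdigit() or m2[:1].isdigit():
        return False
    return edit_distance(m1.lower(), m2.lower()) <= ratio * (len(m1) + len(m2))

print(compatible_by_edit_distance("Toyota", "Toyoda"))   # True  (distance 1)
print(compatible_by_edit_distance("engine", "eengine"))  # True  (distance 1)
print(compatible_by_edit_distance("GM", "BMW"))          # False (distance 2)
```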
Configuration                        MUC-F  MUC-P  MUC-R  B3-F  B3-P  B3-R
Unlinked entities                     --     --      0    71.8  100   56.0
Single entity (w/ all mentions)       61.4   44.3  100     8.6    4.5 100
w/ edit distance                      66.2   64.0   68.5  78.6   76.8  80.4
edit distance at sentence level       64.8   63.7   66.0  78.7   77.8  79.4
w/o edit distance                     70.0   75.8   64.9  83.0   86.8  79.5
edit distance + cardinality match     70.4   78.7   63.7  83.7   89.2  78.8
Order of clustering (local (mentions within sentence) to global, forward, and reverse direction) did not alter our results.
Heuristic System: Results
Roadmap
• What is Coreference?
• Coreference Performance Measures
  – MUC6
  – B3
  – CEAF
• ICWSM JDPA Corpus
• Approaches
  – Heuristic
  – Machine Learning (Robust Risk Minimization)
Machine Learning Approach
• Feature values
  – Converted to binary (quantize scalar values)
• Training sample selection
  – Positive samples formed with pairs of consecutive coreferent mentions; negative samples formed using any mentions between consecutive coreferent mentions
  – All mention pairs within a small window (a few sentences)
• Robust Risk Minimization (Generalized Winnow)
  – Linear classifier
  – Multiplicative weight update (quickly discounts irrelevant features after a few iterations)
  – Class margin (m) can be converted to a probability:
    p = 0            if m ≤ −1
    p = (1 + m) / 2  if −1 < m < 1
    p = 1            if m ≥ 1
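The piecewise mapping is straightforward to implement:

```python
def margin_to_probability(m):
    """Map an RRM class margin to a probability: 0 below -1, 1 above +1,
    and linear, p = (1 + m) / 2, in between."""
    if m <= -1:
        return 0.0
    if m >= 1:
        return 1.0
    return (1.0 + m) / 2.0

print(margin_to_probability(-2), margin_to_probability(0), margin_to_probability(0.5))
# 0.0 0.5 0.75
```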
[Slide shows the generalized Winnow training procedure: input is training data (x_1, y_1), …, (x_n, y_n); output is a weight vector w and threshold θ. The weights are kept as separate positive and negative parts (w = w⁺ − w⁻, θ = θ⁺ − θ⁻), initialized to μ, and over K passes through the n examples both parts are updated multiplicatively, w_j ← w_j · exp(η Δ).]

Separate positive and negative weights.
Multiplicative weight update.
Robust Risk Minimization (RRM)
RRM was proposed by Tong Zhang; best CoNLL’03 chunker.
Input sequence: a1, b1, b2, a2, …

[a1]
[a1,b1]   [a1][b1]
[a1,b1,b2]   [a1,b1][b2]   [a1,b2][b1]   [a1][b1,b2]   [a1][b1][b2]
[a1,b1,b2,a2]   [a1,b1,b2][a2]   [a1,a2][b1,b2]   [a1][b1,b2,a2]   [a1][b1,b2][a2]   [a1,b1][b2][a2]   [a1,b1,a2][b2]   [a1,b1][b2,a2]   [a1,b2,a2][b1]   [a1,b2][b1,a2]   [a1,b2][b1][a2]   [a1,a2][b1][b2]   [a1][b1,a2][b2]   [a1][b1][b2,a2]   [a1][b1][b2][a2]
Bell Tree
Lots of states in the search space to explore!
Bell Tree Coreference Model
Given an input sequence of mentions m1, m2..., mk, ...mn, and:
et: entity
Ek: set of partial entities containing mentions m1…mk
Ak: index of entity which the next mention should merge with
L: binary (1=link to existing entity, 0=create entity)
Define
Link model (link mention to an existing entity):

  P(L=1 | E_k, m_k, A_k = t) ≈ P(L=1 | e_t, m_k) ≈ max over m ∈ e_t of P(L=1 | m, m_k)

Creation model (start a new entity):

  P(L=0 | E_k, m_k) ≈ α · (1 − max over t of P(L=1 | E_k, m_k, A_k = t))
Entities assumed to be independent
Link probability derived from the most probable mention pair
Tunable to encourage or penalize entity creation
Input sequence: a1, b1, b2, a2, c1, …

Coreference (mention-pair) probabilities:

     a1    b1    b2    a2    c1
a1   1
b1   0.4   1
b2   0.2   0.9   1
a2   0.8   0.1   0.3   1
c1   0.4   0.3   0.4   0.2   1

[a1] p = 1
[a1,b1] p = 0.4   [a1][b1] p = 0.6
[a1,b1,b2] p = 0.36   [a1,b1][b2] p = 0.04   [a1,b2][b1] p = 0.12   [a1][b1,b2] p = 0.54   [a1][b1][b2] p = 0.06
[a1,b1,b2,a2] p = 0.288   [a1,b1,b2][a2] p = 0.072   [a1,a2][b1,b2] p = 0.432   [a1][b1,b2,a2] p = 0.162   [a1][b1,b2][a2] p = 0.108
[a1,b1,b2,a2,c1] p = 0.1152   [a1,b1,b2,a2][c1] p = 0.1728   [a1,a2,c1][b1,b2] p = 0.1728   [a1,a2][b1,b2,c1] p = 0.1728   [a1,a2][b1,b2][c1] p = 0.2592
Bell Tree in Action
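The worked example can be reproduced by expanding the Bell tree with the link and creation models (an unpruned sketch with α = 1; real systems prune the search space, and the function names are ours):

```python
# pairwise coreference probabilities from the slide's matrix
P = {("a1", "b1"): 0.4, ("a1", "b2"): 0.2, ("b1", "b2"): 0.9,
     ("a1", "a2"): 0.8, ("b1", "a2"): 0.1, ("b2", "a2"): 0.3,
     ("a1", "c1"): 0.4, ("b1", "c1"): 0.3, ("b2", "c1"): 0.4, ("a2", "c1"): 0.2}

def pair_prob(m1, m2):
    return P.get((m1, m2), P.get((m2, m1), 0.0))

def expand(states, mention, alpha=1.0):
    """Grow every partial partition by one mention: link it to each existing
    entity (link model, scored by the most probable mention pair) or start a
    new entity (creation model)."""
    new_states = []
    for entities, prob in states:
        link_scores = [max(pair_prob(m, mention) for m in e) for e in entities]
        for i, lp in enumerate(link_scores):
            merged = [e | {mention} if j == i else e for j, e in enumerate(entities)]
            new_states.append((merged, prob * lp))
        # creation model: alpha * (1 - best link probability)
        new_states.append((entities + [{mention}],
                           prob * alpha * (1 - max(link_scores))))
    return new_states

states = [([{"a1"}], 1.0)]
for m in ["b1", "b2", "a2", "c1"]:
    states = expand(states, m)

entities, prob = max(states, key=lambda s: s[1])
print([sorted(e) for e in entities], round(prob, 4))
# [['a1', 'a2'], ['b1', 'b2'], ['c1']] 0.2592
```

This recovers the slide's winning partition [a1,a2][b1,b2][c1] with probability 0.2592.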
Discussion
• How can coreference scoring measures be evaluated?
  – Consistency: Does a better score correspond to better human judgment of the output? Do all measures score one set of output higher than another?
  – Application specific: Does a better score translate to better application performance (sentiment analysis, machine translation)?
• Techniques for picking mention-pair training samples
  – Cluster mention pairs and pick minority class samples within each cluster.
Future Work
• Coreference model
  – Features:
    Entity class type
    Dependency and/or semantic role features for sentence-level mentions: parse tree path, predicate, arguments
  – Classification:
    Training sample selection: select mention pairs with discriminatory features
    Multi-class: classify between mentions with strong compatibility indicators and mentions with weak compatibility indicators
    Algorithms: SVM, Random Forest
• Clustering
  – Algorithms: SVMcluster, soft CSP
  – Different similarity metrics
Acknowledgements
• Dr. Xiaoqiang Luo, IBM T.J. Watson Research Center
• Prof. Martha Palmer, Univ. of Colorado
• Prof. James Martin, Univ. of Colorado
• Jason Kessler, J.D. Power and Associates
• Dr. Miriam Eckert, J.D. Power and Associates
References
Amit Bagga & Breck Baldwin. 1998. Algorithms for Scoring Coreference Chains. 1st International Conference on Language Resources and Evaluation Workshop on Linguistics Coreference, pp. 563–566.
Dan Cristea & Oana Postolache. 2005. How to Deal with Wicked Anaphora. Anaphora Processing: Linguistic, Cognitive and Computational Modelling, ed. by A. Branco, T. McEnery & R. Mitkov, pp. 17-46. John Benjamins: Amsterdam & Philadelphia.
Thomas Finley & Thorsten Joachims. 2005. Supervised Clustering with Support Vector Machines. 22nd International Conference on Machine Learning (ICML’05), pp. 217–224, New York, N.Y., U.S.A. ACM.
Xiaoqiang Luo, Abe Ittycheriah, Hongyan Jing, Nanda Kambhatla & Salim Roukos. 2004. A Mention-Synchronous Coreference Resolution Algorithm Based on the Bell Tree. 42nd Annual Meeting on Association for Computational Linguistics (ACL’04), page 135, Morristown, N.J., U.S.A. ACL.
Xiaoqiang Luo. 2005. On Coreference Resolution Performance Metrics. Human Language Technology and Empirical Methods in Natural Language Processing (HLT’05), pp. 25–32, Morristown, N.J., U.S.A. ACL.
Jason S. Kessler & Nicolas Nicolov. 2009. Targeting Sentiment Expressions through Supervised Ranking of Linguistic Configurations. 3rd International AAAI Conference on Weblogs and Social Media (ICWSM’09), San Jose, California, U.S.A.
Nicolas Nicolov. 2003. Book Review: "Anaphora Resolution" (R. Mitkov). IEEE Computational Intelligence Bulletin, Vol. 2, No. 1, pp. 31–32, June 2003.
Oana Postolache & Corina Forascu. 2004. A Coreference Model on Excerpts from a Novel. European Summer School in Logic Language and Information – ESSLLI'04, pp. 202-213. Nancy, France.
Marc Vilain, John Burger, John Aberdeen, Dennis Connolly & Lynette Hirschman. 1995. A Model-Theoretic Coreference Scoring Scheme. 6th conference on Message understanding (MUC6 ’95), pp. 45–52, Morristown, N.J., U.S.A. ACL.
Tong Zhang, Fred Damerau & David Johnson. 2002. Text Chunking Based on a Generalization of Winnow. Journal of Machine Learning Research, 2:615–637.
Thank you!