Is this NE tagger getting old?

Is this NE tagger getting old? Language Resources and Evaluation Conference, Marrakech, Morocco, May 28th - 30th 2008. Cristina Mota and Ralph Grishman, IST & L2F INESC-ID (Portugal) & NYU (USA) and New York University (USA). (Advisors: Ralph Grishman & Nuno Mamede.) This research was funded by Fundação para a Ciência e a Tecnologia (doctoral scholarship SFRH/BD/3237/2000).

Upload: cmota21

Post on 04-Jul-2015

DESCRIPTION

Presentation at LREC 2008 of Cristina Mota & Ralph Grishman (2008). "Is this NE tagger getting old?". In Proc. of the 6th International Conference on Language Resources and Evaluation (LREC 2008) (Marrakech, 28-30 May 2008).

TRANSCRIPT

Page 1: Is this NE tagger getting old?

Outline: Introduction, Corpus Analysis, NER Performance Analysis, Experiments, Final Remarks

Is this NE tagger getting old?

Language Resources and Evaluation Conference

Marrakech, Morocco - May 28th - 30th 2008

Cristina Mota and Ralph Grishman

IST & L2F INESC-ID (Portugal) & NYU (USA)

and

New York University (USA)

(Advisors: Ralph Grishman & Nuno Mamede)

This research was funded by Fundação para a Ciência e a Tecnologia (doctoral scholarship SFRH/BD/3237/2000)

Cristina Mota and Ralph Grishman Is this NE tagger getting old?

Page 2: Is this NE tagger getting old?

Outline

1 Introduction

2 Corpus Analysis

3 NER Performance Analysis

4 Experiments

5 Final Remarks

Page 3: Is this NE tagger getting old?

1 Introduction

Motivation

Approach

2 Corpus Analysis

3 NER Performance Analysis

4 Experiments

5 Final Remarks

Page 4: Is this NE tagger getting old?

What is NER?

Mary is studying in Rabat at Mohammed V University

→ NE Tagger →

[Mary]PER is studying in [Rabat]LOC at [Mohammed V University]ORG
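The tagger's input and output on this slide can be modelled as a token list plus labelled spans. The following is a minimal sketch, assuming hand-written spans in place of a real tagger; the function name `annotate` and the bracketed rendering are illustrative choices, not part of the talk.

```python
def annotate(tokens, spans):
    """Render entity spans inline as [text]LABEL.

    tokens: list of word strings.
    spans:  list of (start, end, label), end exclusive, non-overlapping.
    """
    starts = {start: (end, label) for start, end, label in spans}
    out, i = [], 0
    while i < len(tokens):
        if i in starts:
            end, label = starts[i]
            out.append("[" + " ".join(tokens[i:end]) + "]" + label)
            i = end
        else:
            out.append(tokens[i])
            i += 1
    return " ".join(out)


sentence = "Mary is studying in Rabat at Mohammed V University".split()
# Hand-written spans standing in for the output of an NE tagger.
spans = [(0, 1, "PER"), (4, 5, "LOC"), (6, 9, "ORG")]
tagged = annotate(sentence, spans)
print(tagged)
```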

Page 5: Is this NE tagger getting old?

The Problem

[Figure: name occurrences per 100K words by time frame (semester, 91a to 98a) for the names UE, CEE, União Europeia and Comunidade Europeia]

Do texts vary over time in a way that affects NE recognition?

Should NE taggers also be designed to be time-aware?

Page 6: Is this NE tagger getting old?

Approach

Corpus Analysis

Measure corpus similarity based on

Words

Compute name list overlaps

By type

By token

NER Performance Analysis

Assess performance by training and testing with different configurations (train, test)

Increase time gap between training and test data

Page 7: Is this NE tagger getting old?

1 Introduction

2 Corpus Analysis

Corpus Similarity Algorithm (Kilgarriff, 2001)

Name List Overlaps

3 NER Performance Analysis

4 Experiments

5 Final Remarks

Page 8: Is this NE tagger getting old?

Corpus Similarity Algorithm (Kilgarriff, 2001)

Similarity(A,B):

Split corpus A and B into k slices each

Repeat m times:

Randomly allocate k/2 slices to Ai and k/2 to Bi

Construct word frequency lists for Ai and Bi

Compute CBDF between Ai and Bi for the n most frequent words of the joint corpus (Ai+Bi) [CBDF = χ² by degrees of freedom]

Output mean and standard deviation of CBDF of all experiments

Repeat using corpus A only: Similarity(A,A) → Homogeneity(A)

Repeat using corpus B only: Similarity(B,B) → Homogeneity(B)
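The steps above can be sketched in Python. This is an illustrative reading of the procedure, not the authors' code: it assumes each slice is a list of word tokens, that expected frequencies are proportional to the sizes of the two half-corpora, and that the degrees of freedom equal the number of compared words minus one. The names `cbdf` and `similarity` are hypothetical.

```python
import random
from collections import Counter
from statistics import mean, stdev


def cbdf(freq_a, freq_b, n):
    """Chi-square over the n most frequent joint words, divided by (words - 1)."""
    joint = freq_a + freq_b
    size_a, size_b = sum(freq_a.values()), sum(freq_b.values())
    total = size_a + size_b
    top = [word for word, _ in joint.most_common(n)]
    chi2 = 0.0
    for word in top:
        for observed, size in ((freq_a[word], size_a), (freq_b[word], size_b)):
            expected = joint[word] * size / total
            chi2 += (observed - expected) ** 2 / expected
    return chi2 / max(len(top) - 1, 1)


def similarity(slices_a, slices_b, m=10, n=2000, seed=0):
    """Mean and standard deviation of CBDF over m random half-corpus draws.

    Passing the same list object twice gives Homogeneity(A): the slices are
    shuffled and split into two disjoint halves.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(m):
        if slices_a is slices_b:
            shuffled = rng.sample(slices_a, len(slices_a))
            half = len(shuffled) // 2
            part_a, part_b = shuffled[:half], shuffled[half:2 * half]
        else:
            part_a = rng.sample(slices_a, len(slices_a) // 2)
            part_b = rng.sample(slices_b, len(slices_b) // 2)
        freq_a = Counter(w for s in part_a for w in s)
        freq_b = Counter(w for s in part_b for w in s)
        scores.append(cbdf(freq_a, freq_b, n))
    return mean(scores), stdev(scores)


# Toy corpora (made up for illustration): four identical slices each.
slices_a = [["a", "b", "a", "c"]] * 4
slices_b = [["a", "d", "d", "c"]] * 4
hom_a = similarity(slices_a, slices_a, m=5)
sim_ab = similarity(slices_a, slices_b, m=5)
print(hom_a, sim_ab)
```

On the toy data the homogeneity of A is 0 because both halves have identical frequency lists, while the A-versus-B score is larger, which matches the reading that lower values mean higher similarity.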

Page 9: Is this NE tagger getting old?

Corpus Similarity Algorithm (Kilgarriff, 2001)

Corpus A: DAA′1, DAA′2, ..., DAA′n → DAA′ → Homogeneity(A)

1/2 Corpus A + 1/2 Corpus B: DAB′1, DAB′2, ..., DAB′n → DAB → Similarity(A, B)

Corpus B: DBB′1, DBB′2, ..., DBB′n → DBB′ → Homogeneity(B)

Lower values of D ⇒ higher homogeneity/similarity

Page 10: Is this NE tagger getting old?

Name List Overlaps

$$\text{type overlap} = \frac{|T_A \cap T_B|}{|T_A| + |T_B| - |T_A \cap T_B|} \qquad (1)$$

$$\text{token overlap} = \frac{\sum_{i=1}^{N} \min(f_A(i), f_B(i))}{\sum_{i=1}^{N} \max(f_A(i), f_B(i))} \qquad (2)$$

$T_A$ = list of different names (name types) of text A

$f_A(i)$ = frequency of name i in text A

Page 11: Is this NE tagger getting old?

Name List Overlaps

A name list: Mary (3), Rabat (5), Mohammed V University (4)

B name list: John (1), Rabat (2), Mohammed V University (6)

Type Overlap

$$\frac{|\{\text{Rabat}, \text{Mohammed V University}\}|}{|\{\text{Mary}, \text{Rabat}, \text{Mohammed V University}, \text{John}\}|} = \frac{2}{4}$$

Token Overlap

$$\frac{\min(3, 0) + \min(5, 2) + \min(4, 6) + \min(0, 1)}{\max(3, 0) + \max(5, 2) + \max(4, 6) + \max(0, 1)} = \frac{6}{15}$$
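The two overlap measures can be checked against this worked example with a few lines of Python. This is a sketch only: the function names `type_overlap` and `token_overlap` are illustrative, and name lists are assumed to be stored as `Counter` objects mapping each name to its frequency.

```python
from collections import Counter
from fractions import Fraction


def type_overlap(freq_a, freq_b):
    """Shared name types divided by all name types in either list."""
    types_a, types_b = set(freq_a), set(freq_b)
    return Fraction(len(types_a & types_b), len(types_a | types_b))


def token_overlap(freq_a, freq_b):
    """Sum of per-name minimum counts divided by sum of per-name maximum counts."""
    names = set(freq_a) | set(freq_b)
    shared = sum(min(freq_a[name], freq_b[name]) for name in names)
    total = sum(max(freq_a[name], freq_b[name]) for name in names)
    return Fraction(shared, total)


a = Counter({"Mary": 3, "Rabat": 5, "Mohammed V University": 4})
b = Counter({"John": 1, "Rabat": 2, "Mohammed V University": 6})
print(type_overlap(a, b))   # 1/2, i.e. 2/4
print(token_overlap(a, b))  # 2/5, i.e. 6/15
```

`Fraction` reduces the results, so 2/4 prints as 1/2 and 6/15 as 2/5. Both functions raise `ZeroDivisionError` if both name lists are empty.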

Page 12: Is this NE tagger getting old?

1 Introduction

2 Corpus Analysis

3 NER Performance Analysis

NE Tagger Description (Collins & Singer, 1999)

4 Experiments

5 Final Remarks

Page 13: Is this NE tagger getting old?

NE Tagger Description (Collins & Singer, 1999)

[Flow diagram, top to bottom:]

Raw TEXT

POS Tagging + Parsing

Shallow Parsed TEXT

NE Identification

TEXT with unclassified NE

List of Examples (NE, context)

NE Classification

Name seeds

List of Labeled Examples (NE, context, label)

Text Update + NE Propagation

TEXT with classified NE

Classification in detail:

Name Rules :- Name seeds

Label with Name Rules

Infer Contextual Rules

Label with Contextual Rules

Infer Name Rules

Label with Name + Contextual Rules

List of Labeled Examples (NE, Context, Label)
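The alternation between name rules and contextual rules can be illustrated with a heavily simplified sketch. It is not the Collins & Singer (1999) algorithm as published: rules here are plain majority votes without confidence thresholds, each example has a single name feature and a single context feature, and the names `infer_rules`, `apply_rules` and `cotrain` are hypothetical.

```python
from collections import Counter, defaultdict


def infer_rules(examples, labels, view):
    """Map each feature of one view (0 = name, 1 = context) to its majority label."""
    votes = defaultdict(Counter)
    for example, label in zip(examples, labels):
        if label is not None:
            votes[example[view]][label] += 1
    return {feature: counts.most_common(1)[0][0] for feature, counts in votes.items()}


def apply_rules(examples, rules, view):
    """Label every example whose feature in the given view matches a rule."""
    return [rules.get(example[view]) for example in examples]


def cotrain(examples, seeds, rounds=3):
    name_rules = dict(seeds)  # Name Rules :- Name seeds
    context_rules = {}
    for _ in range(rounds):
        labels = apply_rules(examples, name_rules, view=0)     # label with name rules
        context_rules = infer_rules(examples, labels, view=1)  # infer contextual rules
        labels = apply_rules(examples, context_rules, view=1)  # label with contextual rules
        # Infer name rules; the seeds always take precedence.
        name_rules = {**infer_rules(examples, labels, view=0), **seeds}
    # Final pass: name rule first, contextual rule as fallback.
    return [
        (name, context, name_rules.get(name) or context_rules.get(context))
        for name, context in examples
    ]


# Made-up (name, context) pairs for illustration.
examples = [("Rabat", "in"), ("Lisbon", "in"), ("Mary", "said"), ("John", "said")]
seeds = {"Rabat": "LOC", "Mary": "PER"}
labeled = cotrain(examples, seeds)
print(labeled)
```

In the toy run, the seed "Rabat" teaches the contextual rule "in" → LOC, which in turn labels "Lisbon"; the same happens for "Mary", "said" and "John".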

Page 14: Is this NE tagger getting old?

1 Introduction

2 Corpus Analysis

3 NER Performance Analysis

4 Experiments

Experimental Setting

F-Measure over Time

Politics Dissimilarity over Time

Politics Name List Overlap over Time

F-Measure compared to Dissimilarity

5 Final Remarks

Page 15: Is this NE tagger getting old?

Experimental Setting

[Figure: number of words per time frame (semester, 91a to 98a) for the topics Culture, Sports, Economy, Politics and Society]

CETEMPúblico (Santos & Rocha, 2001) is a Portuguese public journalistic corpus

Size: 180 million words

Time span: 8 years

Organization: randomly shuffled extracts [1 extract ≅ 2 paragraphs]

Classification: 10 topics and 16 time frames (year + semester)

Mark up: paragraphs, sentences, enumeration lists and authors

Page 16: Is this NE tagger getting old?

Experimental Setting

Topic: politics

Time unit: year

Text unit: sentence

Size: 10 slices x 60000 words per time frame

N most frequent words: 2000 words

Names compared: 82400 per time frame

Seeds (S): different names in the first 2500 name instances [first 198 extracts per semester]

Test (T): next 208 extracts per semester grouped by year

Unlabeled examples (U): first 82456 names with context per year [following 7856 extracts]

Page 17: Is this NE tagger getting old?

NER Performance: F-Measure over Time

[Figure: F-measure plotted against time gap (year)]

When the texts are from the same year (time gap = 0), the F-measure ranges approximately from 82% to 85%

When the texts are 5 years apart, the F-measure ranges from about 79% to 82%

As the time gap between (Si, Ui) and Tj increases, the F-measure shows a tendency to decay

Training-test configuration: (Si ,Ui ,Tj ), i=91..98, j=91..98 [64 tests]
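The 64 configurations can be enumerated and grouped by time gap. The sketch below assumes the gap is the absolute difference between the training year i and the test year j, which the slides do not state explicitly; variable names are illustrative.

```python
from collections import defaultdict
from itertools import product

years = range(91, 99)  # 91..98

# Group every (training year, test year) pair by its time gap in years.
by_gap = defaultdict(list)
for i, j in product(years, years):
    by_gap[abs(i - j)].append((i, j))

counts = {gap: len(pairs) for gap, pairs in sorted(by_gap.items())}
print(sum(counts.values()))  # 64
print(counts)  # {0: 8, 1: 14, 2: 12, 3: 10, 4: 8, 5: 6, 6: 4, 7: 2}
```

Under this assumption, larger gaps are covered by fewer configurations (only two at a gap of 7 years), which is worth keeping in mind when reading the spread of values at each gap.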

Page 18: Is this NE tagger getting old?

Politics Corpus Dissimilarity over time

[Figure: dissimilarity (= mean CBDF) plotted against time gap (year)]

The homogeneity for all the texts is very close to 1

Increasing the time gap to one year, the dissimilarity ranges from 2.5 to 4.5

At a distance of five years, dissimilarity ranges from 4.7 to almost 6.5

The dissimilarity shows a tendency to increase as the time gap increases

Corpus comparisons: (Ui, Uj), i=91..98, j=91..98 [64 comparisons; higher values = lower similarity]

Page 19: Is this NE tagger getting old?

Politics Name List Overlap over Time

[Figure, first panel: name type overlap (%) plotted against time gap (year)]

[Figure, second panel: name token overlap (%) plotted against time gap (year)]

Within the same time frame, the name type overlap varies between 5% and 6%

At a distance of 5 years, it varies between 3.5% and 4.5%

Within the same year, the name token overlap varies between 4.2% and 4.4%

At a distance of 5 years, it varies between 3.2% and 3.7%

Overlap between name lists also decreases over time

Corpus comparisons: (Ui ,Tj ), i=91..98, j=91..98 [64 comparisons]

Page 20: Is this NE tagger getting old?

F-Measure compared to Dissimilarity

[Figure: F-measure plotted against dissimilarity (= mean CBDF)]

There is an inverse association between dissimilarity and F-measure: for higher levels of dissimilarity (i.e., higher distance values) we obtain lower performance values

Note: higher values = lower similarity

Page 21: Is this NE tagger getting old?

1 Introduction

2 Corpus Analysis

3 NER Performance Analysis

4 Experiments

5 Final Remarks

Main Results

Work in Progress

Page 22: Is this NE tagger getting old?

Main Results

Within a period of 8 years we observed that:

Corpus similarity and name overlaps tend to decrease as the two corpora become more temporally distant

The performance of a co-training based NE tagger trained and tested on those texts shows a decay as we increase the time gap between the training and the test data

There is an association between the results of the corpus analysis and the tagger performance

Page 23: Is this NE tagger getting old?

Work in Progress

Other related issues we are currently investigating, aiming at better named entity recognition:

Analyze the NE surrounding contexts to verify if they also tend to overlap less over time

Investigate how we can avoid the performance decay

Do we need more data?

Do we need more labeled data within the same time frame?

Do we need more unlabeled data within the same time frame?
