NERD: Evaluating Named Entity Recognition Tools in the Web of Data
Giuseppe Rizzo <[email protected]>, Raphaël Troncy <[email protected]>


DESCRIPTION

Talk "NERD: Evaluating Named Entity Recognition Tools in the Web of Data" event during WEKEX'11 workshop (ISWC'11), Bonn, Germany

TRANSCRIPT

Page 1

NERD: Evaluating Named Entity Recognition Tools in the Web of Data

Giuseppe Rizzo <[email protected]>, Raphaël Troncy <[email protected]>

Page 2

What is a Named Entity recognition task?

A task that aims to locate and classify, within a textual document, the names of people, organizations, locations, brands and products, as well as numeric expressions such as times, dates, monetary amounts and percentages
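To make the task concrete, here is a minimal illustration with an off-the-shelf library (spaCy), which is not one of the web extractors compared in this talk; the example sentence is for illustration only.

```python
# Minimal NER illustration with spaCy (not one of the evaluated extractors).
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Google's self-driving cars were tested on public roads in Nevada in October 2011.")

# Each detected entity has a surface form, a type and character offsets
for ent in doc.ents:
    print(ent.text, ent.label_, ent.start_char, ent.end_char)
# Typical output: Google ORG, Nevada GPE, October 2011 DATE (exact labels depend on the model)
```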

Page 3

Named Entity recognition tools

Page 4

Differences among these NER extractors:

- Granularity: extraction of NEs from single sentences vs. from the entire document
- Technologies used:
  - algorithms used to extract NEs
  - supported languages
  - taxonomy of NE types recognized
  - disambiguation (dataset used to provide links)
  - content request size
  - response format

Page 5

And ...

What about precision and recall? Which extractor best fits my needs?

Page 6

What is NERD?

NERD seeks to find the pros and cons of those extractors. It provides:
- a REST API [1]
- a web UI [2]
- an ontology [3]

[1] http://nerd.eurecom.fr/api/application.wadl
[2] http://nerd.eurecom.fr/
[3] http://nerd.eurecom.fr/ontology
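As a rough sketch of how such a REST API can be called from a script, the snippet below uses Python's requests library; the endpoint path and parameter names ("/api/annotate", "text", "extractor") are assumptions for illustration only, and the actual interface is described by the WADL linked above.

```python
# Hypothetical sketch of calling a NERD-style REST endpoint; the path and the
# parameters below are illustrative assumptions, see the WADL for the real API.
import requests

def annotate(text: str, extractor: str = "alchemyapi") -> list:
    response = requests.post(
        "http://nerd.eurecom.fr/api/annotate",        # assumed endpoint
        data={"text": text, "extractor": extractor},  # assumed parameters
        timeout=30,
    )
    response.raise_for_status()
    return response.json()  # expected to contain (NE, type, URI) annotations

if __name__ == "__main__":
    for annotation in annotate("Google Cars Drive Themselves"):
        print(annotation)
```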

Page 7

Showcase

http://nerd.eurecom.fr

Science: "Google Cars Drive Themselves", http://bit.ly/oTj8md (part of the original resource found at http://nyti.ms/9p19i8)

Page 8

Evaluation

Controlled experiment:
- 4 human raters
- 10 English news articles (5 from BBC and 5 from The New York Times)
- each rater evaluated each article for all the extractors
- 200 evaluations in total

Uncontrolled experiment:
- 17 human raters
- 53 English news articles (sources: CNN, BBC, The New York Times and Yahoo! News)
- free selection of articles

5 extractors used with their default configurations

Each human rater received training [1]

[1] http://nerd.eurecom.fr/help

Page 9

Evaluation output

Each assessment is a tuple t = (NE, type, URI, relevant)

The assessment consists of rating each of these criteria with a Boolean value; if the extractor provides no type or no disambiguation URI, that criterion is considered false by default
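As a sketch (not the authors' implementation) of how such assessments could be represented and aggregated into the precision figures reported later:

```python
from dataclasses import dataclass
from typing import Iterable

@dataclass
class Assessment:
    """One rater judgment t = (NE, type, URI, relevant), each rated as a Boolean.
    A missing type or disambiguation URI is recorded as False by default."""
    ne: bool        # is the detected surface form a correct named entity?
    type: bool      # is the assigned type correct?
    uri: bool       # does the URI correctly disambiguate the entity?
    relevant: bool  # is the entity relevant in the context of the article?

def precision(assessments: Iterable[Assessment], criterion: str = "ne") -> float:
    """Fraction of assessments rated True for the given criterion."""
    values = [getattr(a, criterion) for a in assessments]
    return sum(values) / len(values) if values else 0.0

# Two judgments for one extractor: precision on the "type" criterion is 0.5
ratings = [Assessment(True, True, False, True), Assessment(True, False, False, True)]
print(precision(ratings, "type"))
```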

Page 10

Controlled experiment – dataset [1]

Categories: World, Business, Sport, Science, Health

1 BBC article and 1 NYT article for each category

Average number of words per article: 981

The final number of unique entities detected is 4641, with an average of 23.2 named entities per article

Some of the extractors (e.g. DBpedia Spotlight and Extractiv) return duplicate NEs; we removed all duplicates so as not to bias the statistics (a deduplication sketch follows below)

[1] http://nerd.eurecom.fr/ui/evaluation/wekex2011-goldenset.tar.gz
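The slides do not say which key is used to detect duplicates; a minimal sketch, assuming duplicates are identified by their (surface form, type) pair, could be:

```python
def deduplicate(entities):
    """Keep the first occurrence of each (surface form, type) pair, dropping
    the duplicate NEs returned by some extractors."""
    seen, unique = set(), []
    for surface, ne_type in entities:
        key = (surface.lower(), ne_type)
        if key not in seen:
            seen.add(key)
            unique.append((surface, ne_type))
    return unique

print(deduplicate([("Google", "Company"), ("google", "Company"), ("Nevada", "Place")]))
# [('Google', 'Company'), ('Nevada', 'Place')]
```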

Page 11

Controlled experiment – agreement score

Fleiss' kappa scores [1], grouped by extractor, by source and by category

[Tables of agreement scores omitted]

[1] Joseph L. Fleiss. Measuring nominal scale agreement among many raters. Psychological Bulletin, 76(5):378–382, 1971

Page 12

Controlled experiment – statistic result

[Tables: overall statistics, grouped by extractor and grouped by category]

Different behavior is observed for different sources

Page 13

Uncontrolled experiment - dataset

17 raters were free to select English news articles from CNN, BBC, The New York Times and Yahoo! News

53 news articles selected

94 assessments in total, with an average of 5.2 assessments per rater

Each article was assessed with at least 2 different tools

The final number of unique entities detected is 1616, with an average of 34 named entities per article

Some of the extractors (e.g. DBpedia Spotlight and Extractiv) return duplicate NEs; in order not to bias the statistics, we removed all duplicates

Page 14

Uncontrolled experiment – statistic result (I)

[Tables: overall precision, grouped by extractor]

Page 15

Uncontrolled experiment – statistic result (II)

[Table: precision grouped by category]

Page 16

Conclusion

Q. Which are the best NER tools?
A. They are ...

AlchemyAPI obtained the best results in NE extraction and categorization

DBpedia Spotlight and Zemanta showed the ability to disambiguate NEs in the LOD cloud

Experiments across categories of articles did not show significant differences in the analysis.

Published the WEKEX'11 ground truth: http://nerd.eurecom.fr/ui/evaluation/wekex2011-goldenset.tar.gz

Page 17

Future Work (NERD Timeline)

From the beginning to today:
- core application
- controlled experiment
- uncontrolled experiment
- REST API, release of the WEKEX'11 ground truth

Coming next:
- release of the ISWC'11 ground truth
- NERD "smart" service: combining the best of all NER tools

Page 18

ISWC'11 golden-set

Do you believe it's easy to reach agreement among all raters?

We would like to invite you to create a new golden set during the ISWC'11 poster and demo session. We will kindly ask each rater to evaluate two short excerpts from two English news articles with all the extractors supported by NERD

Page 19

http://nerd.eurecom.fr

http://www.slideshare.net/giusepperizzo

Thanks for your time and your attention

@giusepperizzo @rtroncy #nerd

Page 20

Fleiss' Kappa

κ = (P̄ − P̄e) / (1 − P̄e), where P̄ is the mean observed agreement across subjects and P̄e is the chance agreement

κ = 1: full agreement among all raters
κ = 0 (or less): poor agreement
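For concreteness, here is a minimal sketch (not the authors' code) of the computation above, where ratings[i][j] counts the raters who assigned subject i to category j:

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for a matrix where ratings[i][j] is the number of raters
    who assigned subject i to category j (same number of raters per subject)."""
    N = len(ratings)     # number of subjects
    n = sum(ratings[0])  # raters per subject
    k = len(ratings[0])  # number of categories

    # Observed agreement per subject, then its mean P_bar
    P = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in ratings]
    P_bar = sum(P) / N

    # Chance agreement P_e from the marginal category proportions
    p = [sum(row[j] for row in ratings) / (N * n) for j in range(k)]
    P_e = sum(pj * pj for pj in p)

    return (P_bar - P_e) / (1 - P_e)

# Example: 4 raters judging 3 entities as correct (column 0) or not (column 1)
print(round(fleiss_kappa([[4, 0], [2, 2], [3, 1]]), 3))  # close to 0: poor agreement
```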

Page 21

Fleiss' kappa interpretation

Kappa        Interpretation
< 0          Poor agreement
0.01 – 0.20  Slight agreement
0.21 – 0.40  Fair agreement
0.41 – 0.60  Moderate agreement
0.61 – 0.80  Substantial agreement
0.81 – 1.00  Almost perfect agreement