NERD: Evaluating Named Entity Recognition Tools in the Web of Data
DESCRIPTION
Talk "NERD: Evaluating Named Entity Recognition Tools in the Web of Data", given during the WEKEX'11 workshop (ISWC'11), Bonn, Germany.
TRANSCRIPT
NERD: Evaluating Named Entity Recognition Tools in the Web of Data
Giuseppe Rizzo <[email protected]>
Raphaël Troncy <[email protected]>
24 October 2011 Workshop on Web Scale Knowledge Extraction (WEKEX'11) - 2/21
What is a Named Entity recognition task?
A task that aims to locate and classify, in a textual document, the names of persons, organizations, locations, brands and products, as well as numeric expressions including times, dates, money amounts and percentages
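As a purely illustrative sketch (not from the talk), a toy dictionary-based recognizer shows the locate-and-classify idea in miniature; the gazetteer entries and type labels here are made up for the example.

```python
import re

# Tiny illustrative gazetteer: surface form -> entity type (made up for this sketch)
GAZETTEER = {
    "Google": "Organization",
    "Bonn": "Location",
}

def toy_ner(text):
    """Locate and classify gazetteer entries plus simple date expressions."""
    entities = []
    for surface, etype in GAZETTEER.items():
        for match in re.finditer(re.escape(surface), text):
            entities.append((match.start(), surface, etype))
    # Numeric/time expressions, e.g. "24 October 2011"
    for match in re.finditer(r"\b\d{1,2} [A-Z][a-z]+ \d{4}\b", text):
        entities.append((match.start(), match.group(), "Date"))
    return sorted(entities)

print(toy_ner("Google presented in Bonn on 24 October 2011"))
```

Real extractors replace the gazetteer with statistical models and much richer type taxonomies, but the output shape (offset, surface form, type) is the same.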
Named Entity recognition tools
Differences among those NER extractors
- Granularity: extraction of NE from sentences vs. from the entire document
- Technologies: algorithms used to extract NE
- Supported languages
- Taxonomy of NE types recognized
- Disambiguation (dataset used to provide links)
- Content request size
- Response format
And ...
What about precision and recall? Which extractor best fits my needs?
What is NERD?
A framework that seeks to find the pros and cons of those extractors, providing:
- a REST API1
- a UI2
- an ontology3
1 http://nerd.eurecom.fr/api/application.wadl
2 http://nerd.eurecom.fr/
3 http://nerd.eurecom.fr/ontology
Showcase
http://nerd.eurecom.fr
Science: "Google Cars Drive Themselves", http://bit.ly/oTj8md (part of the original resource found at http://nyti.ms/9p19i8)
Evaluation
Controlled experiment:
- 4 human raters
- 10 English news articles (5 from BBC and 5 from The New York Times)
- each rater evaluated each article for all the extractors
- 200 evaluations in total

Uncontrolled experiment:
- 17 human raters
- 53 English news articles (sources: CNN, BBC, The New York Times and Yahoo! News)
- free selection of articles

5 extractors using default configurations. Each human rater received a training1.
1 http://nerd.eurecom.fr/help
Evaluation output
Each extraction is represented by the tuple t = (NE, type, URI, relevant)
The assessment consists of rating these criteria with Boolean values
If no type or no disambiguation URI is provided by the extractor, that criterion is considered false by default
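The 4-tuple above can be modeled directly in code; here is a minimal sketch in which the field names are mine, not from the paper, and the default-false rule for a missing type or URI is encoded in the constructor.

```python
from dataclasses import dataclass

@dataclass
class Assessment:
    """One rater judgment t = (NE, type, URI, relevant), each criterion a Boolean."""
    ne: bool        # was the named entity correctly located?
    type_ok: bool   # was the assigned type correct?
    uri_ok: bool    # was the disambiguation URI correct?
    relevant: bool  # is the entity relevant in context?

def make_assessment(ne, relevant, type_given=None, uri_given=None):
    # If the extractor provided no type or no URI (None), that criterion
    # is false by default; otherwise use the rater's Boolean judgment.
    return Assessment(ne=ne,
                      type_ok=bool(type_given),
                      uri_ok=bool(uri_given),
                      relevant=relevant)
```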
Controlled experiment - dataset1
Categories: World, Business, Sport, Science, Health
1 BBC article and 1 NYT article for each category
Average word number per article: 981
The final number of unique entities detected is 4641, with an average of 23.2 named entities per article
Some of the extractors (e.g. DBpedia Spotlight and Extractiv) produce NE duplicates; we removed all duplicates in order not to bias the statistics
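A minimal sketch of the duplicate-removal step; it assumes entities are keyed by (surface form, type), which is my assumption since the exact key used in the paper is not specified.

```python
def remove_duplicates(entities):
    """Drop repeated (surface form, type) pairs, keeping the first occurrence in order."""
    seen = set()
    unique = []
    for surface, etype in entities:
        key = (surface.lower(), etype)
        if key not in seen:
            seen.add(key)
            unique.append((surface, etype))
    return unique

# e.g. an extractor emitting "Google" twice for the same document
print(remove_duplicates([("Google", "Organization"),
                         ("Bonn", "Location"),
                         ("Google", "Organization")]))
```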
1 http://nerd.eurecom.fr/ui/evaluation/wekex2011-goldenset.tar.gz
Controlled experiment – agreement score
Fleiss' kappa score1, grouped by extractor, by source, and by category
1 Joseph L. Fleiss. Measuring nominal scale agreement among many raters. Psychological Bulletin, 76(5):378–382, 1971
Controlled experiment – statistical results
Overall statistics, grouped by extractor and grouped by category
Different behavior for different sources
Uncontrolled experiment - dataset
17 raters were free to select English news articles from CNN, BBC, The New York Times and Yahoo! News
53 news articles selected
Total number of assessments: 94, with an average of 5.2 assessments per rater
Each article was assessed with at least 2 different tools
The final number of unique entities detected is 1616, with an average of 34 named entities per article
Some of the extractors (e.g. DBpedia Spotlight and Extractiv) produce NE duplicates; in order not to bias the statistics, we removed all duplicates
Uncontrolled experiment – statistical results (I)
Overall precision
Grouped by extractors
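Precision per criterion can be read off the Boolean assessments as the fraction rated true; this is a sketch of that computation (the exact aggregation used in the paper is not detailed in the slides).

```python
def precision(assessments, criterion):
    """Fraction of assessed entities rated True on the given criterion."""
    values = [a[criterion] for a in assessments]
    return sum(values) / len(values) if values else 0.0

# Three hypothetical rater judgments for one extractor
assessments = [
    {"ne": True,  "type": True,  "uri": False, "relevant": True},
    {"ne": True,  "type": False, "uri": False, "relevant": True},
    {"ne": False, "type": False, "uri": False, "relevant": False},
]
print(precision(assessments, "ne"))   # 2 of 3 rated True
```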
Uncontrolled experiment – statistical results (II)
Grouped by category
Conclusion
Q. Which are the best NER tools?
A. They are...
- AlchemyAPI obtained the best results in NE extraction and categorization
- DBpedia Spotlight and Zemanta showed the ability to disambiguate NEs in the LOD cloud
- Experiments across categories of articles did not show significant differences in the analysis
Published the WEKEX'11 ground truth: http://nerd.eurecom.fr/ui/evaluation/wekex2011-goldenset.tar.gz
Future Work (NERD Timeline)
From the beginning until today: core application, REST API, controlled experiment, uncontrolled experiment, release of the WEKEX'11 ground truth
Next steps: release of the ISWC'11 ground truth; a NERD "smart" service combining the best of all NER tools
ISWC'11 golden-set
Do you believe it's easy to find an agreement among all raters?
We would like to invite you to create a new golden set during the ISWC 2011 poster and demo session. We will kindly ask each rater to evaluate two short excerpts from two English news articles with all the extractors supported by NERD
http://nerd.eurecom.fr
http://www.slideshare.net/giusepperizzo
Thanks for your time and your attention
@giusepperizzo @rtroncy #nerd
Fleiss' Kappa
κ = (P̄ − P̄e) / (1 − P̄e), where P̄e is the chance agreement
κ = 1: full agreement among all raters
κ = 0 (or less): poor agreement
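The score can be computed as follows; a sketch following Fleiss (1971), taking a subjects-by-categories count matrix where every row sums to the same number of raters n.

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for a subjects x categories count matrix.

    ratings[i][j] = number of raters who assigned subject i to category j;
    every row must sum to the same number of raters n.
    """
    N = len(ratings)            # number of subjects
    n = sum(ratings[0])         # raters per subject
    k = len(ratings[0])         # number of categories
    # Proportion of all assignments falling in each category
    p = [sum(row[j] for row in ratings) / (N * n) for j in range(k)]
    # Per-subject observed agreement
    P = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in ratings]
    P_bar = sum(P) / N                  # mean observed agreement
    P_e = sum(pj * pj for pj in p)      # chance agreement
    return (P_bar - P_e) / (1 - P_e)
```

With 4 raters all agreeing on every subject, the function returns 1; when raters split evenly, it drops to or below 0, matching the interpretation above.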
Fleiss' kappa interpretation

| Kappa | Interpretation |
|---|---|
| < 0 | Poor agreement |
| 0.01 – 0.20 | Slight agreement |
| 0.21 – 0.40 | Fair agreement |
| 0.41 – 0.60 | Moderate agreement |
| 0.61 – 0.80 | Substantial agreement |
| 0.81 – 1.00 | Almost perfect agreement |