named entity recognition - acl 2011 presentation
Post on 25-May-2015
The Web is not a PERSON, Berners-Lee is not an ORGANIZATION, and African-Americans are not LOCATIONS: An Analysis of the Performance of Named-Entity Recognition
Robert Krovetz (Lexicalresearch.com), Paul Deane, Nitin Madnani (ETS)
A Review by Richard Littauer (UdS)
The Background
Named-Entity Recognition (NER) is normally judged in the context of Information Extraction (IE)
Various competitions
Recently:
◦ non-English languages
◦ improving unsupervised learning methods
The Background
“There are no well-established standards for evaluation of NER.”
◦ Criteria for NER systems change from competition to competition
◦ Proprietary software
The Background
KDM wanted to identify MWEs…
… but false positives and tagging inconsistencies stopped this.
IE derives Recall and Precision from Information Retrieval
NER is just a small part of this, so is rarely evaluated independently
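Since IE inherits Recall and Precision from IR, NER scoring reduces to set overlap between gold and predicted entities. A minimal sketch of that reduction (my own illustration with invented spans, not the paper's evaluation code):

```python
# Sketch: Precision/Recall for NER, inherited from Information Retrieval.
# Entities are (start, end, label) tuples; only an exact match on both
# span and type counts as correct.

def precision_recall(gold, predicted):
    gold, predicted = set(gold), set(predicted)
    true_pos = len(gold & predicted)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    return precision, recall

gold = [(0, 2, "PERSON"), (5, 6, "LOCATION"), (9, 11, "ORGANIZATION")]
pred = [(0, 2, "PERSON"), (5, 6, "ORGANIZATION"), (9, 11, "ORGANIZATION")]

p, r = precision_recall(gold, pred)
print(p, r)  # 2 of 3 predictions match both span and type
```

Exact-match scoring is only one convention; partial-overlap scoring is another reason reported NER figures are hard to compare.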
The Background
So, they want to test NER systems, and provide a unit test based on the problems encountered
Evaluation
Compared three NER taggers:
Stanford:
◦ CRF, 100m training corpus
University of Illinois (LBJ):
◦ Regularized average perceptron, Reuters 1996 News Corpus
BBN IdentiFinder:
◦ HMMs, commercial
Evaluation
Agreement on Classification
Ambiguity in Discourse
◦ Stanford vs. LBJ on an internal ETS 425m corpus
◦ All three on the American National Corpus
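The agreement measurements could be computed along these lines, a sketch with made-up entity labels rather than the authors' actual pipeline: collect the label each tagger assigned to each entity string, then count matches over the shared entities.

```python
# Sketch: pairwise agreement between two taggers' type assignments.
# Each tagger is represented as {entity_string: label}; the entities
# and labels below are invented for illustration.
stanford = {"Berners-Lee": "PERSON", "The Web": "PERSON", "Amherst": "LOCATION"}
lbj = {"Berners-Lee": "ORGANIZATION", "The Web": "MISC", "Amherst": "LOCATION"}

shared = stanford.keys() & lbj.keys()
agree = sum(stanford[e] == lbj[e] for e in shared)
print(f"agreement: {agree}/{len(shared)}")
```

Restricting the comparison to entities both systems found sidesteps, but does not solve, the problem that the taggers also disagree on which strings are entities at all.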
Stanford vs. LBJ
NER is reported as 85-95% accurate.
Roughly the same number of entities for both: 1.95m for Stanford, 1.8m for LBJ (a 7.6% difference)
However, errors:
Stanford vs. LBJ
Agreement:
Stanford vs. LBJ
Ambiguity:
Stanford vs. LBJ vs. IdentiFinder
Agreement:
Stanford vs. LBJ vs. IdentiFinder
Differences:
◦ How entities are tokenized
◦ Number of entities recognized overall
Stanford vs. LBJ vs. IdentiFinder
Ambiguity:
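One of the differences listed above is tokenization, which can make identical entities look like disagreements. A hypothetical normalization step before comparing tagger outputs (illustration only, not a step the paper describes):

```python
# Sketch: normalize entity strings before cross-tagger comparison, so
# tokenization variants of the same entity ("Mass. Inst. of Tech." vs
# "Mass . Inst . of Tech .") are not counted as disagreements.

def normalize(entity: str) -> str:
    """Collapse internal whitespace and case-fold, so tokenization
    variants of the same entity compare equal."""
    return "".join(entity.split()).lower()

print(normalize("Mass . Inst . of Tech .") == normalize("Mass. Inst. of Tech."))
```

A normalization like this only removes one class of mismatch; differences in entity boundaries (which tokens belong to the entity) would still need separate handling.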
Unit Test
Created two documents that can be used as test texts
◦ Different cases for true positives of PERSON, LOCATION, ORGANIZATION
◦ Entirely upper-case non-NEs (e.g. AAARGH)
◦ Punctuated terms that are not NEs
◦ Terms with initials
◦ Acronyms (some expanded, some not)
◦ Last names in close proximity to first names
◦ Terms with prepositions (Mass. Inst. of Tech.)
◦ Terms containing both a location and an organization (Amherst College)
Provided freely online.
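The kinds of checks these documents enable can be sketched as a unit test. Here `tag` is a hypothetical tagger interface and `stub_tag` a hand-written stand-in, not any real system's API:

```python
# Sketch of unit-test checks over NER edge cases, in the spirit of the
# paper's downloadable test documents. `tag` is a placeholder interface
# returning {entity_string: label}; a real tagger would be wrapped to
# match it.

def run_unit_checks(tag):
    # Entirely upper-case non-words (e.g. AAARGH) must not be entities
    assert "AAARGH" not in tag("AAARGH, he shouted.")
    # True positives must receive the right type
    out = tag("Berners-Lee visited Amherst College.")
    assert out.get("Berners-Lee") == "PERSON"
    assert out.get("Amherst College") == "ORGANIZATION"

# A stub tagger with hand-written answers, just to exercise the checks:
def stub_tag(text):
    gold = {"Berners-Lee": "PERSON", "Amherst College": "ORGANIZATION"}
    return {e, t} if False else {e: t for e, t in gold.items() if e in text}

run_unit_checks(stub_tag)
print("all checks passed")
```

Running the same checks against each real tagger would turn the paper's test documents into an automated regression suite.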
One NE Tag per Discourse
It is unusual for multiple occurrences of a token in a document to be different entities
◦ True for homonyms
◦ An exception: location + sports team
One NE Tag per Discourse
Stanford and LBJ have features for non-local dependencies (NLD) to help with this.
KDM: two other uses for NLD:
◦ A source of error in evaluation
◦ A way to identify semantically related entities
These should be treated as exceptions
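The one-tag-per-discourse heuristic, with its exceptions, can be sketched as a document-level majority vote (my illustration, not the taggers' actual non-local-dependency features; the exception list is invented):

```python
from collections import Counter

# Sketch: enforce one NE tag per token per document by majority vote,
# skipping a whitelist of known exceptions such as names that are both
# a location and a sports team. Illustration only.

EXCEPTIONS = {"Green Bay"}  # hypothetical location + sports-team name

def one_tag_per_discourse(tagged_tokens):
    """tagged_tokens: list of (token, label). Returns a relabeled list
    where each non-exception token gets its majority label."""
    counts = {}
    for tok, label in tagged_tokens:
        counts.setdefault(tok, Counter())[label] += 1
    majority = {tok: c.most_common(1)[0][0] for tok, c in counts.items()}
    return [(tok, label if tok in EXCEPTIONS else majority[tok])
            for tok, label in tagged_tokens]

doc = [("Chomsky", "PERSON"), ("Chomsky", "ORGANIZATION"), ("Chomsky", "PERSON")]
print(one_tag_per_discourse(doc))  # all three occurrences collapse to PERSON
```

Used as a post-process, this would also surface the evaluation errors KDM mention: any token whose labels had to be collapsed is a point of tagger inconsistency.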
Discussion
There are guidelines for NER – but we need standards.
The community should focus on PERSON, ORGANIZATION, LOCATION, and MISC.
◦ Harder to deal with than Dates, Times.
◦ Disagreement between taggers.
◦ MISC is necessary.
◦ These have important value elsewhere.
Discussion
To improve intrinsic evaluation for NER:
1. Create test sets for diverse domains.
2. Use standardized sets for different phenomena.
3. Report accuracy for PERSON, ORGANIZATION, and LOCATION separately.
4. Establish uncertainty in the tagging system.
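Point 3, per-category reporting, might look like this sketch over invented data: score each label on its own rather than quoting one pooled figure.

```python
from collections import defaultdict

# Sketch: per-label precision instead of a single pooled accuracy.
# Rows are (entity, gold_label, predicted_label) triples; data invented.

def per_label_precision(rows):
    correct, predicted = defaultdict(int), defaultdict(int)
    for _, gold, pred in rows:
        predicted[pred] += 1
        if gold == pred:
            correct[pred] += 1
    return {lab: correct[lab] / predicted[lab] for lab in predicted}

rows = [
    ("Berners-Lee", "PERSON", "PERSON"),
    ("The Web", "MISC", "PERSON"),
    ("Amherst College", "ORGANIZATION", "ORGANIZATION"),
]
print(per_label_precision(rows))  # PERSON: 0.5, ORGANIZATION: 1.0
```

A per-label breakdown makes it visible when a high overall score is carried by easy categories while PERSON/ORGANIZATION/LOCATION lag behind.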
Conclusion
The reported ~90% accuracy is not real.
We need to use only entities that are agreed on by multiple taggers.
Even cases where the taggers disagree are informative (hint: future work).
Unit test downloadable.
Cheers/PERSON
Richard/ORGANISATION thanks the Mword Class/LOCATION for listening to his talk about Berners-Lee/MISC