  • NLP: An Information Extraction Perspective

    Ralph Grishman, September 2005

  • Information Extraction (for this talk): Information Extraction (IE) = identifying the instances of the important relations and events for a domain from unstructured text.

  • Extraction Example (topic: executive succession): "George Garrick, 40 years old, president of the London-based European Information Services Inc., was appointed chief executive officer of Nielsen Marketing Research, USA."

    Extracted records:

    Position    Company                                Location   Person           Status
    President   European Information Services, Inc.   London     George Garrick   Out
    CEO         Nielsen Marketing Research             USA        George Garrick   In
  • Why an IE Perspective? IE can use a wide range of technologies:
    some successes with simple methods (names, some relations)
    high-performance IE will need to draw on a wide range of NLP methods
    ultimately, everything needed for deep understanding
    Potential impact of high-performance IE; a central perspective of our NLP laboratory.

  • Progress and Frustration: Over the past decade, the introduction of machine learning methods has allowed a shift from hand-crafted rules to corpus-trained systems, shifting the burden to annotating lots of data for each new task. But it has not produced large gains in bottom-line performance: there is a glass ceiling on event extraction performance. Can the latest advances give us a push in performance and portability?

  • Pattern Matching: Roughly speaking, IE systems are pattern-matching systems:
    we write a pattern corresponding to a type of event we are looking for: x shot y
    we match it against the text: Booth shot Lincoln at Ford's Theatre
    and we fill a database entry: shooting event [assailant = Booth, target = Lincoln]
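    A toy sketch of this idea in code (a hypothetical token-level regular-expression pattern, purely for illustration; the following slides argue that real IE patterns must be stated over structure rather than tokens):

```python
import re

# Toy token-level pattern for the "x shot y" shooting event.
SHOOTING = re.compile(r"(?P<assailant>[A-Z]\w+) shot (?P<target>[A-Z]\w+)")

def extract_shootings(text):
    """Return one database-style record per pattern match."""
    return [
        {"event": "shooting",
         "assailant": m.group("assailant"),
         "target": m.group("target")}
        for m in SHOOTING.finditer(text)
    ]

print(extract_shootings("Booth shot Lincoln at Ford's Theatre"))
# [{'event': 'shooting', 'assailant': 'Booth', 'target': 'Lincoln'}]
```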

  • Three Degrees of IE-Building Tasks
    1. We know what linguistic patterns we are looking for.
    2. We know what relations we are looking for, but not the variety of ways in which they are expressed.
    3. We know the topic, but not the relations involved.
    (Moving down the list trades performance for portability; the boundaries between the degrees are fuzzy.)

  • Three Degrees of IE-Building Tasks
    1. We know what linguistic patterns we are looking for.
    2. We know what relations we are looking for, but not the variety of ways in which they are expressed.
    3. We know the topic, but not the relations involved.

  • Identifying linguistic expressions: To be at all useful, the patterns for IE must be stated structurally; patterns at the token level are not general enough. So our main obstacle (as for many NLP tasks) is accurate structural analysis:
    name recognition and classification
    syntactic structure
    co-reference structure
    If the analysis is wrong, the pattern won't match.

  • Decomposing Structural Analysis: Decomposing structural analysis into subtasks like named entities, syntactic structure, and coreference has clear benefits:
    problems can be addressed separately
    separate corpus-trained models can be built
    fairly good levels of performance (near 90%) can be achieved on each separately (well, maybe not for coreference)
    But it also has problems ...

  • Sequential IE Framework: errors are compounded from stage to stage.
    [Pipeline diagram: Raw Doc → Name/Nominal Mention Tagger → Reference Resolver → Relation Tagger → Analyzed Doc, with precision falling from 100% to roughly 90%, 80%, and 70% across the stages]

  • A More Global View: The typical pipeline approach performs local optimization of each stage. We can take advantage of interactions between stages by taking a more global view of the best analysis; for example, prefer named-entity analyses which allow for more coreference or more semantic relations.

  • Names which can be coreferenced are much more likely to be correct (counting only names that are difficult for the name tagger: a small margin over the 2nd hypothesis, and not on the list of common names).

  • Names which can participate in semantic relations are much more likely to be correct

    [Chart: name accuracy (%) plotted against a threshold on the margin (the difference between the log probabilities of the first and second name hypotheses), shown separately for names participating in a semantic relation and names not participating; i.e., the probability of a name being correct given that its margin is below the threshold. At every threshold, names participating in relations are far more likely to be correct.]

  • Sources of interaction: Coreference and semantic relations impose type constraints (or preferences) on their arguments.

    A natural discourse is more likely to be cohesive, i.e., to have mentions (noun phrases) which are linked by coreference and semantic relations.

  • N-best: One way to capture such global information is to use an N-best pipeline and rerank after each stage, using the additional information provided by that stage (Ji and Grishman, ACL 2005).

    Reduced name tagging errors for Chinese by 20% (F measure: 87.5 --> 89.9)
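    A minimal sketch of the reranking idea (hypothetical feature counters and weights, not the actual Ji and Grishman model), assuming each name-tagging hypothesis carries its model log probability and later stages can report how many of its names corefer or fill relation slots:

```python
def rerank(hypotheses, coref_count, relation_count, w_coref=0.5, w_rel=0.5):
    """Pick the name hypothesis with the best combined score.

    hypotheses     : list of (logprob, names) pairs from the N-best name tagger
    coref_count    : function names -> how many of the names are coreferenced
    relation_count : function names -> how many of the names fill a relation slot
    The weights w_coref / w_rel would be tuned on held-out data.
    """
    def score(hyp):
        logprob, names = hyp
        return (logprob
                + w_coref * coref_count(names)
                + w_rel * relation_count(names))

    return max(hypotheses, key=score)
```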

  • Multiple Hypotheses + Re-Ranking: a re-ranking model combines information from the interactions between stages.
    [Pipeline diagram: Raw Doc → Name/Nominal Mention Tagger → Reference Resolver → Relation Tagger, carrying N-best hypotheses that are pruned after the name, coreference, and relation stages; maximum and final precision figures (100%, 99%, 98%, 97%, 85%) are annotated on the stages]

  • Computing Global Probabilities: Roth and Yih (CoNLL 2004) optimized a combined probability over two analysis stages:
    limited the interaction to name classification and semantic relation identification
    optimized the product of name and relation probabilities, subject to constraints on the types of name arguments
    used linear programming methods
    obtained a 1%+ improvement in name tagging, and 2-4% in relation tagging, over the conventional pipeline
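    A minimal sketch of the joint-inference idea (brute-force search over a toy candidate space rather than the linear-programming formulation Roth and Yih actually used; the probability tables and type constraints below are invented for illustration):

```python
from itertools import product

# Hypothetical probabilities from independently trained classifiers.
ent_probs = {"e1": {"PER": 0.6, "ORG": 0.4},
             "e2": {"ORG": 0.7, "LOC": 0.3}}
rel_probs = {"works_for": 0.55, "located_in": 0.45}

# Type constraints: (arg1 type, arg2 type) pairs each relation allows.
allowed = {"works_for": {("PER", "ORG")},
           "located_in": {("PER", "LOC"), ("ORG", "LOC")}}

def best_joint_assignment():
    """Maximize the product of entity and relation probabilities,
    subject to the relation-argument type constraints."""
    best, best_p = None, 0.0
    for t1, t2, r in product(ent_probs["e1"], ent_probs["e2"], rel_probs):
        if (t1, t2) not in allowed[r]:
            continue  # assignment violates a type constraint
        p = ent_probs["e1"][t1] * ent_probs["e2"][t2] * rel_probs[r]
        if p > best_p:
            best, best_p = (t1, t2, r), p
    return best, best_p

print(best_joint_assignment())  # (('PER', 'ORG', 'works_for'), ~0.231)
```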

  • Three Degrees of IE-Building Tasks
    1. We know what linguistic patterns we are looking for.
    2. We know what relations we are looking for, but not the variety of ways in which they are expressed.
    3. We know the topic, but not the relations involved.

  • Lots of Ways of Expressing an Event:
    Booth assassinated Lincoln
    Lincoln was assassinated by Booth
    The assassination of Lincoln by Booth
    Booth went through with the assassination of Lincoln
    Booth murdered Lincoln
    Booth fatally shot Lincoln

  • Syntactic Paraphrases: Some paraphrase relations involve the same words (or morphologically related words) and are broadly applicable:
    Booth assassinated Lincoln
    Lincoln was assassinated by Booth
    The assassination of Lincoln by Booth
    Booth went through with the assassination of Lincoln
    These are syntactic paraphrases.

  • Semantic Paraphrases: Other paraphrase relations involve different word choices:
    Booth assassinated Lincoln
    Booth murdered Lincoln
    Booth fatally shot Lincoln
    These are semantic paraphrases.

  • Attacking Syntactic Paraphrases: Syntactic paraphrases can be addressed through deeper syntactic representations which reduce paraphrases to a common relationship:
    chunks
    surface syntax
    deep structure (logical subject/object)
    predicate-argument structure (semantic roles)

  • Tree Banks: Syntactic analyzers have been effectively created through training from tree banks; good coverage is possible with a limited corpus.

  • Predicate Argument Banks: The next stage of syntactic analysis is being enabled through the creation of predicate-argument banks:
    PropBank (for verb arguments) (Kingsbury and Palmer [Univ. of Penn.])
    NomBank (for noun arguments)* (Meyers et al.)

    * first release next week

  • PA Banks, cont'd: Together these predicate-argument banks assign common argument labels to a wide range of constructs:

    The Bulgarians attacked the Turks
    The Bulgarians' attack on the Turks
    The Bulgarians launched an attack on the Turks
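    A small illustrative sketch (hand-written analyses, not the output of any particular semantic role labeler) showing how all three constructs receive the same argument labels, so one extraction pattern over (predicate, ARG0, ARG1) covers them all:

```python
# All three surface forms map to the same predicate-argument structure.
examples = {
    "The Bulgarians attacked the Turks":
        {"pred": "attack", "ARG0": "the Bulgarians", "ARG1": "the Turks"},
    "The Bulgarians' attack on the Turks":
        {"pred": "attack", "ARG0": "the Bulgarians", "ARG1": "the Turks"},
    "The Bulgarians launched an attack on the Turks":
        {"pred": "attack", "ARG0": "the Bulgarians", "ARG1": "the Turks"},
}

# A pattern stated at this level matches every variant.
assert len({tuple(sorted(v.items())) for v in examples.values()}) == 1
```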

  • Depth vs. Accuracy: Patterns based on deeper representations cover more examples, but deeper representations are generally less accurate. This leaves us with a dilemma: whether to use shallow (chunk) or deep (PA) patterns.

  • Resolving the Dilemma: The solution is to allow patterns at multiple levels:
    combine evidence from the different levels
    use machine learning methods to assign appropriate weights to each level

    In cases where deep analysis fails, the correct decision can often be made from the shallow analysis.

  • Integrating Multiple Levels: Zhao applied this approach to relation and event detection:
    a corpus-trained method
    a kernel measures the similarity of an example in the training corpus with a test input
    separate kernels at the word level, the chunk level, and the logical syntactic structure level
    a composite kernel combines information at the different levels

  • Kernel-based Integration

    [System diagram: preprocessing (POS tagger, name tagger, sentence parser, and other analyzers producing logical relations) feeds an SVM / KNN classifier, followed by post-processing to produce the results]

  • Benefits of Level Integration: Zhao demonstrated significant performance improvements for semantic relation detection by combining word, chunk, and logical syntactic relations, over the performance of the individual levels (Zhao and Grishman, ACL 2005).
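    A minimal sketch of the composite-kernel idea (illustrative similarity functions and weights, not the actual kernels of Zhao and Grishman), assuming each example carries a representation at every level:

```python
def word_kernel(x, y):
    """Bag-of-words overlap (stand-in for the word-level kernel)."""
    return len(set(x["words"]) & set(y["words"]))

def chunk_kernel(x, y):
    """Overlap of (chunk type, head word) pairs."""
    return len(set(x["chunks"]) & set(y["chunks"]))

def syntax_kernel(x, y):
    """Overlap of logical (predicate, role, argument) triples."""
    return len(set(x["deps"]) & set(y["deps"]))

def composite_kernel(x, y, weights=(0.2, 0.3, 0.5)):
    """Weighted combination of the level kernels.

    A positively weighted sum of kernels is itself a kernel, so the
    combination can be plugged directly into an SVM or KNN classifier;
    the weights would be tuned (or learned) on development data."""
    w_word, w_chunk, w_syn = weights
    return (w_word * word_kernel(x, y)
            + w_chunk * chunk_kernel(x, y)
            + w_syn * syntax_kernel(x, y))
```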

  • Attacking Semantic Paraphrase: Some semantic paraphrase can be addressed through manually prepared synonym sets, such as those available in WordNet. Stevenson and Greenwood [Sheffield] (ACL 2005) measured the degree to which IE patterns could be successfully generalized using WordNet:
    measured on the executive succession task
    started with a small seed set of patterns

  • Seed Pattern Set for Executive Succession

    v-appoint = { appoint, elect, promote, name }
    v-resign = { resign, depart, quit }

    Subject    Verb        Object
    company    v-appoint   person
    person     v-resign    -
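    A minimal sketch of WordNet-based expansion of the seed verb classes (using NLTK's WordNet interface; this is a simplification of Stevenson and Greenwood's actual generalization procedure):

```python
from nltk.corpus import wordnet as wn

def expand(verbs):
    """Add WordNet synonyms (verb lemmas) for each seed verb."""
    expanded = set(verbs)
    for v in verbs:
        for synset in wn.synsets(v, pos=wn.VERB):
            expanded.update(l.replace("_", " ") for l in synset.lemma_names())
    return expanded

v_appoint = {"appoint", "elect", "promote", "name"}
v_resign = {"resign", "depart", "quit"}

print(expand(v_appoint))  # adds e.g. "nominate", "constitute", ...
print(expand(v_resign))   # adds e.g. "step down", "leave office", ...
```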

  • Evaluating IE Patterns

    Text filtering metric: if we select documents / sentences containing a pattern, how many of the relevant documents / sentences do we get?
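    A small sketch of the document-level text-filtering metric (plain precision and recall over document sets; the function and variable names are my own):

```python
def text_filtering_scores(selected, relevant):
    """Precision and recall when every document matching a pattern is selected.

    selected : set of ids of documents containing at least one pattern
    relevant : set of ids of documents that are actually on-topic
    """
    hits = len(selected & relevant)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Toy example: 3 of the 4 selected documents are relevant,
# and 3 of the 6 relevant documents were found.
print(text_filtering_scores({1, 2, 3, 4}, {2, 3, 4, 5, 6, 7}))  # (0.75, 0.5)
```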

  • WordNet worked quite well for the executive succession task:

                          seed            expanded
                          P       R       P       R
    document filtering    100%    26%     68%     96%
    sentence filtering    81%     10%     47%     64%

  • Challenge of Semantic Paraphrase: But semantic paraphrase, by its nature, is more open-ended and more domain-specific than syntactic paraphrase, so it is hard to prepare any comprehensive resource by hand. Corpus-based discovery methods will be essential to improve our coverage.

  • Paraphrase discovery. Basic intuition:
    find pairs of passages which probably convey the same information
    align structures at points of known correspondence (e.g., names which appear in both passages):
        Fred xxxxx Harriet
        Fred yyyyy Harriet

    Similar to MT training from bitexts; the aligned expressions (xxxxx, yyyyy) are candidate paraphrases.

  • Evidence of paraphrase:
    From almost-parallel text: strong external evidence of paraphrase + a single aligned example
    From comparable text: weak external evidence of paraphrase + a few aligned examples
    From general text: using lots of aligned examples

  • Paraphrase from Translations (Barzilay and McKeown, ACL 01 [Columbia]):
    Take multiple translations of the same novel (high likelihood of passage paraphrase).
    Align sentences.
    Chunk and align sentence constituents.

    Found lots of lexical paraphrases (words & phrases) and a few larger (syntactic) paraphrases; data availability is limited.

  • Paraphrase from news sources (Shinyama, Sekine, et al., IWP 03):
    Take news stories from multiple sources from the same day.
    Use a word-based metric to identify stories about the same topic.
    Tag sentences for names; look for sentences in the two stories with several names in common (moderate likelihood of sentence paraphrase).
    Look for syntactic structures in these sentences which share names:
        sharing 2 names: paraphrase precision 62% (articles about murder, in Japanese)
        sharing one name, with at least four examples of a given paraphrase relation: precision 58% (2005 results, English, no topic constraint)

  • Relation paraphrase from multiple examples. Basic idea: if expression R appears with several pairs of names (a R b, c R d, e R f, ...) and expression S appears with several of the same pairs (a S b, e S f, ...), then there is a good chance that R and S are paraphrases.
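    A minimal sketch of this idea (simple shared-pair counting with an illustrative threshold; not Sekine's actual algorithm):

```python
from itertools import combinations

# expression -> set of (name1, name2) pairs it was observed with
observations = {
    "X will acquire Y":  {("CBS", "Westinghouse"), ("Eastern Group", "Hanson")},
    "X's purchase of Y": {("CBS", "Westinghouse")},
    "X agreed to buy Y": {("CBS", "Westinghouse"), ("Eastern Group", "Hanson")},
}

def paraphrase_candidates(obs, min_shared=2):
    """Link two expressions when they share at least min_shared name pairs."""
    links = []
    for (r, r_pairs), (s, s_pairs) in combinations(obs.items(), 2):
        if len(r_pairs & s_pairs) >= min_shared:
            links.append((r, s))
    return links

print(paraphrase_candidates(observations))
# [('X will acquire Y', 'X agreed to buy Y')]
```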

  • Relation paraphrase -- example:
    Eastern Group's agreement to buy Hanson
    Eastern Group to acquire Hanson

    CBS will acquire Westinghouse
    CBS's purchase of Westinghouse
    CBS agreed to buy Westinghouse

    (example based on Sekine 2005)

  • Relation paraphrase -- example:
    Eastern Group's agreement to buy Hanson
    Eastern Group to acquire Hanson

    CBS will acquire Westinghouse
    CBS's purchase of Westinghouse
    CBS agreed to buy Westinghouse

    select the main linking predicate

  • Relation paraphrase -- example:
    Eastern Group's agreement to buy Hanson
    Eastern Group to acquire Hanson

    CBS will acquire Westinghouse
    CBS's purchase of Westinghouse
    CBS agreed to buy Westinghouse

    2 shared pairs → paraphrase link (buy ↔ acquire)

  • Relation paraphrase, cont'd:
    Brin (1998); Agichtein and Gravano (2000): acquired individual relations (authorship, location)
    Lin and Pantel (2001): patterns for use in QA
    Sekine (IWP 2005): acquires all relations between two types of names; paraphrase precision 86% for person-company pairs, 73% for company-company pairs

  • Three Degrees of IE-Building Tasks
    1. We know what linguistic patterns we are looking for.
    2. We know what relations we are looking for, but not the variety of ways in which they are expressed.
    3. We know the topic, but not the relations involved.

  • Topic
    → set of documents on the topic
    → set of patterns characterizing the topic

  • Riloff Metric:
    Divide the corpus into relevant (on-topic) and irrelevant (off-topic) documents.
    Classify (some) words into major semantic categories (people, organizations, ...).
    Identify predication structures in each document (such as verb-object pairs).
    Count the frequency of each structure in relevant (R) and irrelevant (I) documents.
    Score structures by (R/I) log R.
    Select the top-ranked patterns.

  • Bootstrapping. Goal: find examples / patterns relevant to a given topic without any corpus tagging (Yangarber 00). Method:
    identify a few seed patterns for the topic
    retrieve documents containing the patterns
    find additional structures with a high Riloff metric
    add them to the seed set and repeat
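    A minimal sketch of the bootstrapping loop (illustrative data structures; the scoring follows the (R/I) log R formula from the Riloff-metric slide, with +1 smoothing in the denominator, and is not the exact Yangarber implementation):

```python
import math
from collections import Counter

def bootstrap(docs, seed_patterns, iterations=80, per_round=1):
    """docs: list of documents, each represented as a set of predication structures."""
    patterns = set(seed_patterns)
    for _ in range(iterations):
        # 1. A document containing any accepted pattern counts as relevant.
        relevant = [d for d in docs if d & patterns]
        irrelevant = [d for d in docs if not d & patterns]

        # 2. Count each structure in relevant (R) and irrelevant (I) documents.
        r_counts = Counter(s for d in relevant for s in d)
        i_counts = Counter(s for d in irrelevant for s in d)

        # 3. Riloff-style score (R/I) * log R, smoothed so I = 0 is allowed.
        def score(s):
            r, i = r_counts[s], i_counts[s]
            return (r / (i + 1)) * math.log(r) if r > 1 else 0.0

        # 4. Add the best new structure(s) to the pattern set and repeat.
        candidates = sorted((s for s in r_counts if s not in patterns),
                            key=score, reverse=True)
        if not candidates:
            break
        patterns.update(candidates[:per_round])
    return patterns
```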

  • #1: pick seed pattern. Seed: < person retires >

  • #2: retrieve relevant documents. Seed: < person retires >
    Relevant documents: "Fred retired. ... Harry was named president."  "Maki retired. ... Yuki was named president."
    Other documents: those not containing the seed pattern

  • #3: pick new pattern. Seed: < person retires >

    < person was named president > appears in several relevant documents (top-ranked by the Riloff metric):
    "Fred retired. ... Harry was named president."  "Maki retired. ... Yuki was named president."

  • #4: add new pattern to pattern set. Pattern set: < person retires >, < person was named president >

  • Applied to Executive Succession task

    v-appoint = { appoint, elect, promote, name }
    v-resign = { resign, depart, quit, step-down }
    Run the discovery procedure for 80 iterations.

    Seed:
    Subject    Verb        Object
    company    v-appoint   person
    person     v-resign    -

  • Discovered patterns

    Subject    Verb                 Object
    company    v-appoint            person
    person     v-resign             -
    person     succeed              person
    person     be | become          president | officer | chairman | executive
    company    name                 president |
    person     join | run | leave   company
    person     serve                board | company
    person     leave                post

  • Evaluation: Text Filtering. Evaluated using document-level text filtering.

    Comparable to WordNet-based expansion; successful for a variety of extraction tasks.

    Pattern set        Recall   Precision
    Seed               11%      93%
    Seed+discovered    88%      81%

  • Document Recall / Precision

  • Evaluation: Slot filling. How effective are the patterns within a complete IE system? MUC-style IE on the MUC-6 corpora.

    Caveat: filtered / aligned by hand

    (each cell appears to be a recall / precision / F triple)
    manual (MUC):   54 / 71 / 62    47 / 70 / 56
    manual (now):   69 / 79 / 74    56 / 75 / 64
    (row label not recoverable):   27 / 74 / 40    52 / 72 / 60

  • Topical Patterns vs. Paraphrases: These methods gather the main expressions about a particular topic. These include sets of paraphrases (name, appoint, select), but also topically related phrases which are not paraphrases (appoint & resign; shoot & die).

  • Pattern Discovery + Paraphrase Discovery: We can couple topical pattern discovery and paraphrase discovery:
    first discover patterns from the topic description (Sudo)
    then group them into paraphrase sets (Shinyama)
    The result is semantically coherent extraction pattern groups (Shinyama 2002), although not all patterns are grouped; paraphrase detection works better because the patterns are already semantically related.

  • Paraphrase identification for discovered patterns (Shinyama et al. 2002):
    worked well for the executive succession task (in Japanese): precision 94%, coverage 47% (coverage = number of paraphrase pairs discovered / number of pairs required to link all paraphrases)
    didn't work as well for the arrest task: fewer names, and multiple sentences with the same name, led to alignment errors

  • Conclusion: Current basic research on NLP methods offers significant opportunities for improved IE performance and portability:
    global optimization to improve analysis performance
    richer treebanks to support greater coverage of syntactic paraphrase
    corpus-based discovery methods to support greater coverage of semantic paraphrase