  • NLP: An Information Extraction Perspective

    Ralph Grishman, September 2005

  • Information Extraction (for this talk): Information Extraction (IE) = identifying the instances of the important relations and events for a domain from unstructured text.

  • Extraction Example (topic: executive succession): "George Garrick, 40 years old, president of the London-based European Information Services Inc., was appointed chief executive officer of Nielsen Marketing Research, USA."

    Extracted records:

    Position    Company                                Location   Person           Status
    President   European Information Services, Inc.   London     George Garrick   Out
    CEO         Nielsen Marketing Research             USA        George Garrick   In
  • Why an IE Perspective? IE can use a wide range of technologies:
    some successes with simple methods (names, some relations)
    high-performance IE will need to draw on a wide range of NLP methods
    ultimately, everything needed for deep understanding
    Potential impact of high-performance IE; a central perspective of our NLP laboratory.

  • Progress and Frustration: Over the past decade, the introduction of machine learning methods has allowed a shift from hand-crafted rules to corpus-trained systems, shifting the burden to annotating lots of data for each new task. But it has not produced large gains in bottom-line performance: there is a glass ceiling on event extraction performance. Can the latest advances give us a push in performance and portability?

  • Pattern Matching: Roughly speaking, IE systems are pattern-matching systems:
    we write a pattern corresponding to a type of event we are looking for: x shot y
    we match it against the text: Booth shot Lincoln at Ford's Theatre
    and we fill a database entry: shooting event [assailant = Booth, target = Lincoln]
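    A toy sketch of this idea in code (a hypothetical token-level regular-expression pattern, purely for illustration; the following slides argue that real IE patterns must be stated over structure rather than tokens):

```python
import re

# Toy token-level pattern for the "x shot y" shooting event.
SHOOTING = re.compile(r"(?P<assailant>[A-Z]\w+) shot (?P<target>[A-Z]\w+)")

def extract_shootings(text):
    """Return one database-style record per pattern match."""
    return [
        {"event": "shooting",
         "assailant": m.group("assailant"),
         "target": m.group("target")}
        for m in SHOOTING.finditer(text)
    ]

print(extract_shootings("Booth shot Lincoln at Ford's Theatre"))
# [{'event': 'shooting', 'assailant': 'Booth', 'target': 'Lincoln'}]
```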

  • Three Degrees of IE-Building Tasks
    1. We know what linguistic patterns we are looking for.
    2. We know what relations we are looking for, but not the variety of ways in which they are expressed.
    3. We know the topic, but not the relations involved.
    (Moving down the list trades performance for portability; the boundaries between the degrees are fuzzy.)

  • Three Degrees of IE-Building Tasks
    1. We know what linguistic patterns we are looking for.
    2. We know what relations we are looking for, but not the variety of ways in which they are expressed.
    3. We know the topic, but not the relations involved.

  • Identifying linguistic expressions: To be at all useful, the patterns for IE must be stated structurally; patterns at the token level are not general enough. So our main obstacle (as for many NLP tasks) is accurate structural analysis:
    name recognition and classification
    syntactic structure
    co-reference structure
    If the analysis is wrong, the pattern won't match.

  • Decomposing Structural Analysis: Decomposing structural analysis into subtasks like named entities, syntactic structure, and coreference has clear benefits:
    problems can be addressed separately
    separate corpus-trained models can be built
    fairly good levels of performance (near 90%) can be achieved on each separately (well, maybe not for coreference)
    But it also has problems ...

  • Sequential IE Framework: errors are compounded from stage to stage.
    [Pipeline diagram: Raw Doc → Name/Nominal Mention Tagger → Reference Resolver → Relation Tagger → Analyzed Doc, with precision falling from 100% to roughly 90%, 80%, and 70% across the stages]

  • A More Global View: The typical pipeline approach performs local optimization of each stage. We can take advantage of interactions between stages by taking a more global view of the best analysis; for example, prefer named-entity analyses which allow for more coreference or more semantic relations.

  • Names which can be coreferenced are much more likely to be correct (counting only names that are difficult for the name tagger: a small margin over the 2nd hypothesis, and not on the list of common names).

  • Names which can participate in semantic relations are much more likely to be correct

    [Chart: name accuracy (%) plotted against a threshold on the margin (the difference between the log probabilities of the first and second name hypotheses), shown separately for names participating in a semantic relation and names not participating; i.e., the probability of a name being correct given that its margin is below the threshold. At every threshold, names participating in relations are far more likely to be correct.]

  • Sources of interaction: Coreference and semantic relations impose type constraints (or preferences) on their arguments.

    A natural discourse is more likely to be cohesive, i.e., to have mentions (noun phrases) which are linked by coreference and semantic relations.

  • N-best: One way to capture such global information is to use an N-best pipeline and rerank after each stage, using the additional information provided by that stage (Ji and Grishman, ACL 2005).

    Reduced name tagging errors for Chinese by 20% (F measure: 87.5 --> 89.9)
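    A minimal sketch of the reranking idea (hypothetical feature counters and weights, not the actual Ji and Grishman model), assuming each name-tagging hypothesis carries its model log probability and later stages can report how many of its names corefer or fill relation slots:

```python
def rerank(hypotheses, coref_count, relation_count, w_coref=0.5, w_rel=0.5):
    """Pick the name hypothesis with the best combined score.

    hypotheses     : list of (logprob, names) pairs from the N-best name tagger
    coref_count    : function names -> how many of the names are coreferenced
    relation_count : function names -> how many of the names fill a relation slot
    The weights w_coref / w_rel would be tuned on held-out data.
    """
    def score(hyp):
        logprob, names = hyp
        return (logprob
                + w_coref * coref_count(names)
                + w_rel * relation_count(names))

    return max(hypotheses, key=score)
```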

  • Multiple Hypotheses + Re-Ranking: a re-ranking model combines information from the interactions between stages.
    [Pipeline diagram: Raw Doc → Name/Nominal Mention Tagger → Reference Resolver → Relation Tagger, carrying N-best hypotheses that are pruned after the name, coreference, and relation stages; maximum and final precision figures (100%, 99%, 98%, 97%, 85%) are annotated on the stages]

  • Computing Global Probabilities: Roth and Yih (CoNLL 2004) optimized a combined probability over two analysis stages:
    limited the interaction to name classification and semantic relation identification
    optimized the product of name and relation probabilities, subject to constraints on the types of name arguments
    used linear programming methods
    obtained a 1%+ improvement in name tagging, and 2-4% in relation tagging, over the conventional pipeline
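    A minimal sketch of the joint-inference idea (brute-force search over a toy candidate space rather than the linear-programming formulation Roth and Yih actually used; the probability tables and type constraints below are invented for illustration):

```python
from itertools import product

# Hypothetical probabilities from independently trained classifiers.
ent_probs = {"e1": {"PER": 0.6, "ORG": 0.4},
             "e2": {"ORG": 0.7, "LOC": 0.3}}
rel_probs = {"works_for": 0.55, "located_in": 0.45}

# Type constraints: (arg1 type, arg2 type) pairs each relation allows.
allowed = {"works_for": {("PER", "ORG")},
           "located_in": {("PER", "LOC"), ("ORG", "LOC")}}

def best_joint_assignment():
    """Maximize the product of entity and relation probabilities,
    subject to the relation-argument type constraints."""
    best, best_p = None, 0.0
    for t1, t2, r in product(ent_probs["e1"], ent_probs["e2"], rel_probs):
        if (t1, t2) not in allowed[r]:
            continue  # assignment violates a type constraint
        p = ent_probs["e1"][t1] * ent_probs["e2"][t2] * rel_probs[r]
        if p > best_p:
            best, best_p = (t1, t2, r), p
    return best, best_p

print(best_joint_assignment())  # (('PER', 'ORG', 'works_for'), ~0.231)
```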

  • Three Degrees of IE-Building Tasks
    1. We know what linguistic patterns we are looking for.
    2. We know what relations we are looking for, but not the variety of ways in which they are expressed.
    3. We know the topic, but not the relations involved.

  • Lots of Ways of Expressing an Event:
    Booth assassinated Lincoln
    Lincoln was assassinated by Booth
    The assassination of Lincoln by Booth
    Booth went through with the assassination of Lincoln
    Booth murdered Lincoln
    Booth fatally shot Lincoln

  • Syntactic Paraphrases: Some paraphrase relations involve the same words (or morphologically related words) and are broadly applicable:
    Booth assassinated Lincoln
    Lincoln was assassinated by Booth
    The assassination of Lincoln by Booth
    Booth went through with the assassination of Lincoln
    These are syntactic paraphrases.

  • Semantic Paraphrases: Other paraphrase relations involve different word choices:
    Booth assassinated Lincoln
    Booth murdered Lincoln
    Booth fatally shot Lincoln
    These are semantic paraphrases.

  • Attacking Syntactic Paraphrases: Syntactic paraphrases can be addressed through deeper syntactic representations which reduce paraphrases to a common relationship:
    chunks
    surface syntax
    deep structure (logical subject/object)
    predicate-argument structure (semantic roles)

  • Tree Banks: Syntactic analyzers have been effectively created through training from tree banks; good coverage is possible with a limited corpus.

  • Predicate Argument Banks: The next stage of syntactic analysis is being enabled through the creation of predicate-argument banks:
    PropBank (for verb arguments) (Kingsbury and Palmer [Univ. of Penn.])
    NomBank (for noun arguments)* (Meyers et al.)

    * first release next week

  • PA Banks, cont'd: Together these predicate-argument banks assign common argument labels to a wide range of constructs:

    The Bulgarians attacked the Turks
    The Bulgarians' attack on the Turks
    The Bulgarians launched an attack on the Turks
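    A small illustrative sketch (hand-written analyses, not the output of any particular semantic role labeler) showing how all three constructs receive the same argument labels, so one extraction pattern over (predicate, ARG0, ARG1) covers them all:

```python
# All three surface forms map to the same predicate-argument structure.
examples = {
    "The Bulgarians attacked the Turks":
        {"pred": "attack", "ARG0": "the Bulgarians", "ARG1": "the Turks"},
    "The Bulgarians' attack on the Turks":
        {"pred": "attack", "ARG0": "the Bulgarians", "ARG1": "the Turks"},
    "The Bulgarians launched an attack on the Turks":
        {"pred": "attack", "ARG0": "the Bulgarians", "ARG1": "the Turks"},
}

# A pattern stated at this level matches every variant.
assert len({tuple(sorted(v.items())) for v in examples.values()}) == 1
```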

  • Depth vs. Accuracy: Patterns based on deeper representations cover more examples, but deeper representations are generally less accurate. This leaves us with a dilemma: whether to use shallow (chunk) or deep (PA) patterns.

  • Resolving the Dilemma: The solution is to allow patterns at multiple levels:
    combine evidence from the different levels
    use machine learning methods to assign appropriate weights to each level

    In cases where deep analysis fails, the correct decision can often be made from the shallow analysis.

  • Integrating Multiple Levels: Zhao applied this approach to relation and event detection:
    a corpus-trained method
    a kernel measures the similarity of an example in the training corpus with a test input
    separate kernels at the word level, the chunk level, and the logical syntactic structure level
    a composite kernel combines information at the different levels

  • Kernel-based Integration

    [System diagram: preprocessing (POS tagger, name tagger, sentence parser, and other analyzers producing logical relations) feeds an SVM / KNN classifier, followed by post-processing to produce the results]

  • Benefits of Level Integration: Zhao demonstrated significant performance improvements for semantic relation detection by combining word, chunk, and logical syntactic relations, over the performance of the individual levels (Zhao and Grishman, ACL 2005).
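    A minimal sketch of the composite-kernel idea (illustrative similarity functions and weights, not the actual kernels of Zhao and Grishman), assuming each example carries a representation at every level:

```python
def word_kernel(x, y):
    """Bag-of-words overlap (stand-in for the word-level kernel)."""
    return len(set(x["words"]) & set(y["words"]))

def chunk_kernel(x, y):
    """Overlap of (chunk type, head word) pairs."""
    return len(set(x["chunks"]) & set(y["chunks"]))

def syntax_kernel(x, y):
    """Overlap of logical (predicate, role, argument) triples."""
    return len(set(x["deps"]) & set(y["deps"]))

def composite_kernel(x, y, weights=(0.2, 0.3, 0.5)):
    """Weighted combination of the level kernels.

    A positively weighted sum of kernels is itself a kernel, so the
    combination can be plugged directly into an SVM or KNN classifier;
    the weights would be tuned (or learned) on development data."""
    w_word, w_chunk, w_syn = weights
    return (w_word * word_kernel(x, y)
            + w_chunk * chunk_kernel(x, y)
            + w_syn * syntax_kernel(x, y))
```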

  • Attacking Semantic Paraphrase: Some semantic paraphrase can be addressed through manually prepared synonym sets, such as those available in WordNet. Stevenson and Greenwood [Sheffield] (ACL 2005) measured the degree to which IE patterns could be successfully generalized using WordNet:
    measured on the executive succession task
    started with a small seed set of patterns

  • Seed Pattern Set for Executive Succession

    v-appoint = { appoint, elect, promote, name }
    v-resign = { resign, depart, quit }

    Subject    Verb        Object
    company    v-appoint   person
    person     v-resign    -
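    A minimal sketch of WordNet-based expansion of the seed verb classes (using NLTK's WordNet interface; this is a simplification of Stevenson and Greenwood's actual generalization procedure):

```python
from nltk.corpus import wordnet as wn

def expand(verbs):
    """Add WordNet synonyms (verb lemmas) for each seed verb."""
    expanded = set(verbs)
    for v in verbs:
        for synset in wn.synsets(v, pos=wn.VERB):
            expanded.update(l.replace("_", " ") for l in synset.lemma_names())
    return expanded

v_appoint = {"appoint", "elect", "promote", "name"}
v_resign = {"resign", "depart", "quit"}

print(expand(v_appoint))  # adds e.g. "nominate", "constitute", ...
print(expand(v_resign))   # adds e.g. "step down", "leave office", ...
```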

  • Evaluating IE Patterns

    Text filtering metric: if we select documents / sentences containing a pattern, how many of the relevant documents / sentences do we get?
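    A small sketch of the document-level text-filtering metric (plain precision and recall over document sets; the function and variable names are my own):

```python
def text_filtering_scores(selected, relevant):
    """Precision and recall when every document matching a pattern is selected.

    selected : set of ids of documents containing at least one pattern
    relevant : set of ids of documents that are actually on-topic
    """
    hits = len(selected & relevant)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Toy example: 3 of the 4 selected documents are relevant,
# and 3 of the 6 relevant documents were found.
print(text_filtering_scores({1, 2, 3, 4}, {2, 3, 4, 5, 6, 7}))  # (0.75, 0.5)
```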

  • WordNet worked quite well for the executive succession task:

                          seed            expanded
                          P       R       P       R
    document filtering    100%    26%     68%     96%
    sentence filtering    81%     10%     47%     64%

  • Challenge of Semantic Paraphrase: But semantic paraphrase, by its nature, is more open-ended and more domain-specific than syntactic paraphrase, so it is hard to prepare any comprehensive resource by hand. Corpus-based discovery methods will be essential to improve our coverage.

  • Paraphrase discovery. Basic intuition:
    find pairs of passages which probably convey the same information
    align structures at points of known correspondence (e.g., names which appear in both passages):
        Fred xxxxx Harriet
        Fred yyyyy Harriet

    Similar to MT training from bitexts; the aligned expressions (xxxxx, yyyyy) are candidate paraphrases.

  • Evidence of paraphrase:
    From almost-parallel text: strong external evidence of paraphrase + a single aligned example
    From comparable text: weak external evidence of paraphrase + a few aligned examples
    From general text: using lots of aligned examples

  • Paraphrase from Translations (Barzilay and McKeown, ACL 01 [Columbia]):
    Take multiple translations of the same novel (high likelihood of passage paraphrase).
    Align sentences.
    Chunk and align sentence constituents.

    Found lots of lexical paraphrases (words & phrases) and a few larger (syntactic) paraphrases; data availability is limited.

  • Paraphrase from news sources (Shinyama, Sekine, et al., IWP 03):
    Take news stories from multiple sources from the same day.
    Use a word-based metric to identify stories about the same topic.
    Tag sentences for names; look for sentences in the two stories with several names in common (moderate likelihood of sentence paraphrase).
    Look for syntactic structures in these sentences which share names:
        sharing 2 names: paraphrase precision 62% (articles about murder, in Japanese)
        sharing one name, with at least four examples of a given paraphrase relation: precision 58% (2005 results, English, no topic constraint)

  • Relation paraphrase from multiple examples. Basic idea: if expression R appears with several pairs of names (a R b, c R d, e R f, ...) and expression S appears with several of the same pairs (a S b, e S f, ...), then there is a good chance that R and S are paraphrases.
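    A minimal sketch of this idea (simple shared-pair counting with an illustrative threshold; not Sekine's actual algorithm):

```python
from itertools import combinations

# expression -> set of (name1, name2) pairs it was observed with
observations = {
    "X will acquire Y":  {("CBS", "Westinghouse"), ("Eastern Group", "Hanson")},
    "X's purchase of Y": {("CBS", "Westinghouse")},
    "X agreed to buy Y": {("CBS", "Westinghouse"), ("Eastern Group", "Hanson")},
}

def paraphrase_candidates(obs, min_shared=2):
    """Link two expressions when they share at least min_shared name pairs."""
    links = []
    for (r, r_pairs), (s, s_pairs) in combinations(obs.items(), 2):
        if len(r_pairs & s_pairs) >= min_shared:
            links.append((r, s))
    return links

print(paraphrase_candidates(observations))
# [('X will acquire Y', 'X agreed to buy Y')]
```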

  • Relation paraphrase -- example:
    Eastern Group's agreement to buy Hanson
    Eastern Group to acquire Hanson

    CBS will acquire Westinghouse
    CBS's purchase of Westinghouse
    CBS agreed to buy Westinghouse

    (example based on Sekine 2005)

  • Relation paraphrase -- example:
    Eastern Group's agreement to buy Hanson
    Eastern Group to acquire Hanson

    CBS will acquire Westinghouse
    CBS's purchase of Westinghouse
    CBS agreed to buy Westinghouse

    select the main linking predicate

  • Relation paraphrase -- example:
    Eastern Group's agreement to buy Hanson
    Eastern Group to acquire Hanson

    CBS will acquire Westinghouse
    CBS's purchase of Westinghouse
    CBS agreed to buy Westinghouse

    2 shared pairs → paraphrase link (buy ↔ acquire)

  • Relation paraphrase, cont'd:
    Brin (1998); Agichtein and Gravano (2000): acquired individual relations (authorship, location)
    Lin and Pantel (2001): patterns for use in QA
    Sekine (IWP 2005): acquires all relations between two types of names; paraphrase precision 86% for person-company pairs, 73% for company-company pairs

  • Three Degrees of IE-Building Tasks
    1. We know what linguistic patterns we are looking for.
    2. We know what relations we are looking for, but not the variety of ways in which they are expressed.
    3. We know the topic, but not the relations involved.

  • Topic
    → set of documents on the topic
    → set of patterns characterizing the topic

  • Riloff Metric:
    Divide the corpus into relevant (on-topic) and irrelevant (off-topic) documents.
    Classify (some) words into major semantic categories (people, organizations, ...).
    Identify predication structures in each document (such as verb-object pairs).
    Count the frequency of each structure in relevant (R) and irrelevant (I) documents.
    Score structures by (R/I) log R.
    Select the top-ranked patterns.

  • Bootstrapping. Goal: find examples / patterns relevant to a given topic without any corpus tagging (Yangarber 00). Method:
    identify a few seed patterns for the topic
    retrieve documents containing the patterns
    find additional structures with a high Riloff metric
    add them to the seed set and repeat
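    A minimal sketch of the bootstrapping loop (illustrative data structures; the scoring follows the (R/I) log R formula from the Riloff-metric slide, with +1 smoothing in the denominator, and is not the exact Yangarber implementation):

```python
import math
from collections import Counter

def bootstrap(docs, seed_patterns, iterations=80, per_round=1):
    """docs: list of documents, each represented as a set of predication structures."""
    patterns = set(seed_patterns)
    for _ in range(iterations):
        # 1. A document containing any accepted pattern counts as relevant.
        relevant = [d for d in docs if d & patterns]
        irrelevant = [d for d in docs if not d & patterns]

        # 2. Count each structure in relevant (R) and irrelevant (I) documents.
        r_counts = Counter(s for d in relevant for s in d)
        i_counts = Counter(s for d in irrelevant for s in d)

        # 3. Riloff-style score (R/I) * log R, smoothed so I = 0 is allowed.
        def score(s):
            r, i = r_counts[s], i_counts[s]
            return (r / (i + 1)) * math.log(r) if r > 1 else 0.0

        # 4. Add the best new structure(s) to the pattern set and repeat.
        candidates = sorted((s for s in r_counts if s not in patterns),
                            key=score, reverse=True)
        if not candidates:
            break
        patterns.update(candidates[:per_round])
    return patterns
```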

  • #1: pick seed pattern. Seed: < person retires >

  • #2: retrieve relevant documents. Seed: < person retires >
    Relevant documents: "Fred retired. ... Harry was named president."  "Maki retired. ... Yuki was named president."
    Other documents: those not containing the seed pattern

  • #3: pick new pattern. Seed: < person retires >

    < person was named president > appears in several relevant documents (top-ranked by the Riloff metric):
    "Fred retired. ... Harry was named president."  "Maki retired. ... Yuki was named president."

  • #4: add new pattern to pattern set. Pattern set: < person retires >, < person was named president >

  • Applied to Executive Succession task

    v-appoint = { appoint, elect, promote, name }
    v-resign = { resign, depart, quit, step-down }
    Run the discovery procedure for 80 iterations.

    Seed:
    Subject    Verb        Object
    company    v-appoint   person
    person     v-resign    -

  • Discovered patterns

    Subject    Verb                 Object
    company    v-appoint            person
    person     v-resign             -
    person     succeed              person
    person     be | become          president | officer | chairman | executive
    company    name                 president |
    person     join | run | leave   company
    person     serve                board | company
    person     leave                post

  • Evaluation: Text Filtering. Evaluated using document-level text filtering.

    Comparable to WordNet-based expansion; successful for a variety of extraction tasks.

    Pattern set        Recall   Precision
    Seed               11%      93%
    Seed+discovered    88%      81%

  • Document Recall / Precision

  • Evaluation: Slot filling. How effective are the patterns within a complete IE system? MUC-style IE on the MUC-6 corpora.

    Caveat: filtered / aligned by hand

    (each cell appears to be a recall / precision / F triple)
    manual (MUC):   54 / 71 / 62    47 / 70 / 56
    manual (now):   69 / 79 / 74    56 / 75 / 64
    (row label not recoverable):   27 / 74 / 40    52 / 72 / 60

  • Topical Patterns vs. Paraphrases: These methods gather the main expressions about a particular topic. These include sets of paraphrases (name, appoint, select), but also topically related phrases which are not paraphrases (appoint & resign; shoot & die).

  • Pattern Discovery + Paraphrase Discovery: We can couple topical pattern discovery and paraphrase discovery:
    first discover patterns from the topic description (Sudo)
    then group them into paraphrase sets (Shinyama)
    The result is semantically coherent extraction pattern groups (Shinyama 2002), although not all patterns are grouped; paraphrase detection works better because the patterns are already semantically related.

  • Paraphrase identification for discovered patterns (Shinyama et al. 2002):
    worked well for the executive succession task (in Japanese): precision 94%, coverage 47% (coverage = number of paraphrase pairs discovered / number of pairs required to link all paraphrases)
    didn't work as well for the arrest task: fewer names, and multiple sentences with the same name, led to alignment errors

  • Conclusion: Current basic research on NLP methods offers significant opportunities for improved IE performance and portability:
    global optimization to improve analysis performance
    richer treebanks to support greater coverage of syntactic paraphrase
    corpus-based discovery methods to support greater coverage of semantic paraphrase