
Automatically Generating Extraction Patterns from Untagged Text

Ellen Riloff

Department of Computer Science
University of Utah
Salt Lake City, UT
[email protected]

From: AAAI-96 Proceedings. Copyright 1996, AAAI (www.aaai.org). All rights reserved.

Abstract

Many corpus-based natural language processing systems rely on text corpora that have been manually annotated with syntactic or semantic tags. In particular, all previous dictionary construction systems for information extraction have used an annotated training corpus or some form of annotated input. We have developed a system called AutoSlog-TS that creates dictionaries of extraction patterns using only untagged text. AutoSlog-TS is based on the AutoSlog system, which generated extraction patterns using annotated text and a set of heuristic rules. By adapting AutoSlog and combining it with statistical techniques, we eliminated its dependency on tagged text. In experiments with the MUC-4 terrorism domain, AutoSlog-TS created a dictionary of extraction patterns that performed comparably to a dictionary created by AutoSlog, using only preclassified texts as input.

Motivation

The vast amount of text becoming available on-line offers new possibilities for conquering the knowledge-engineering bottleneck lurking underneath most natural language processing (NLP) systems. Most corpus-based systems rely on a text corpus that has been manually tagged in some way. For example, the Brown corpus (Francis & Kucera 1982) and the Penn Treebank corpus (Marcus, Santorini, & Marcinkiewicz 1993) are widely used because they have been manually annotated with part-of-speech and syntactic bracketing information. Part-of-speech tagging and syntactic bracketing are relatively general in nature, so these corpora can be used by different natural language processing systems and for different domains. But some corpus-based systems rely on a text corpus that has been manually tagged in a domain-specific or task-specific manner. For example, corpus-based approaches to information extraction generally rely on special domain-specific text annotations. Consequently, the manual tagging effort is considerably less cost effective because the annotated corpus is useful for only one type of NLP system and for only one domain.


Corpus-based approaches to information extraction have demonstrated a significant time savings over conventional hand-coding methods (Riloff 1993). But the time required to annotate a training corpus is a non-trivial expense. To further reduce this knowledge-engineering bottleneck, we have developed a system called AutoSlog-TS that generates extraction patterns using untagged text. AutoSlog-TS needs only a preclassified corpus of relevant and irrelevant texts. Nothing inside the texts needs to be tagged in any way.

Generating Extraction Patterns from Tagged Text

Related work

In the last few years, several systems have been developed to generate patterns for information extraction automatically. All of the previous systems depend on manually tagged training data of some sort. One of the first dictionary construction systems was AutoSlog (Riloff 1993), which requires tagged noun phrases in the form of annotated text or text with associated answer keys. PALKA (Kim & Moldovan 1993) is similar in spirit to AutoSlog, but requires manually defined frames (including keywords), a semantic hierarchy, and an associated lexicon. Competing hypotheses are resolved by referring to manually encoded answer keys, if available, or by asking a user.

CRYSTAL (Soderland et al. 1995) also generates extraction patterns using an annotated training corpus. CRYSTAL relies on both domain-specific annotations and a semantic hierarchy with an associated lexicon. LIEP (Huffman 1996) is another system that learns extraction patterns but relies on predefined keywords, object recognizers (e.g., to identify people and companies), and human interaction to annotate each relevant sentence with an event type. Cardie (Cardie 1993) and Hastings (Hastings & Lytinen 1994) also developed lexical acquisition systems for information extraction, but their systems learned individual word meanings rather than extraction patterns.


Both systems used a semantic hierarchy and sentence contexts to learn the meanings of unknown words.

AutoSlog

AutoSlog (Riloff 1996) is a dictionary construction system that creates extraction patterns automatically using heuristic rules. As input, AutoSlog needs answer keys or text in which the noun phrases that should be extracted have been labeled with domain-specific tags. For example, in a terrorism domain, noun phrases that refer to perpetrators, targets, and victims may be tagged. Given a tagged noun phrase and the original source text, AutoSlog first identifies the sentence in which the noun phrase appears. If there is more than one such sentence and the annotation does not indicate which one is appropriate, then AutoSlog chooses the first one. AutoSlog invokes a sentence analyzer called CIRCUS (Lehnert 1991) to identify clause boundaries and syntactic constituents. AutoSlog needs only a flat syntactic analysis that recognizes the subject, verb, direct object, and prepositional phrases of each clause, so almost any parser could be used. AutoSlog determines which clause contains the targeted noun phrase and applies the heuristic rules shown in Figure 1.

PATTERN                      EXAMPLE
<subj> passive-verb          was murdered
<subj> active-verb           bombed
<subj> verb infin.           attempted to kill
<subj> aux noun              was victim
passive-verb <dobj> [1]      killed
active-verb <dobj>           bombed
infin. <dobj>                to kill
verb infin. <dobj>           tried to attack
gerund <dobj>                killing
noun aux <dobj>              fatality was
noun prep <np>               bomb against
active-verb prep <np>        killed with
passive-verb prep <np>       was aimed at

Figure 1: AutoSlog Heuristics

The rules are divided into three categories, based on the syntactic class of the noun phrase. For example, if the targeted noun phrase is the subject of a clause, then the subject rules apply. Each rule generates an expression that likely defines the conceptual role of the noun phrase. In most cases, they assume that the verb determines the role. The rules recognize several verb forms, such as active, passive, and infinitive.

[1] In principle, passive verbs should not have direct objects. We included this pattern only because CIRCUS occasionally confused active and passive constructions.

An extraction pattern is created by instantiating the rule with the specific words that it matched in the sentence. The rules are ordered so the first one that is satisfied generates an extraction pattern, with the longer patterns being tested before the shorter ones. As an example, consider the following sentence:

"Ricardo Castellar, the mayor, was kidnapped yesterday by the FMLN."

Suppose that Ricardo Castellar was tagged as a relevant victim. AutoSlog passes the sentence to CIRCUS, which identifies Ricardo Castellar as the subject. AutoSlog's subject heuristics are tested and the <subj> passive-verb rule fires. This pattern is instantiated with the specific words in the sentence to produce the extraction pattern <victim> was kidnapped. In future texts, this pattern will be activated whenever the verb "kidnapped" appears in a passive construction, and its subject will be extracted as a victim.
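To make the instantiation step concrete, here is a minimal Python sketch of a subject heuristic firing on that example. The Clause record and the function name are illustrative assumptions, not the paper's implementation; CIRCUS performed the actual sentence analysis.

```python
# A minimal sketch of AutoSlog's instantiation step (the Clause record and
# function name are invented for illustration, not the paper's code).

from dataclasses import dataclass

@dataclass
class Clause:
    subject: str      # head of the subject noun phrase
    verb_phrase: str  # verb group as it appeared, e.g. "was kidnapped"
    voice: str        # "active" or "passive"

def subject_pattern(clause: Clause, tag: str) -> str:
    """Fire the <subj> passive-verb / <subj> active-verb heuristics:
    instantiate the rule with the specific words matched in the sentence.
    Because verb_phrase holds the words as they appeared, the same
    instantiation covers both the active and the passive rule."""
    return f"<{tag}> {clause.verb_phrase}"

# "Ricardo Castellar, the mayor, was kidnapped yesterday by the FMLN."
clause = Clause("Ricardo Castellar", "was kidnapped", "passive")
print(subject_pattern(clause, "victim"))  # -> <victim> was kidnapped
```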

AutoSlog can produce undesirable patterns for a variety of reasons, including faulty sentence analysis, incorrect pp-attachment, or insufficient context. Therefore a person must manually inspect each extraction pattern and decide which ones should be accepted and which ones should be rejected. This manual filtering process is typically very fast. In experiments with the MUC-4 terrorism domain, it took a user only 5 hours to review 1237 extraction patterns (Riloff 1993). Although this manual filtering process is part of the knowledge-engineering cycle, generating the annotated training corpus is a much more substantial bottleneck.

Generating Extraction Patterns from Untagged Text

To tag or not to tag?

Generating an annotated training corpus is a significant undertaking, both in time and difficulty. Previous experiments with AutoSlog suggested that it took a user about 8 hours to annotate 160 texts (Riloff 1996). Therefore it would take roughly a week to construct a training corpus of 1000 texts. Committing a domain expert to a knowledge-engineering project for a week is prohibitive for most short-term applications.

Furthermore, the annotation task is deceptively complex. For AutoSlog, the user must annotate relevant noun phrases. But what constitutes a relevant noun phrase? Should the user include modifiers or just the head noun? All modifiers or just the relevant modifiers? Determiners? If the noun phrase is part of a conjunction, should the user annotate all conjuncts or just one? Should the user include appositives? How about prepositional phrases? The meaning of simple NPs can change substantially when a prepositional phrase is attached.


For example, the Bank of Boston is different from the Bank of Toronto. Real texts are loaded with complex noun phrases that often include a variety of these constructs in a single reference. There is also the question of which references to tag. Should the user tag all references to a person? If not, which ones? It is difficult to specify a convention that reliably captures the desired information, but not specifying a convention can produce inconsistencies in the data.

To avoid these problems, we have developed a new version of AutoSlog, called AutoSlog-TS, that does not require any text annotations. AutoSlog-TS requires only a preclassified training corpus of relevant and irrelevant texts for the domain.[2] A preclassified corpus is much easier to generate, since the user simply needs to identify relevant and irrelevant sample texts. Furthermore, relevant texts are already available on-line for many applications and could be easily exploited to create a training corpus for AutoSlog-TS.

AutoSlog-TS

AutoSlog-TS is an extension of AutoSlog that operates exhaustively by generating an extraction pattern for every noun phrase in the training corpus. It then evaluates the extraction patterns by processing the corpus a second time and generating relevance statistics for each pattern. The process is illustrated in Figure 2.

[Figure 2: The AutoSlog-TS flowchart. Stage 1 runs the sentence analyzer over the preclassified texts and proposes an extraction pattern for every noun phrase; Stage 2 reprocesses the texts with the new patterns to compute relevance statistics for each pattern.]

Stage 1 applies the AutoSlog heuristics exhaustively, extended with two new rules: <subj> active-verb dobj and infinitive prep <np>. The two additional rules were created for a business domain from a previous experiment and are probably not very important for the experiments described in this paper.[3] A more significant difference is that AutoSlog-TS allows multiple rules to fire if more than one matches the context. As a result, multiple extraction patterns may be generated in response to a single noun phrase. For example, the sentence "terrorists bombed the U.S. embassy" might produce two patterns to extract the terrorists: <subj> bombed and <subj> bombed embassy. The statistics will later reveal whether the shorter, more general pattern is good enough or whether the longer pattern is needed to be reliable for the domain. At the end of Stage 1, we have a giant dictionary of extraction patterns that are literally capable of extracting every noun phrase in the corpus.
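As a rough illustration of Stage 1's exhaustive behavior, the sketch below lets several simplified heuristics fire on one clause. The dictionary-based clause format and the three toy rules are assumptions made for illustration; they stand in for the full rule set, not reproduce it.

```python
# Sketch of Stage 1 of AutoSlog-TS: propose a pattern for *every* noun phrase,
# letting multiple heuristics fire on the same context. The clause format and
# rule set are simplified stand-ins for the heuristics in the paper.

def propose_patterns(clause: dict) -> list[str]:
    """Return all patterns whose (toy) heuristic matches this clause."""
    patterns = []
    if clause.get("subject"):
        patterns.append(f"<subj> {clause['verb']}")
        if clause.get("dobj"):
            # the <subj> active-verb dobj rule keeps the direct object
            patterns.append(f"<subj> {clause['verb']} {clause['dobj']}")
    if clause.get("dobj"):
        patterns.append(f"{clause['verb']} <dobj>")  # extracts the object NP
    return patterns

clause = {"subject": "terrorists", "verb": "bombed", "dobj": "embassy"}
print(propose_patterns(clause))
# ['<subj> bombed', '<subj> bombed embassy', 'bombed <dobj>']
```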

In Stage 2, we process the training corpus a second time using the new extraction patterns. The sentence analyzer activates all patterns that are applicable in each sentence. We then compute relevance statistics for each pattern. More specifically, we estimate the conditional probability that a text is relevant given that it activates a particular extraction pattern. The formula is:

Pr(relevant text | text contains pattern_i) = rel-freq_i / total-freq_i

where rel-freq_i is the number of instances of pattern_i that were activated in relevant texts, and total-freq_i is the total number of instances of pattern_i that were activated in the training corpus. For the sake of simplicity, we will refer to this probability as a pattern's relevance rate. Note that many patterns will be activated in relevant texts even though they are not domain-specific. For example, general phrases such as "was reported" will appear in all sorts of texts. The motivation behind the conditional probability estimate is that domain-specific expressions will appear substantially more often in relevant texts than irrelevant texts.
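In code, the relevance-rate bookkeeping amounts to two counters per pattern. The sketch below simply restates the formula above; the counter and function names are ours, not the paper's.

```python
# Relevance rate: for each pattern, the fraction of its activations that
# occur in relevant texts. Counter names are assumptions for illustration.

from collections import Counter

rel_freq = Counter()    # activations of each pattern in relevant texts
total_freq = Counter()  # activations of each pattern in all texts

def record_activation(pattern: str, text_is_relevant: bool) -> None:
    total_freq[pattern] += 1
    if text_is_relevant:
        rel_freq[pattern] += 1

def relevance_rate(pattern: str) -> float:
    """Estimate Pr(relevant text | text contains pattern)."""
    return rel_freq[pattern] / total_freq[pattern]
```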

Next, we rank the patterns in order of importance to the domain. AutoSlog-TS's exhaustive approach to pattern generation can easily produce tens of thousands of extraction patterns, and we cannot reasonably expect a human to review them all. Therefore, we use a ranking function to order them so that a person only needs to review the most highly ranked patterns. We rank the extraction patterns according to the formula: relevance rate * log2(frequency), unless the relevance rate is ≤ 0.5, in which case the function returns zero because the pattern is negatively correlated

[3] See (Riloff 1996) for a more detailed explanation.


with the domain (assuming the corpus is 50% relevant). This formula promotes patterns that have a high relevance rate or a high frequency. It is important for high frequency patterns to be considered even if their relevance rate is only moderate (say 70%) because of expressions like "was killed", which occur frequently in both relevant and irrelevant texts. If only the patterns with the highest relevance rates were promoted, then crucial expressions like this would be buried in the ranked list. We do not claim that this particular ranking function is the best; to the contrary, we will argue later that a better function is needed. But this function worked reasonably well in our experiments.
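The ranking function is small enough to state directly. This is a sketch under the assumptions above (a 50%-relevant corpus, so scores are zeroed at or below the 0.5 baseline); the function name is ours.

```python
# The ranking function described above: relevance_rate * log2(frequency),
# zeroed out for patterns at or below the 50% relevance baseline.

import math

def rank_score(relevance_rate: float, frequency: int) -> float:
    if relevance_rate <= 0.5 or frequency == 0:
        return 0.0  # negatively correlated with the domain; never promoted
    return relevance_rate * math.log2(frequency)

# Patterns would then be reviewed in descending score order, e.g.:
# ranked = sorted(patterns, key=lambda p: rank_score(rate(p), freq(p)),
#                 reverse=True)
```

The log2(frequency) factor is what lets a frequent pattern with only a moderate relevance rate outrank a rare pattern with a perfect one.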

Experimental Results

Automated scoring programs were developed to evaluate information extraction (IE) systems for the message understanding conferences, but the credit assignment problem for any individual component is virtually impossible using only the scores produced by these programs. Therefore, we evaluated AutoSlog and AutoSlog-TS by manually inspecting the performance of their dictionaries in the MUC-4 terrorism domain. We used the MUC-4 texts as input and the MUC-4 answer keys as the basis for judging correct output (MUC-4 Proceedings 1992).

The AutoSlog dictionary was constructed using the 772 relevant MUC-4 texts and their associated answer keys. AutoSlog produced 1237 extraction patterns, which were manually filtered in about 5 person-hours. The final AutoSlog dictionary contained 450 extraction patterns. The AutoSlog-TS dictionary was constructed using the 1500 MUC-4 development texts, of which about 50% are relevant. AutoSlog-TS generated 32,345 unique extraction patterns. To make the size of the dictionary more manageable, we discarded patterns that were proposed only once, under the assumption that they were not likely to be of much value. This reduced the size of the dictionary to 11,225 extraction patterns. We loaded the dictionary into CIRCUS, reprocessed the corpus, and computed the relevance rate of each pattern. Finally, we ranked all 11,225 patterns using the ranking function. The 25 top-ranked extraction patterns appear in Figure 3. Most of these patterns are clearly associated with terrorism, so the ranking function appears to be doing a good job of pulling the domain-specific patterns up to the top.

 1. <subj> exploded          14. <subj> occurred
 2. murder of <np>           15. <subj> was located
 3. assassination of <np>    16. took place on <np>
 4. <subj> was killed        17. responsibility for <np>
 5. <subj> was kidnapped     18. occurred on <np>
 6. attack on <np>           19. was wounded in <np>
 7. <subj> was injured       20. destroyed <dobj>
 8. exploded in <np>         21. <subj> was murdered
 9. death of <np>            22. one of <np>
10. <subj> took place        23. kidnapped <dobj>
11. caused <dobj>            24. exploded on <np>
12. claimed <dobj>           25. <subj> died
13. <subj> was wounded

Figure 3: The Top 25 Extraction Patterns

The ranked extraction patterns were then presented to a user for manual review.[4] The review process consists of deciding whether a pattern should be accepted or rejected, and labeling the accepted patterns.[5] For example, the second pattern, murder of <np>, was accepted and labeled as a murder pattern that will extract victims. The user reviewed the top 1970 patterns in about 85 minutes and then stopped because few patterns were being accepted at that point. In total, 210 extraction patterns were retained for the final dictionary. The review time was much faster than for AutoSlog, largely because the ranking scheme clustered the best patterns near the top, so the retention rate dropped quickly.

[4] The author did the manual review for this experiment.
[5] Note that AutoSlog's patterns were labeled automatically by referring to the text annotations.

Note that some of the patterns in Figure 3 were not accepted for the dictionary even though they are associated with terrorism. Only patterns useful for extracting perpetrators, victims, targets, and weapons were kept. For example, the pattern exploded in <np> was rejected because it would extract locations.

To evaluate the two dictionaries, we chose 100 blind texts from the MUC-4 test set. We used 25 relevant texts and 25 irrelevant texts from the TST3 test set, plus 25 relevant texts and 25 irrelevant texts from the TST4 test set. We ran CIRCUS on these 100 texts, first using the AutoSlog dictionary and then using the AutoSlog-TS dictionary. The underlying information extraction system was otherwise identical.

We scored the output by assigning each extracted item to one of four categories: correct, mislabeled, duplicate, or spurious. An item was scored as correct if it matched against the answer keys. An item was mislabeled if it matched against the answer keys but was extracted as the wrong type of object, for example, if "Hector Colindres" was listed as a murder victim but was extracted as a physical target. An item was a duplicate if it was coreferent with an item in the answer keys, for example, if "him" was extracted and was coreferent with "Hector Colindres". The extraction pattern acted correctly in this case, but the extracted information was not specific enough. Correct items extracted more than once were also scored as duplicates.


An item was spurious if it did not refer to any object in the answer keys. All items extracted from irrelevant texts were spurious. Finally, items in the answer keys that were not extracted were counted as missing. Therefore correct + missing should equal the total number of items in the answer keys.[6]
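A toy rendering of this scoring scheme is sketched below. The flat string matching and the precomputed coreference map are assumptions made for illustration; the actual judging was done by hand against the MUC-4 answer keys.

```python
# Toy bookkeeping for the four scoring categories plus "missing". The data
# structures (answer_key: string -> slot, coref: alias -> key string) are
# invented for illustration; the paper's judging was manual.

def score(extractions, answer_key, coref):
    counts = {"correct": 0, "mislabeled": 0, "duplicate": 0, "spurious": 0}
    seen = set()
    for text, slot in extractions:
        key = text if text in answer_key else coref.get(text)
        if key is None:
            counts["spurious"] += 1    # not in the answer keys at all
        elif slot != answer_key[key]:
            counts["mislabeled"] += 1  # right item, wrong type of object
        elif text != key or key in seen:
            counts["duplicate"] += 1   # coreferent mention or a repeat
        else:
            seen.add(key)
            counts["correct"] += 1
    counts["missing"] = len(answer_key) - counts["correct"]
    return counts

out = score([("Hector Colindres", "victim"), ("him", "victim")],
            {"Hector Colindres": "victim"}, {"him": "Hector Colindres"})
print(out)  # 1 correct, 1 duplicate, 0 mislabeled, 0 spurious, 0 missing
```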

Tables 1 and 2 show the numbers obtained after manually judging the output of the dictionaries. We scored three items: perpetrators, victims, and targets. The performance of the two dictionaries was very similar. The AutoSlog dictionary extracted slightly more correct items, but the AutoSlog-TS dictionary extracted fewer spurious items.[7]

Slot      Corr.   Miss.   Mislab.   Dup.   Spur.
Perp       36      22       1        11     129
Victim     41      24       7        18     113
Target     39      19       8        18     108
Total     116      65      16        47     350

Table 1: AutoSlog Results

Table 2: AutoSlog-TS Results

We applied a well-known statistical technique, the two-sample t test, to determine whether the differences between the dictionaries were statistically significant.

We tested four data sets: correct, correct + duplicate, missing, and spurious. The t values for these sets were 1.1012, 1.1818, 0.1557, and 2.27, respectively. The correct, correct + duplicate, and missing data sets were not significantly different, even at the p < 0.20 significance level. These results suggest that AutoSlog and AutoSlog-TS can extract relevant information with comparable performance. The spurious data, however, was significantly different at the p < 0.05 significance level. Therefore AutoSlog-TS was significantly more effective at reducing spurious extractions.
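For readers who want to replicate this kind of comparison, SciPy's ttest_ind is a modern stand-in for the two-sample t test; the per-text counts below are invented placeholders, not the paper's data.

```python
# Two-sample t test on per-text extraction counts. SciPy is a modern
# substitute for whatever tool the paper used; the data is invented filler.

from scipy import stats

autoslog_counts    = [3, 1, 0, 2, 4, 1]  # e.g. correct items per test text
autoslog_ts_counts = [2, 1, 0, 2, 3, 1]

t_value, p_value = stats.ttest_ind(autoslog_counts, autoslog_ts_counts)
print(f"t = {t_value:.4f}, p = {p_value:.4f}")
```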

We applied three performance metrics to this raw data: recall, precision, and the F-measure. We calculated recall as correct / (correct + missing), and computed precision as (correct + duplicate) / (correct + duplicate + mislabeled + spurious).

[6] Optional items in the answer keys were scored as correct if extracted, but were never scored as missing.
[7] The difference in mislabeled items is an artifact of the human review process, not AutoSlog-TS.

The F-measure (MUC-4 Proceedings 1992) combines recall and precision into a single value, in our case with equal weight.
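These three definitions translate directly into code. As a sanity check, the snippet applies them to the AutoSlog totals from Table 1; the equal-weight F-measure is the harmonic mean of recall and precision.

```python
# Recall, precision, and equal-weight F-measure exactly as defined above,
# applied to the AutoSlog totals from Table 1.

def recall(c):
    return c["correct"] / (c["correct"] + c["missing"])

def precision(c):
    hits = c["correct"] + c["duplicate"]
    return hits / (hits + c["mislabeled"] + c["spurious"])

def f_measure(c):
    r, p = recall(c), precision(c)
    return 2 * p * r / (p + r)

autoslog = {"correct": 116, "missing": 65, "mislabeled": 16,
            "duplicate": 47, "spurious": 350}
print(f"R={recall(autoslog):.2f} P={precision(autoslog):.2f} "
      f"F={f_measure(autoslog):.2f}")  # R=0.64 P=0.31 F=0.42
```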

As the raw data suggests, Table 3 shows that AutoSlog achieved slightly higher recall and AutoSlog-TS achieved higher precision. The F-measure scores were similar for both systems, but AutoSlog-TS obtained slightly higher F scores for victims and targets. Note that the AutoSlog-TS dictionary contained only 210 patterns, while the AutoSlog dictionary contained 450 patterns, so AutoSlog-TS achieved a comparable level of recall with a dictionary less than half the size.


Table 3: Comparative Results

The AutoSlog precision results are substantially lower than those generated by the MUC-4 scoring program (Riloff 1993). There are several reasons for the difference. For one, the current experiments were done with a debilitated version of CIRCUS that did not process conjunctions or semantic features. Although AutoSlog does not use semantic features to create extraction patterns, they can be incorporated as selectional restrictions in the patterns. For example, extracted victims should satisfy a "human" constraint. Semantic features were not used in the current experiments for technical reasons, but undoubtedly would have improved the precision of both dictionaries. Also, the previously reported scores were based on the UMass/MUC-4 system, which included a discourse analyzer that used domain-specific rules to distinguish terrorist incidents from other events. CIRCUS was designed to extract potentially relevant information using only local context, under the assumption that a complete IE system would contain a discourse analyzer to make global decisions about relevance.

Behind the scenes

It is informative to look behind the scenes and try to understand why AutoSlog achieved slightly better recall and why AutoSlog-TS achieved better precision. Most of AutoSlog's additional recall came from low frequency patterns that were buried deep in AutoSlog-TS's ranked list. The main advantage of corpus tagging is that the annotations provide guidance so the system can more easily home in on the relevant expressions. Without corpus tagging, we are at the mercy of the ranking function. We believe that the ranking function did a good job of pulling the most important patterns up to the top, but additional research is needed to recognize good low frequency patterns.



In fact, we have reason to believe that AutoSlog-TS is ultimately capable of producing better recall than AutoSlog because it generated many good patterns that AutoSlog did not. AutoSlog-TS produced 158 patterns with a relevance rate ≥ 90% and frequency ≥ 5. Only 45 of these patterns were in the original AutoSlog dictionary.

The higher precision demonstrated by AutoSlog-TS is probably a result of the relevance statistics. For example, the AutoSlog dictionary contains an extraction pattern for the expression <subj> admitted, but this pattern was found to be negatively correlated with relevance (46%) by AutoSlog-TS. Some of AutoSlog's patterns looked good to the human reviewer, but were not in fact highly correlated with relevance.

In an ideal ranking scheme, the "heavy hitter" extraction patterns should float to the top so that the most important patterns (in terms of recall) are reviewed first. AutoSlog-TS was very successful in this regard. Almost 35% recall was achieved after reviewing only the first 50 extraction patterns! Almost 50% recall was achieved after reviewing about 300 patterns.

Future Directions

The previous results suggest that a core dictionary of extraction patterns can be created after reviewing only a few hundred patterns. The specific number of patterns that need to be reviewed will ultimately depend on the breadth of the domain and the desired performance levels. A potential problem with AutoSlog-TS is that there are undoubtedly many useful patterns buried deep in the ranked list, which cumulatively could have a substantial impact on performance. The current ranking scheme is biased towards encouraging high frequency patterns to float to the top, but a better ranking scheme might be able to balance these two needs more effectively. The precision of the extraction patterns could also be improved by adding semantic constraints and, in the long run, creating more complex extraction patterns.

AutoSlog-TS represents an important step towards making information extraction systems more easily portable across domains. AutoSlog-TS is the first system to generate domain-specific extraction patterns automatically without annotated training data. A user only needs to provide sample texts (relevant and irrelevant), and spend some time filtering and labeling the resulting extraction patterns. Fast dictionary construction also opens the door for IE technology to support other tasks, such as text classification (Riloff & Shoen 1995). Finally, AutoSlog-TS represents a new

approach to exploiting on-line text corpora for domain-specific knowledge acquisition by squeezing preclassified texts for all they're worth.

Acknowledgments

This research was funded by NSF grant MIP-9023174 and NSF grant IRI-9509820. Thanks to Kern Mason and Jay Shoen for generating much of the data.

References

Cardie, C. 1993. A Case-Based Approach to Knowledge Acquisition for Domain-Specific Sentence Analysis. In Proceedings of the Eleventh National Conference on Artificial Intelligence, 798-803. AAAI Press/The MIT Press.

Francis, W., and Kucera, H. 1982. Frequency Analysis of English Usage. Boston, MA: Houghton Mifflin.

Hastings, P., and Lytinen, S. 1994. The Ups and Downs of Lexical Acquisition. In Proceedings of the Twelfth National Conference on Artificial Intelligence, 754-759. AAAI Press/The MIT Press.

Huffman, S. 1996. Learning information extraction patterns from examples. In Wermter, S.; Riloff, E.; and Scheler, G., eds., Connectionist, Statistical, and Symbolic Approaches to Learning for Natural Language Processing. Berlin: Springer-Verlag.

Kim, J., and Moldovan, D. 1993. Acquisition of Semantic Patterns for Information Extraction from Corpora. In Proceedings of the Ninth IEEE Conference on Artificial Intelligence for Applications, 171-176. Los Alamitos, CA: IEEE Computer Society Press.

Lehnert, W. 1991. Symbolic/Subsymbolic Sentence Analysis: Exploiting the Best of Two Worlds. In Barnden, J., and Pollack, J., eds., Advances in Connectionist and Neural Computation Theory, Vol. 1. Norwood, NJ: Ablex Publishers. 135-164.

Marcus, M.; Santorini, B.; and Marcinkiewicz, M. 1993. Building a Large Annotated Corpus of English: The Penn Treebank. Computational Linguistics 19(2):313-330.

MUC-4 Proceedings. 1992. Proceedings of the Fourth Message Understanding Conference (MUC-4). San Mateo, CA: Morgan Kaufmann.

Riloff, E., and Shoen, J. 1995. Automatically Acquiring Conceptual Patterns Without an Annotated Corpus. In Proceedings of the Third Workshop on Very Large Corpora, 148-161.

Riloff, E. 1993. Automatically Constructing a Dictionary for Information Extraction Tasks. In Proceedings of the Eleventh National Conference on Artificial Intelligence, 811-816. AAAI Press/The MIT Press.

Riloff, E. 1996. An Empirical Study of Automated Dictionary Construction for Information Extraction in Three Domains. Artificial Intelligence, Vol. 85. Forthcoming.

Soderland, S.; Fisher, D.; Aseltine, J.; and Lehnert, W. 1995. CRYSTAL: Inducing a conceptual dictionary. In Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence, 1314-1319.
