Open IE and Universal Schema Discovery
Heng Ji
Acknowledgement: some slides from Daniel Weld and Dan Roth
Traditional, Supervised I.E.
Raw Data
Labeled Training
Data
Learning Algorithm
Extractor
Kirkland-based Microsoft is the largest software company. Boeing moved its headquarters to Chicago in 2003. Hank Levy was named chair of Computer Science & Engr.
…
HeadquarterOf(<company>,<city>)
Solutions
Open IE, Universal Schema Discovery, Concept Typing
Open Information Extraction
The challenge of Web extraction is to do Open Information Extraction: the number of relations is unbounded, and the Web corpus contains billions of documents.
How open IE systems work
Learn a general model of how relations are expressed (in a particular language), based on unlexicalized features such as part-of-speech tags (e.g., identify a verb).
Learn domain-independent regular expressions (e.g., punctuation, commas).
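A toy illustration of the unlexicalized idea: using only POS tags, take a noun group, a following verb group, and a second noun group as an (arg1, relation, arg2) triple. The tag groupings and the hand-supplied tagging below are drastic simplifying assumptions, not any particular system's rules.

```python
def extract_triples(tagged):
    """tagged: list of (token, POS) pairs -> list of (arg1, rel, arg2) triples."""
    triples = []
    n = len(tagged)

    def read_group(j, prefixes):
        # consume a maximal run of tokens whose POS starts with one of the prefixes
        toks = []
        while j < n and tagged[j][1].startswith(prefixes):
            toks.append(tagged[j][0])
            j += 1
        return toks, j

    i = 0
    while i < n:
        arg1, j = read_group(i, ("NNP", "NN"))
        if not arg1:
            i += 1
            continue
        rel, k = read_group(j, ("VB", "IN", "TO"))
        # require at least one verb in the relation group
        if not any(tag.startswith("VB") for _, tag in tagged[j:k]):
            i = j
            continue
        arg2, _ = read_group(k, ("NNP", "NN"))
        if arg2:
            triples.append((" ".join(arg1), " ".join(rel), " ".join(arg2)))
        i = k if arg2 else j
    return triples

# a real system would run a POS tagger; the tags here are supplied by hand
sent = [("Microsoft", "NNP"), ("is", "VBZ"), ("based", "VBN"),
        ("in", "IN"), ("Kirkland", "NNP")]
triples = extract_triples(sent)  # [("Microsoft", "is based in", "Kirkland")]
```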
Methods for Open IE
Self Supervision: Kylin (Wikipedia), Shrinkage & Retraining
Hearst Patterns, PMI Validation, Subclass Extraction
Pattern Learning, Structural Extraction
List Extraction & WebTables, TextRunner
Kylin: Self-Supervised Information Extraction from Wikipedia
[Wu & Weld CIKM 2007]
Its county seat is Clearfield.
As of 2005, the population density was 28.2/km².
Clearfield County was created in 1804 from parts of Huntingdon and Lycoming Counties but was administered as part of Centre County until 1812.
2,972 km² (1,147 mi²) of it is land and 17 km² (7 mi²) of it (0.56%) is water.
From infoboxes to a training set
Kylin Architecture
Schema Mapping
Heuristics: Edit History, String Similarity
• Experiments: Precision 94%, Recall 87%
• Future: Integrated Joint Inference
Person: birth_date, birth_place, name, other_names, … ↔ Performer: birthdate, location, name, othername, …
Main Lesson: Self Supervision
Find structured data source Use heuristics to generate training data
E.g. Infobox attributes & matching sentences
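The infobox-matching heuristic can be sketched as follows. The infobox and sentences are illustrative; note how noisy the heuristic is — "Clearfield" also matches a sentence about Clearfield County.

```python
def generate_training_data(infobox, sentences):
    """Kylin-style self-supervision sketch: a sentence becomes a training
    example for an attribute extractor when the infobox value for that
    attribute appears verbatim in the sentence."""
    examples = []
    for attr, value in infobox.items():
        for sent in sentences:
            if value in sent:
                examples.append((sent, attr, value))
    return examples

# illustrative infobox and article sentences (Clearfield County, PA)
infobox = {"county_seat": "Clearfield", "created": "1804"}
sentences = [
    "Its county seat is Clearfield.",
    "Clearfield County was created in 1804 from parts of Huntingdon "
    "and Lycoming Counties.",
]
examples = generate_training_data(infobox, sentences)
# 3 examples: the county_seat value also (spuriously) matches the second sentence
```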
The KnowItAll System
Predicates: Country(X)
Domain-independent Rule Templates
<class> “such as” NP
Bootstrapping
Extraction Rules: “countries such as” NP
Discriminators: “country X”
Extractor ← World Wide Web
Extractions: Country(“France”)
Assessor
Validated Extractions: Country(“France”), prob=0.999
Unary predicates: instances of a class
Unary predicates: instanceOf(City), instanceOf(Film), instanceOf(Company), …
Good recall and precision from generic patterns:
<class> “such as” X
X “and other” <class>
Instantiated rules:
“cities such as” X / X “and other cities”
“films such as” X / X “and other films”
“companies such as” X / X “and other companies”
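The two generic patterns can be instantiated directly as regular expressions; a minimal sketch, where the NP shape (one or two capitalized words) is a crude simplifying assumption:

```python
import re

def hearst_extract(text, class_plural):
    """Collect candidate instances of a class (given as its plural noun,
    e.g. "cities") via the two generic patterns."""
    np = r"([A-Z][a-z]+(?: [A-Z][a-z]+)?)"  # rough proper-NP shape (assumption)
    patterns = [
        rf"{class_plural} such as {np}",    # <class> "such as" X
        rf"{np} and other {class_plural}",  # X "and other" <class>
    ]
    found = set()
    for pat in patterns:
        found.update(re.findall(pat, text))
    return found

text = ("He toured cities such as Boston, while "
        "Tukwila and other cities were skipped.")
candidates = hearst_extract(text, "cities")  # {"Boston", "Tukwila"}
```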
Recall – Precision Tradeoff
High-precision rules apply to only a small percentage of sentences on the Web.
           hits for “X”    “cities such as X”   “X and other cities”
Boston     365,000,000     15,600,000           12,000
Tukwila    1,300,000       73,000               44
Gjatsk     88              34                   0
Hadaslav   51              1                    0
“Redundancy-based extraction” ignores all but the unambiguous references.
Limited Recall with Binary Rules
Relatively high recall for unary rules:
“companies such as” X 2,800,000 Web hits
X “and other companies” 500,000 Web hits
Low recall for binary rules:
X “is the CEO of Microsoft” 160 Web hits
X “is the CEO of Wal-mart” 19 Web hits
X “is the CEO of Continental Grain” 0 Web hits
X “, CEO of Microsoft” 6,700 Web hits
X “, CEO of Wal-mart” 700 Web hits
X “, CEO of Continental Grain” 2 Web hits
Examples of Extraction Errors
Rule: countries such as X => instanceOf(Country, X)
“We have 31 offices in 15 countries such as London and France.”
=> instanceOf(Country, London), instanceOf(Country, France)
Rule: X “and other cities” => instanceOf(City, X)
“A comparative breakdown of the cost of living in Klamath County and other cities follows.”
=> instanceOf(City, Klamath County)
“Generate and Test” Paradigm
1. Find extractions from generic rules
2. Validate each extraction
Assign a probability that each extraction is correct: use search-engine hit counts to compute PMI (pointwise mutual information) between the extraction and “discriminator” phrases for the target concept.
PMI-IR: P.D.Turney, “Mining the Web for synonyms: PMI-IR versus LSA on TOEFL”. In Proceedings of ECML, 2001.
Computing PMI Scores
Measures mutual information between the extraction and target concept.
I = an instance of a target concept instanceOf(Country, “France”)
D = a discriminator phrase for the concept “ambassador to X”
D+I = insert instance into discriminator phrase“ambassador to France”
PMI(I, D) = |hits(D + I)| / |hits(I)|
Example of PMI
Discriminator: “countries such as X”. Instance: “France” vs. “London”. PMI for France >> PMI for London (2 orders of magnitude). We need features for the probability update that distinguish “high” PMI from “low” PMI for a discriminator.
“countries such as France”: 27,800 hits; “France”: 14,300,000 hits
PMI(France) = 27,800 / 14,300,000 ≈ 1.94E-3
“countries such as London”: 71 hits; “London”: 12,600,000 hits
PMI(London) = 71 / 12,600,000 ≈ 5.63E-6
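The two scores above can be reproduced directly from the quoted hit counts; a small sketch (in KnowItAll the counts would come from live search-engine queries):

```python
def pmi(hits_d_plus_i, hits_i):
    """PMI as defined on the slide: |hits(D + I)| / |hits(I)|."""
    return hits_d_plus_i / hits_i

pmi_france = pmi(27_800, 14_300_000)  # ~1.94e-3
pmi_london = pmi(71, 12_600_000)      # ~5.63e-6
ratio = pmi_france / pmi_london       # France exceeds London by over 2 orders of magnitude
```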
Chicago Unmasked
City sense Movie sense
Unmasked PMI for the recessive (movie) sense, excluding hits from the dominant (city) sense:
PMI_unmask = |hits(Chicago − City + Movie)| / |hits(Chicago − City)|
Impact of Unmasking on PMI
Name          Recessive   Original   Unmask   Boost
Washington    city        0.50       0.99     96%
Casablanca    city        0.41       0.93     127%
Chevy Chase   actor       0.09       0.58     512%
Chicago       movie       0.02       0.21     972%
How to Increase Recall?
RL: learn class-specific patterns. “Headquartered in <city>”
SE: recursively extract subclasses. “Scientists such as physicists and chemists”
LE: extract lists of items (~ Google Sets).
List Extraction (LE)
1. Query engine with known items.
2. Learn a wrapper for each result page.
3. Collect a large number of lists.
4. Sort items by number of list “votes”.
LE+A = sort the list according to the Assessor.
Evaluation: Web recall, at precision = 0.9.
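The voting step above can be sketched as follows; the scraped lists here are made up, and in practice they would come from wrappers learned over result pages:

```python
from collections import Counter

def rank_by_votes(lists):
    """Each scraped list casts one vote per distinct item; rank items by votes."""
    votes = Counter()
    for items in lists:
        votes.update(set(items))  # de-duplicate within a list
    return [item for item, _ in votes.most_common()]

scraped = [
    ["Boston", "Chicago", "Seattle"],
    ["Boston", "Seattle", "Tukwila"],
    ["Boston", "Chicago"],
]
ranking = rank_by_votes(scraped)  # "Boston" (3 votes) ranks first
```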
TextRunner
TextRunner works in two phases.
1. Using a conditional random field, the extractor learns to assign labels to each of the words in a sentence.
2. Extracts one or more textual triples that aim to capture (some of) the relationships in each sentence.
Information Redundancy Assumption (Banko et al., 2008)
Performance Comparison of Traditional IE vs. Open IE
General Remarks
Open IE methods exploit data-driven techniques (e.g., domain-independent patterns) to extract relation or event tuples. This dramatically enhanced the scalability of IE, but:
• They obtain substantially lower recall than traditional IE.
• They rely heavily on information redundancy to validate extraction results, and thus suffer from the “long-tail” knowledge sparsity problem.
• They are incapable of generalizing lexical contexts in order to name new fact types.
• They over-simplify the IE problem: IE != sentence-level relation extraction; what about entity and event types?
• They focused on making the types of relations and events unrestricted, while still using other IE components, such as name tagging and semantic role labeling, for pre-defined types.
Universal Schema
Discover a wide range of domain-independent fact types:
• manually (AMR, extended NE),
• or automatically by relation clustering based on coreferential arguments (e.g., Go_off represented by “plant”, “set_off”, and “injure” because they share coreferential arguments),
• or by combining patterns with external knowledge sources such as Freebase or query logs.
Some name tagging work focused on extracting more fine-grained types beyond the traditional entity types (person, organization, and geo-political entities).
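The relation-clustering idea above can be sketched with a toy single-link procedure: triggers whose tuples share a (coreference-resolved) argument string are merged into one candidate fact type. The tuples, the argument strings, and the single-link merge are illustrative assumptions, not the published method.

```python
from collections import defaultdict

def cluster_by_arguments(event_tuples):
    """event_tuples: (trigger, arg1, arg2). Triggers sharing any resolved
    argument end up in one cluster (single-link merge)."""
    by_arg = defaultdict(set)
    for trigger, a1, a2 in event_tuples:
        by_arg[a1].add(trigger)
        by_arg[a2].add(trigger)
    clusters = []
    for group in by_arg.values():
        merged = set(group)
        rest = []
        for c in clusters:
            if c & merged:   # overlapping trigger sets -> same cluster
                merged |= c
            else:
                rest.append(c)
        clusters = rest + [merged]
    return clusters

# "plant", "set_off", "injure" share the argument "the bomb" -> one Go_off-like type
event_tuples = [
    ("plant",   "the attacker", "the bomb"),
    ("set_off", "the attacker", "the bomb"),
    ("injure",  "the bomb",     "civilians"),
    ("hire",    "the company",  "engineers"),
]
clusters = cluster_by_arguments(event_tuples)
```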
Open Questions
Is there a closed set of universal schema?
How many knowledge sources should we use for discovering universal schema? What are they?
Many knowledge bases are available; how can we unify them (automatically)?
Concept Typing
Chunking, Bootstrapping, Clustering with Contexts
Concept Mention Extraction (Wang et al., 2013)
Concept Mention Extraction (Tsai et al., 2013)
Identify and categorize mentions of concepts (Gupta and Manning, 2011): TECHNIQUE and APPLICATION.
“We apply support vector machines on text classification.”
Unsupervised bootstrapping algorithm (Yarowsky, 1995; Collins and Singer, 1999).
The proposed algorithm:
1. Extract noun phrases (Punyakanok and Roth, 2001).
2. For each category, initialize a decision list by seeds.
3. For several rounds:
   1. Annotate NPs using the decision lists.
   2. Extract top features from newly annotated phrases, and add them into the decision lists.
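The bootstrapping loop above can be sketched as follows. The seed features, the substring-match labeling rule, and the word-unigram features are illustrative assumptions, far simpler than the cited algorithms.

```python
from collections import Counter

def bootstrap(nps, seeds, rounds=2, top_k=2):
    """Grow per-category decision lists from seed features over several rounds."""
    decision_lists = {cat: set(feats) for cat, feats in seeds.items()}
    for _ in range(rounds):
        # 1. annotate NPs using the current decision lists
        labeled = {cat: [] for cat in decision_lists}
        for np in nps:
            for cat, feats in decision_lists.items():
                if any(f in np for f in feats):  # crude substring match
                    labeled[cat].append(np)
                    break
        # 2. add top features from newly annotated phrases
        for cat, phrases in labeled.items():
            counts = Counter(w for p in phrases for w in p.split())
            for feat, _ in counts.most_common(top_k):
                decision_lists[cat].add(feat)
    return decision_lists

nps = ["support vector machines", "svm classifier",
       "text classification", "named entity recognition"]
seeds = {"TECHNIQUE": ["svm"], "APPLICATION": ["classification"]}
learned = bootstrap(nps, seeds)
```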
Seeds
[Figure: four papers with concept mentions and their citations. “support vector machine”, “svm-based classification”, “svm”, and “maximal_margin_classifiers” cite (Cortes, 1995) and (Vapnik, 1995); “c4.5” and “decision_trees” cite (Quinlan, 1993).]
• c4.5, decision trees
• support vector machine, svm-based classification
• svm, maximal margin classifiers
Citation-Context Based Concept Clustering (CitClus): cluster mentions into semantically coherent concepts.
1. Group concept mentions by citation context.
2. Merge clusters based on lexical similarity between mentions in the clusters.
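The two CitClus steps can be sketched as follows. The mention/citation pairs mirror the figure above; using a shared word token as the lexical-similarity test is an illustrative simplification.

```python
import re
from collections import defaultdict

def citclus(mention_citations):
    """mention_citations: (mention, citation) pairs -> list of mention clusters."""
    # 1. group concept mentions by citation context
    groups = defaultdict(set)
    for mention, citation in mention_citations:
        groups[citation].add(mention)

    def tokens(cluster):
        return {t for m in cluster for t in re.split(r"\W+", m) if t}

    # 2. single-link merge of clusters whose mentions share a word token
    merged = []
    for group in groups.values():
        cluster = set(group)
        rest = []
        for c in merged:
            if tokens(c) & tokens(cluster):
                cluster |= c
            else:
                rest.append(c)
        merged = rest + [cluster]
    return merged

mentions = [
    ("support vector machine", "(Vapnik, 1995)"),
    ("svm", "(Vapnik, 1995)"),
    ("maximal margin classifiers", "(Vapnik, 1995)"),
    ("svm-based classification", "(Cortes, 1995)"),
    ("c4.5", "(Quinlan, 1993)"),
    ("decision trees", "(Quinlan, 1993)"),
]
clusters = citclus(mentions)  # SVM mentions merge; decision-tree mentions stay separate
```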
Remarks
Where to draw the line for different granularities?
Maybe the type of each phrase should be a path in the ontology structure?
Is an absolute cold-start extraction possible?
How to use other resources such as scenario models and social cognitive theories?
Paper presentations
Lifu’s presentation + Phrase Typing Exercise