TRANSCRIPT
Extracting Rich Knowledge from Text
John D. Prange, President
410-964-0179
www.languagecomputer.com
Our Company
Language Computer Corporation (LCC)
– Human Language Understanding Research and Development
– Founded 11 years ago in Dallas, Texas; established a second office in Columbia, MD in mid-2006
– ~70 research scientists and engineers
– Research funding primarily from DTO, NSF, AFRL, DARPA and several individual Government Agencies
– Technology has been transferred to individual Government Organizations, Defense contractors and, more recently, to Commercial Customers
Outline of Talk
Three Lines of Research & Development within LCC that impact Semantic-Level Understanding
– Information Extraction
  CiceroLite and other Cicero Products
– Extracting Rich Knowledge from Text
  Polaris: Semantic Parser
  XWN KB: Extended WordNet Knowledge Base
  Jaguar: Knowledge Extraction from Text
  Context and Events: Detection, Recognition & Extraction
– Cogex: Reasoning and Inferencing over Extracted Knowledge
  Semantic Parsing & Logical Forms
  Lexical Chains & On-Demand Axioms
  Logic Prover
Information Extraction
– Given an entire corpus of documents
– Extract every instance of some particular kind of information
  Named Entity Recognition – extraction of entities such as person, location and organization names
  Event-based Extraction – extraction of real-world events such as bombings, deaths, court cases, etc.
LCC’s Areas of Research
Two High-Performance NER Systems
CiceroLite
– Accurate and customizable NE Recognition for English
– Classifies 8 high-frequency NE classes with over 90% precision and recall
– Currently extended to detect over 150 different NE classes
– Non-deterministic Finite-State Automata (FSA) framework resolves ambiguities in text, performs precise classification
CiceroLite-ML
– Machine Learning-based NER for multiple languages
– Statistical machine learning-based framework makes for rapid extension to new languages
– Currently deployed for Arabic, German, English, and Spanish
– Arabic: Classifies 18 NE classes with an average of nearly 90% F
CiceroLite
Designed specifically for English, CiceroLite categorizes 8 high-frequency NE classes with over 90% precision and recall.
But it is capable of much more: as currently deployed, CiceroLite can categorize up to 150 different NE classes – over 100 beyond the examples shown.
CiceroLite-ML (Arabic)
CiceroLite-ML currently detects a total of 18 different classes of named entities for Arabic with between 80% - 90% F.
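The F scores quoted for CiceroLite and CiceroLite-ML are presumably the standard balanced F-measure over precision P and recall R (the slides do not define it); for reference:

F_1 = \frac{2PR}{P + R}

so a system with P = R = 0.90 also has F_1 = 0.90.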
Other Cicero Products
CiceroLite-ML (Mandarin Chinese) – Similar scope and depth to the Arabic version shown on the previous slide
CiceroCustom – User-customizable event extraction system using a variant of supervised learning called "active learning"
TASER (Temporal & Spatial Normalization System) – Recognizes 8 different types of time expressions and over 50 types of spatial expressions; normalizes time using ISO 8601; exact lat/long for ~8M place names
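TASER's internals are not shown here; as a toy illustration only (Python standard library, not LCC's code), a simple date expression can be mapped to the ISO 8601 day interval used in the time_TMP(BeginFn/EndFn, …) encoding that appears later in this talk:

# Toy illustration of ISO 8601 date normalization (TASER itself covers 8
# time-expression types and over 50 spatial-expression types).
from datetime import datetime, time

def normalize_date(expression):
    """Map a 'Month DD, YYYY' expression to an ISO 8601 [begin, end] day interval."""
    day = datetime.strptime(expression, "%B %d, %Y").date()
    begin = datetime.combine(day, time.min)         # 00:00:00
    end = datetime.combine(day, time(23, 59, 59))   # 23:59:59
    return begin.isoformat(), end.isoformat()

print(normalize_date("January 13, 1990"))
# ('1990-01-13T00:00:00', '1990-01-13T23:59:59')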
Under Contractual Development (With Deliveries in 2007)
– CiceroRelation – Relation Detection based upon ACE 2007 specifications
– CiceroCoref – Entity coreference utilizing CiceroLite NER; to include cross-document entity tracking
– CiceroDiscourse – Extract discourse structure & topic semantics
Extracting Rich Knowledge From Text
– Explicit knowledge
– Implicit knowledge: implicatures, humor, sarcasm, deceptions, etc.
– Other textual phenomena: negation, modality, quantification, coreference resolution
Lexical Level & Syntax
Semantic Relations
Contexts
Events & Event Properties
Meta-Events
Event Relations
LCC’s Areas of Research
Extracting Rich Knowledge from Text
Innovations
– A rich and flexible representation of textual semantics
– Extract concepts and semantic relations between concepts, rich event structures
– Extract event properties; extend events using event relations
– Handle textual phenomena such as negation and modality
– Mark implicit knowledge and capture meaning suggested by it whenever possible
Four-Layered Representation
Syntax Representation – Syntactically link words in sentences; apply Word Sense Disambiguation (WSD)
Semantic Relations – Provide deeper semantic understanding of relations between words
Context Representation – Place boundaries around knowledge that is not universal
Event Representation – Detect events, extract their properties, extend using event relations
Hierarchical Representation
Input Text:
  Gilda Flores's kidnapping occurred on January 13, 1990.
  A week before, he had fired the kidnappers.
Lexical Level & Syntax:
  Gilda_Flores_NN(x1) & _human_NE(x1) & _s_POS(x1,x2) & kidnapping_NN(x2) & occur_VB(e1,x2,x3) & on_IN(e1,x4) & _date_NE(x4) & time_TMP(BeginFn(x4),1990,1,13,0,0,0) & time_TMP(EndFn(x4),1990,1,13,23,59,59)
  he_PRP(x1) & fire_VB(e3,x1,x5) & kidnapper_NN(x5) & _date_NE(x6) & time_TMP(BeginFn(x6),1990,1,6,0,0,0) & time_TMP(EndFn(x6),1990,1,6,23,59,59)
Semantic Relations:
  THM_SR(x1,x2) & AGT_SR(x2,e1) & TMP_SR(x4,e1)
  AGT_SR(x1,e3) & THM_SR(x5,e3) & TMP_SR(x6,e3)
Contexts:
  during_TMP(e1,x4)
  during_TMP(e3,x6)
Events & Event Properties:
  event(e2,x2) & THM_EV(x1,e2) & TMP_EV(x4,e2)
  event(e4,e3) & AGT_EV(x5,e2) & AGT_EV(x1,e4) & THM_EV(x5,e4) & TMP_SR(x6,e4)
Event Relations:
  CAUSE_EV(e4,e2), earlier_TMP(e4,e2)
Meta-Events:
  REVENGE
Polaris Semantic Relations
# Semantic Relation Abbr
1 POSSESSION POS
2 KINSHIP KIN
3 PROPERTY-ATTRIBUTE HOLDER PAH
4 AGENT AGT
5 TEMPORAL TMP
6 DEPICTION DPC
7 PART-WHOLE PW
8 HYPONYMY ISA
9 ENTAIL ENT
10 CAUSE CAU
11 MAKE-PRODUCE MAK
12 INSTRUMENT INS
13 LOCATION-SPACE LOC
14 PURPOSE PRP
15 SOURCE-FROM SRC
16 TOPIC TPC
17 MANNER MNR
18 MEANS MNS
19 ACCOMPANIMENT-COMPANION ACC
20 EXPERIENCER EXP
21 RECIPIENT REC
22 FREQUENCY FRQ
23 INFLUENCE IFL
24 ASSOCIATED-WITH / OTHER OTH
25 MEASURE MEA
26 SYNONYMY-NAME SYN
27 ANTONYMY ANT
28 PROBABILITY-OF-EXISTENCE PRB
29 POSSIBILITY PSB
30 CERTAINTY CRT
31 THEME-PATIENT THM
32 RESULT RSL
33 STIMULUS STI
34 EXTENT EXT
35 PREDICATE PRD
36 BELIEF BLF
37 GOAL GOL
38 MEANING MNG
39 JUSTIFICATION JST
40 EXPLANATION EXN
Propbank vs. Polaris Relations
Question: Who?
  Propbank: AGENT, PATIENT, RECIPROCAL, BENEFICIARY
  Polaris: AGENT, EXPERIENCER, THEME, POSSESSION, RECIPIENT, KINSHIP, ACCOMPANIMENT-COMPANION, MAKE-PRODUCE, SYNONYMY, BELIEF
Question: What?
  Propbank: AGENT, THEME, TOPIC
  Polaris: AGENT, THEME, TOPIC, POSSESSION, STIMULUS, MAKE-PRODUCE, HYPONYMY, RESULT, BELIEF, PART-WHOLE, …
Question: Where?
  Propbank: LOCATION, DIRECTION
  Polaris: LOCATION, SOURCE-FROM, PART-WHOLE
Question: When?
  Propbank: TEMPORAL, CONDITION
  Polaris: TEMPORAL, FREQUENCY
Question: Why?
  Propbank: PURPOSE, CAUSE, PURPOSE-NOT-CAUSE
  Polaris: PURPOSE, CAUSE, INFLUENCE, JUSTIFICATION, GOAL, RESULT, MEANING, EXPLANATION, …
Question: How?
  Propbank: MANNER, INSTRUMENT
  Polaris: MANNER, INSTRUMENT, MEANS, …
Question: How much?
  Propbank: EXTENT, DEGREE
  Polaris: EXTENT, MEASURE
Question: Possible?
  Propbank: CONDITIONAL (?)
  Polaris: POSSIBILITY, CERTAINTY, PROBABILITY
Example: Polaris on Treebank
We're talking about years ago before anyone heard of asbestos having any questionable properties.
Propbank Relations (hand tagged):
  AGT(hear, anyone)
  THM(hear, asbestos having any questionable properties)
  AGT(talking, we)
  THM(talking, years ago before anyone heard of asbestos having any questionable properties)
Polaris Relations (automatically generated from the Treebank tree):
  TMP(talking, years ago before anyone heard of asbestos having any questionable properties)
  AGT(talking, We)
  TPC(talking, about years ago before anyone heard of asbestos having any questionable properties)
  EXP(heard, anyone)
  STI(heard, of asbestos having any questionable properties)
  AGT(having, asbestos)
  THM(having, any questionable properties)
  PW(asbestos, any questionable properties)
  PAH(properties, questionable)
XWN Knowledge Base (1/2)
WordNet® - free from Princeton University
– A large lexical database of English, developed by Professor George Miller, Princeton University; now under the direction of Christiane Fellbaum. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations.
eXtended WordNet - free from UTD
– Glosses: parsed; word sense disambiguated; transformed into logic forms
XWN Knowledge Base - done at LCC
– Glosses: converted into semantic relations (using the Polaris semantic parser)
– Represented in a Knowledge Base that serves as a reasoning tool, axiom generator, and lexical chain facilitator
XWN Knowledge Base (2/2)
Summary: The rich definitional glosses from WordNet are processed through LCC’s Knowledge Acquisition System (Jaguar) to produce a semantically rich upper ontology
The Clusters: Noun glosses are transformed into sets of semantic relations, which are then arranged into individual semantic units called clusters, with one cluster per gloss
The Hierarchy: The clusters (representing one noun synset each) are arranged in a hierarchy similar to that of WordNet
The Knowledge Base: The generated KB has not only the hierarchy of WordNet, but also a rich semantic representation of each entry in the hierarchy (based on the definitional gloss)
Example: WordNet Gloss
Tennis is a game played with rackets by two or four players who hit a ball back and forth over a net that divides the court
ISA (Tennis, game)
AGT (two or four players, play)
THM (game, play)
INS (rackets, play)
MEA (two or four, players)
AGT (two or four players, hit)
THM (a ball, hit)
MNR (back and forth, hit)
LOC (over a net that divides the court, hit)
AGT (a net, divides)
THM (the court, divides)
Semantic Cluster of a WordNet Gloss
[Graph of the cluster: tennis –ISA→ game; player –MEA→ two or four; play with AGT player, THM game, INS racket; hit with AGT player, THM ball, MNR back and forth, LOC over a net; divide with AGT net, THM court]
Synset ID: 00457626 Name: tennis, lawn_tennis
Hierarchy (as in WordNet)
[Hierarchy fragment: tennis, basketball and squash under court game; golf and croquet under outdoor game; court game and outdoor game under athletic game]
Jaguar: Knowledge Extraction
Automatically generate ontologies and structured knowledge bases from text
– Ontologies form the framework or “skeleton” of the knowledge base
– Rich set of semantic relations form the “muscle” that connects concepts in the knowledge base
[Figure: ontology fragment – passenger train and freight train IS-A train – plus semantic relations as the connecting "muscle", e.g. transport with AGENT X Corp., THEME products, MEANS freight train]
[Figure: the ontology joined with semantic relations extracted from text – verbs such as carry, conduct, board, ship, transport, arrive, run and stop linked to train, passenger train and freight train through AGENT, THEME and MEANS relations]
Automatically Building the Ontology
Jaguar builds an ontology using the following steps
1. Seed words selected either manually or automatically
2. Find sentences in the input documents that contain seed words
3. Parse those sentences and extract semantic relations, focusing on selected relations such as IS-A, Part-Whole, Kinship, Locative, and Temporal
4. Integrate the selected semantic relations into the ontology being produced
5. Investigate the noun phrases in the parsed sentences to discover compound nouns, such as “SCUD missile”, and store them in the candidate ontology
6. If desired, revisit the unprocessed sentences to see if they contain concepts related to the seed words through other semantic relations.
7. Finally, use the hyponymy information found in Extended WordNet to classify all concepts against one another – detecting and correcting classification errors – building an IS-A hierarchy in the process (see the sketch below).
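A minimal sketch of this loop (Python, illustration only): the semantic parse and the Extended WordNet hyponymy test are passed in as black boxes standing in for LCC's Polaris parser and XWN lookup, and the function and argument names are hypothetical.

# Sketch of the ontology-building loop above (names are hypothetical).
SELECTED_RELATIONS = {"IS-A", "PART-WHOLE", "KINSHIP", "LOCATION", "TEMPORAL"}

def build_ontology(seed_words, parsed_sentences, is_hyponym):
    """parsed_sentences: iterable of (sentence_text, [(relation, arg1, arg2), ...]).
    is_hyponym(a, b): True if concept a IS-A concept b, e.g. looked up in XWN."""
    concepts, relations = set(seed_words), []
    for text, rels in parsed_sentences:                      # steps 2-3
        if not any(seed in text for seed in seed_words):
            continue                                         # could be revisited later (step 6)
        for rel, arg1, arg2 in rels:
            if rel in SELECTED_RELATIONS:                    # step 4: keep selected relations
                concepts.update({arg1, arg2})
                relations.append((rel, arg1, arg2))
    # step 5 (finding compound nouns such as "SCUD missile") is omitted here
    for a in list(concepts):                                 # step 7: IS-A hierarchy via XWN hyponymy
        for b in list(concepts):
            if a != b and is_hyponym(a, b):
                relations.append(("IS-A", a, b))
    return concepts, relations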
Types of Context
Temporal
– It rained on July 7th
Spatial
– It rained in Dallas
Report
– John said “It rains”
Belief
– John thinks that it rains
Volitional
– John wants it to rain
Planning
– It is scheduled to rain
Conditional
– If it’s cloudy, it will rain
Possibility
– It might rain
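One way to picture the earlier idea of placing boundaries around knowledge that is not universal is to wrap each proposition in a typed context; the class and field names below are hypothetical, not LCC's representation.

# Hypothetical sketch of a context boundary around a proposition.
from dataclasses import dataclass

@dataclass
class Context:
    kind: str          # TEMPORAL, SPATIAL, REPORT, BELIEF, VOLITIONAL, PLANNING, CONDITIONAL, POSSIBILITY
    anchor: str        # what introduces the boundary: a date, a place, a speaker, a condition, ...
    proposition: str   # the knowledge whose truth holds only inside this context

examples = [
    Context("TEMPORAL", "July 7th", "it rained"),
    Context("SPATIAL", "Dallas", "it rained"),
    Context("REPORT", "John said", "it rains"),
    Context("BELIEF", "John thinks", "it rains"),
    Context("CONDITIONAL", "if it's cloudy", "it will rain"),
    Context("POSSIBILITY", "might", "it rains"),
]
print(examples[0])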
Events in Text
Basic Definition:
– X is an Event if X is a possible answer to the question: What happened?
Applying the Definition to Verbs and Nouns
– Verb V is an Event if the sentence "Someone/something V-ed (someone/something)" is an answer to the question "What happened?"
– Noun N is an Event if the sentence "There was/were (a/an) N" is an answer to the question "What happened?"
Events in Text
Most Adjectives are not potential Events
– Verbal 'adjectives' are treated as verbs, e.g. 'lost', 'admired'
Factatives ('Light' Verbs) are not separate events
– Suffer a Loss; Take a Test; Perform an Operation
Aspectual Markers Can Combine with a Wide Range of Events
– e.g., Stop, Completion, Start, Continue, Fail, Succeed, Try
Modalities are not separate events
– Possibility, Necessity, Prescription, Suggestion, Optative
Event Detection
Approach for Event Detection
– Annotate WordNet synsets that are Event concepts; annotation completed for the Noun and Verb hierarchies
– Detect events by lexical lookup for concepts in the annotated WordNet (see the sketch below)
Project Status
– Prototype implemented for Event detection
– Benchmarks run: Precision 93%, Recall 79%
– Currently tuning performance
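A rough sketch of the lexical-lookup step (Python with NLTK's WordNet interface, illustration only): LCC's hand annotation of Event synsets is not available, so the WordNet 'event' noun subtree stands in for it, and verbs are accepted by default.

# Rough sketch of lexical-lookup event detection (stand-in annotation).
from nltk.corpus import wordnet as wn    # requires nltk.download('wordnet')

EVENT_ROOT = wn.synset("event.n.01")     # stand-in for LCC's hand-annotated event synsets

def is_event(word, pos_tag):
    if pos_tag.startswith("V"):          # most verbs answer "What happened?"
        return bool(wn.synsets(word, pos=wn.VERB))
    for synset in wn.synsets(word, pos=wn.NOUN):
        if EVENT_ROOT in synset.closure(lambda s: s.hypernyms()):
            return True
    return False

print(is_event("kidnapping", "NN"))      # True: a kidnapping is an act, hence an event
print(is_event("asbestos", "NN"))        # False: a substance, not an event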
Event Extraction – Future
Event Structures for Modelling Discourse
– Aspect (Start, Complete, Continue, Succeed, Fail, Try)
– Modality (Possibility, Necessity, Optativity)
– Event Participants (Actors, Undergoers, Instruments)
– Context (Spatial, Temporal, Intensional)
Event Relations (Causation, Partonomy, Similarity, Contrast)
– Event Taxonomy/Classification
– Event Composition
Cogex: Reasoning & Inferencing over Extracted Knowledge
LCC’s Areas of Research
[Architecture diagram: the Question/Text (Q/T) and Answer/Hypothesis (A/H) are run through the Semantic Parser and converted to Logic Forms (Q/T LF, A/H LF); Axiom Building supplies axioms from the XWN KBase, Lexical Chains, World Knowledge axioms, Linguistic axioms, Temporal axioms and the Semantic Calculus; the Logic Prover, with Context handling and Relaxation, produces the Answer or Entailment Ranking and an NL Justification]
TREC Question Answering Track
The TREC Question Answering Track has been held annually since its inception at TREC-8 (1999)
Main Task, TREC-2006 QA Track
– AQUAINT Corpus of English News Text (http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2002T31): newswire text in English drawn from three sources – Xinhua News Service (People's Republic of China), New York Times News Service, and Associated Press Worldstream News Service; roughly 3 GBytes of text; over a million documents
– Test Set: 75 sets of questions organized around a common target, where the target is a Person, Organization, Event or Thing
– Each series of questions contains 6-9 questions: 4-7 Factoid, 1-2 List, and 1 Other
– Total: 403 Factoid Questions, 89 List Questions, 75 Other Questions
TREC-2006 Question Answering Track
145. Target Event John Williams convicted of Murder
145.1 Factoid How many non-white members of the jury were there?
145.2 Factoid Who was the foreman for the jury?
145.3 Factoid Where was the trial held?
145.4 Factoid When was King convicted?
145.5 Factoid Who was the victim of the murder?
145.6 List What defense and prosecution attorneys participated in the trial?
145.7 Other
Textual Entailment
– Textual Entailment Recognition is a generic task that captures major semantic inference needs across many natural language processing applications, such as Question Answering (QA), Information Retrieval (IR), Information Extraction (IE), and (multi) document summarization.
– Task definition: T entails H, denoted by T → H, if the meaning of H can be inferred from the meaning of T
PASCAL (Pattern Analysis, Statistical Modeling and Computational Learning) RTE (Recognizing Textual Entailment) Challenge – RTE-1 (2004-05); RTE-2 (2005-06) and RTE-3 (2006-07)
– http://www.pascal-network.org/Challenges/RTE/
The Question Answering Task can be interpreted as a Textual Entailment task as follows:
– Given a Question Q and a possible Answer Text Passage A, the QA task is then one of applying semantic inference to the pair (Q, A) to infer whether or not A contains the Answer to Q.
RTE-2: Example T-H Pairs
Entailment? "Yes"
T: Tibone estimated diamond production at four mines operated by Debswana – Botswana's 50-50 joint venture with DeBeers – could reach 33 million carats this year.
H: Botswana is a business partner of DeBeers.
Entailment? "Yes"
T: The EZLN differs from most revolutionary groups by having stopped military action after the initial uprising in the first two weeks of 1994.
H: EZLN is a revolutionary group.
Entailment? "No"
T: Two persons were injured in dynamite attacks perpetrated this evening against two bank branches in this Northwestern Colombian city.
H: Two persons perpetrated dynamite attacks in a Northwestern Colombian city.
Entailment? "No"
T: Such a margin of victory would give Abbas a clear mandate to renew peace talks with Israel, rein in militants and reform the corruption-riddled Palestinian Authority.
H: The new Palestinian president combated corruption and revived the Palestinian economy.
Semantically Enhanced COGEX
[Architecture diagram: same COGEX pipeline as on the earlier slide – Q/T and A/H through the Semantic Parser to Logic Forms, Axiom Building over the XWN KBase, Lexical Chains, World Knowledge, Linguistic, Temporal and Semantic Calculus axioms, and the Logic Prover with Context and Relaxation producing the Answer or Entailment Ranking and NL Justification]
Output of Semantic Parser
Question: What is the Muslim Brotherhood's goal?
The output of the semantic parser:
PURPOSE(x, Muslim Brotherhood)
Answer: The Muslim Brotherhood, Egypt's biggest fundamentalist group established in 1928, advocates turning Egypt into a strict Muslim state by political means, setting itself apart from militant groups that took up arms in 1992.
The output of the semantic parser:
AGENT(Muslim Brotherhood, advocate)
PURPOSE(turning Egypt into a strict Muslim state, advocate)
TEMPORAL(1928, establish)
TEMPORAL(1992, took up arms)
PROPERTY(strict, Muslim state)
MEANS(political means, turning Egypt into a strict Muslim state)
SYNONYMY(Muslim Brotherhood, Egypt's biggest fundamentalist group)
Generation of Logical Forms
Question: What is the Muslim Brotherhood's goal?
Question Logical Form (QLF): (exists x0 x1 x2 x3 (Muslim_NN(x0) & Brotherhood_NN(x1) & nn_NNC(x2,x0,x1) & PURPOSE_SR(x3,x2))).
Answer: The Muslim Brotherhood, Egypt's biggest fundamentalist group established in 1928, advocates turning Egypt into a strict Muslim state by political means, setting itself apart from militant groups that took up arms in 1992.
Answer Logical Form (ALF): (exists e1 e2 e3 e4 e5 e6 x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13 x14 x15 (Muslim_NN(x1) & Brotherhood_NN(x2) & nn_NNC(x3,x1,x2) & Egypt_NN(x4) & _s_POS(x5,x4) & biggest_JJ(x5) & fundamentalist_JJ(x5) & group_NN(x5) & SYNONYMY_SR(x3,x5) & establish_VB(e1,x20,x5) & in_IN(e1,x6) & 1928_CD(x6) & TEMPORAL_SR(x6,e1) & advocate_VB(e2,x5,x21) & AGENT_SR(x5,e2) & PURPOSE_SR(e3,e2) & turn_VB(e3,x5,x7) & Egypt_NN(x7) & into_IN(e3,x8) & strict_JJ(x15,x14) & Muslim_NN(x8) & state_NN(x13) & nn_NNC(x14,x8,x13) & PROPERTY_SR(x15,x14) & by_IN(e3,x9) & political_JJ(x9) & means_NN(x9) & MEANS_SR(x9,e3) & set_VB(e5,x5,x5) & itself_PRP(x5) & apart_RB(e5) & from_IN(e5,x10) & militant_JJ(x10) & group_NN(x10) & take_VB(e6,x10,x12) & up_IN(e6,x11) & arms_NN(x11) & in_IN(e6,x12) & 1992_CD(x12) & TEMPORAL_SR(x12,e6))).
Lexical Chains from XWN
Lexical chains
– Lexical Chains establish connections between semantically related concepts, i.e. WordNet synsets (note: concepts, not words, which means Word Sense Disambiguation is necessary)
– Concepts and relations along the lexical chain explain the semantic connectivity of the end concepts
– Lexical chains start by using WordNet relations (ISA, Part-Whole) and gloss co-occurrence (a weak relation)
– The XWN Knowledge Base then adds more meaningful (precise) relations
  "Tennis is a game played with rackets by two or four players…"
  Prior to XWN-KB: 'tennis' – 'two or four' (gloss co-occurrence)
  With XWN-KB: 'tennis' –ISA– 'game' –THM– 'play' –AGT– 'player' –MEA– 'two or four'
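For illustration, a minimal chain search over plain WordNet (Python with NLTK): only hypernym/hyponym links and weak gloss co-occurrence are used as neighbors, since the precise XWN-KB relations are LCC-internal; the depth bound and neighbor function are choices made here, not LCC's.

# Sketch of lexical-chain search over plain WordNet (illustration only).
from collections import deque
from nltk.corpus import wordnet as wn    # requires nltk.download('wordnet')

def neighbors(synset):
    linked = synset.hypernyms() + synset.hyponyms()
    for word in synset.definition().split():
        linked += wn.synsets(word.strip(".,;()").lower())    # weak gloss co-occurrence links
    return linked

def lexical_chain(start, goal, max_len=3):
    """Breadth-first search for a short chain of synsets from start to goal."""
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        if len(path) >= max_len:
            continue
        for nxt in neighbors(path[-1]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(lexical_chain(wn.synset("tennis.n.01"), wn.synset("player.n.01")))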
Examples of Lexical Chains
Question: How were biological agents acquired by bin Laden?
Answer: On 8 July 1998, the Italian newspaper Corriere della Serra indicated that members of The World Front for Fighting Jews and Crusaders, which was founded by Bin Laden, purchased three chemical and biological_agent production facilities in
Lexical Chain: ( V - buy#1, purchase#1 ) – HYPERNYM ( V - get#1, acquire#1 )
Question: How did Adolf Hitler die?
Answer: … Adolf Hitler committed suicide …
Lexical Chain: ( N - suicide#1, self-destruction#1, self-annihilation#1 ) – GLOSS ( V - kill#1 ) – GLOSS ( V - die#1, decease#1, perish#1, go#17, exit#3, pass_away#1, expire#2, pass#25 )
Propagating syntactic structures along the chain
The goal is to filter out unacceptable chains, and to improve the ranking of chains when multiple chains can be established
Example 1: AGENT vs. PATIENT
  Q: Who did Floyd Patterson beat to win the title? (Floyd Patterson is the AGENT of beat)
  WA: He saw Ingemar Johanson knock down Floyd Patterson seven times there in winning the title. (Floyd Patterson is the PATIENT of knock down)
  V - beat#2 – entail V - hit#4 – derivation N - hitting#1, striking#2 – derivation V - strike#2 – hyponym V - knock-down#2
Example 2: AGENT, THEME, MEASURE
  S1: John bought a cowboy hat for $50. (John = AGENT, a cowboy hat = THEME, $50 = MEASURE)
  S2: John paid $50 for a cowboy hat. (John = AGENT, $50 = MEASURE, a cowboy hat = THEME)
  V - buy#1 – entail V - pay#1
Axioms on Demand (1/3)
Extract world knowledge, in the form of axioms, from text or other resources automatically and "on demand"
– When the logic prover runs out of rules to use, it can request one from external knowledge sources; it will ask for a rule connecting two concepts (see the sketch below)
– Generate axioms on the fly from multiple knowledge sources:
  WordNet and eXtended WordNet: glosses and lexical chains
  Instantiation of NLP rules
  Open text from a trusted source (dictionary, encyclopedia, textbook on a relevant topic, etc.)
  An automatically-built knowledge base
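A hypothetical sketch of this control flow (Python, not LCC's code): when the prover cannot connect two concepts, it asks each registered knowledge source, in turn, for an axiom linking them.

# Hypothetical sketch of the axioms-on-demand control flow. Concrete sources
# would wrap WordNet/XWN glosses, lexical chains, NLP rules, trusted open
# text, and an automatically-built knowledge base.
class AxiomSource:
    def axiom_for(self, concept_a, concept_b):
        """Return an axiom connecting the two concepts, or None."""
        raise NotImplementedError

def request_axiom(sources, concept_a, concept_b):
    """Called when the prover runs out of rules connecting concept_a and concept_b."""
    for source in sources:
        axiom = source.axiom_for(concept_a, concept_b)
        if axiom is not None:
            return axiom      # would be added to the prover's USABLE list
    return None               # no source helps; the prover may try relaxation instead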
Axioms on Demand (2/3)
eXtended WordNet axiom generator
– Question: What all can a 'player' do? Look at all contexts with 'player' as AGT:
  Gloss of 'tennis': a 'player' can 'hit' (a ball), 'play' (a game)
  Gloss of 'squash': a 'player' can 'strike' (a ball), etc.
– Connect related concepts:
  kidnap_VB(e1,x1,x2) -> kidnapper_NN(x1)
  asian_JJ(x1,x2) -> asia_NN(x1) & _continent_NE(x1)
World Knowledge axioms
– WordNet glosses, e.g. jungle_cat_NN(x1) -> small_JJ(x2,x1) & Asiatic_JJ(x3,x1) & wildcat_NN(x1)
NLP axioms
– Linguistic rewriting rules, e.g. Gilda_NN(x1) & Flores_NN(x2) & nn_NNC(x3,x1,x2) -> Flores_NN(x3)
Axioms on Demand (3/3)
Semantic Relation Calculus
– Combine two or more local semantic relations to establish broader semantic relations
– Increase the semantic connectivity
– "Mike is a rich man" → "Mike is rich": ISA_SR(Mike,man) & PAH_SR(man,rich) → PAH_SR(Mike,rich)
– "John lives in Dallas, Texas" → "John lives in Texas": LOC(John,Dallas) & PW(Dallas,Texas) -> LOC(John,Texas)
Temporal Axioms
– Time transitivity of events: during_CTMP(e1,e2) & during_CTMP(e2,e3) -> during_CTMP(e1,e3)
– Dates entail more general times: October 2000 → year 2000
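A minimal sketch of these combination rules, applied to (relation, arg1, arg2) triples (Python, illustration only; COGEX works over logic clauses rather than triples):

# Sketch of the semantic-calculus and temporal-transitivity rules above.
def semantic_calculus(relations):
    derived = set(relations)
    changed = True
    while changed:
        changed = False
        new = set()
        for r1, a, b in derived:
            for r2, c, d in derived:
                if b != c:
                    continue
                if r1 == "ISA" and r2 == "PAH":                    # Mike ISA man, man PAH rich -> Mike PAH rich
                    new.add(("PAH", a, d))
                if r1 == "LOC" and r2 == "PW":                     # John LOC Dallas, Dallas PW Texas -> John LOC Texas
                    new.add(("LOC", a, d))
                if r1 == "during_CTMP" and r2 == "during_CTMP":    # time transitivity of events
                    new.add(("during_CTMP", a, d))
        added = new - derived
        if added:
            derived |= added
            changed = True
    return derived

facts = {("LOC", "John", "Dallas"), ("PW", "Dallas", "Texas")}
print(("LOC", "John", "Texas") in semantic_calculus(facts))   # True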
Contextual Knowledge Axioms
Examples
– If someone boards a plane and the flight takes 3 hours, then that person travels for 3 hours
– The person leaves at the same time and arrives at the same time as the traveling plane
– If the departure of a vehicle has a destination and the vehicle arrives at the destination, then the arrival is located at the destination
– If something is exactly located somewhere, then nothing else is exactly located in the same place
– If a Process is located in an area, then all sub-Processes of the Process are located in the same area
Logic Prover (1/2)
A first order logic resolution style theorem prover
Inference rule sets are based on hyperresolution and paramodulation
Transform the two text fragments into 4-layered logic forms based upon LCC’s Syntactic, Semantic, Contextual and Event Processing and Analysis
Automatically create "Axioms on Demand" to be used during the proof
– Lexical Chains axioms
– World Knowledge axioms
– Linguistic transformation axioms
– Contextual / Temporal axioms
Logic Prover (2/2)
Load COGEX's SOS (Set of Support) with the Candidate Answer Passage(s) A and Question Q, and its USABLE list of clauses with the generated axioms, semantic and temporal axioms
Search for a proof by iteratively removing clauses from SOS and searching the USABLE list for possible inferences until a refutation is found (see the sketch below)
– If no contradiction is detected: relax arguments; drop entire predicates from H
Compute a "Proof Score" for each Candidate
Select the best result & generate an NL Justification
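A hypothetical sketch of this proof loop (Python): the resolve and relax operators are stand-ins for the real hyperresolution/paramodulation rules and the argument/predicate relaxation, and the score penalty is an arbitrary placeholder.

# Hypothetical sketch of the refutation loop with relaxation (not LCC's code).
def prove(sos, usable, resolve, relax):
    """resolve(clause, usable) yields inferred clauses ('FALSE' marks a refutation);
    relax(sos) returns a relaxed Set of Support, or None when nothing is left to relax."""
    score = 1.0
    while True:
        work = list(sos)
        while work:
            clause = work.pop()
            for inferred in resolve(clause, usable):
                if inferred == "FALSE":        # contradiction found: proof succeeds
                    return score
                work.append(inferred)
        sos = relax(sos)                       # relax arguments / drop predicates from H
        if sos is None:
            return 0.0                         # no proof
        score *= 0.8                           # hypothetical penalty per relaxation step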
Evaluations: QA (TREC-06)
LCC's PowerAnswer Question Answering (QA) system finished 1st on Factoid Questions and Overall Combined Score. A second LCC QA system, Chaucer, finished 2nd in both categories in the TREC QA 2006 evaluation.
An LCC QA system has finished 1st every year that the TREC QA Evaluation has been conducted (annually since TREC-8 in 1999).
[Bar chart: TREC-2006 factoid accuracy by system – PowerAnswer and Chaucer ahead of Teams 1-7; top score 57.8%, mean 18.5%]
Evaluations: PASCAL RTE-2
LCC's Groundhog system finished 1st overall at the Second PASCAL Recognizing Textual Entailment Challenge (RTE-2) and LCC's COGEX system finished 2nd. (http://www.pascal-network.org/Challenges/RTE/)
[Bar chart: RTE-2 accuracy by system – Groundhog and Cogex ahead of Teams 1-11; best 75.4%, mean 57.5%]
Contact Information
Home Office
1701 N. Collins Boulevard
Suite 2000
Richardson, TX 75080
972-231-0052 (Voice)
972-231-0012 (Fax)
Maryland Office
6179 Campfire
Columbia, MD 21045
410-715-0777 (Voice)
410-715-0774 (Fax)
443-878-8894 (Cell)