Page 1: Learning to “Read Between the Lines” using Bayesian Logic Programs

Learning to “Read Between the Lines” using Bayesian Logic Programs

Sindhu Raghavan, Raymond Mooney, and Hyeonseo Ku

The University of Texas at Austin
July 2012

Page 2: Learning to “Read Between the Lines” using Bayesian Logic Programs

Information Extraction

• Information extraction (IE) systems extract factual information that occurs in text [Cowie and Lehnert, 1996; Sarawagi, 2008]

• Natural language text is typically “incomplete”
  – Commonsense information is not explicitly stated
  – Easily inferred facts are omitted from the text

• Human readers use commonsense knowledge and “read between the lines” to infer implicit information

• IE systems have no access to commonsense knowledge and hence cannot infer implicit information

Page 3: Learning to “Read Between the Lines” using Bayesian Logic Programs

Example

Natural language text: “Barack Obama is the President of the United States of America.”

Query: “Barack Obama is a citizen of what country?”

IE systems cannot answer this query since citizenship information is not explicitly stated!

Page 4: Learning to “Read Between the Lines” using Bayesian Logic Programs

Objective

• Infer implicit facts from explicitly stated information
  – Extract explicitly stated facts using an IE system
  – Learn commonsense knowledge in the form of logical rules to deduce additional facts
  – Employ models from statistical relational learning (SRL) that allow probabilities to be estimated using well-founded probabilistic graphical models

Page 5: Learning to “Read Between the Lines” using Bayesian Logic Programs

Related Work

• Learning propositional rules [Nahm and Mooney, 2000]
  – Learn propositional rules from the output of an IE system on computer-related job postings
  – Perform logical deduction to infer new facts
  – Purely logical deduction is brittle
    • Cannot assign probabilities or confidence estimates to inferences

Page 6: Learning to “Read Between the Lines” using Bayesian Logic Programs

Related Work

• Learning first-order rules
  – Logical deduction using probabilistic rules [Carlson et al., 2010; Doppa et al., 2010]
    • Modify existing rule learners like FOIL and FARMER to learn probabilistic rules
    • Probabilities are not computed using well-founded probabilistic graphical models
  – Markov Logic Network (MLN) [Domingos and Lowd, 2009] based approaches to infer additional facts [Schoenmackers et al., 2010; Sorower et al., 2011]
    • Grounding process could result in intractably large networks for large domains

Page 7: Learning to “Read Between the Lines” using Bayesian Logic Programs

Related Work

• Learning for Textual Entailment [Lin and Pantel, 2001; Yates and Etzioni, 2007; Berant et al., 2011]
  – Textual entailment rules have a single antecedent in the body of the rule
  – Approaches from statistical relational learning have not been applied so far
  – Do not use extractions from a traditional IE system to learn rules

Page 8: Learning to “Read Between the Lines” using Bayesian Logic Programs

Our Approach

• Use an off-the-shelf IE system to extract facts

• Learn commonsense knowledge from the extracted facts in the form of probabilistic first-order rules

• Infer additional facts based on the learned rules using Bayesian Logic Programs (BLPs) [Kersting and De Raedt, 2001]

Page 9: Learning to “Read Between the Lines” using Bayesian Logic Programs

System Architecture

Training pipeline: Training Documents → Information Extractor (IBM SIRE) → Extracted Facts → Inductive Logic Programming (LIME) → First-Order Logical Rules → BLP Weight Learner (version of EM) → Bayesian Logic Program (BLP)

Test pipeline: Test Document → Information Extractor → Extractions → BLP Inference Engine → Inferences with probabilities

Example annotations from the slide:

Training text: “Barack Obama is the current President of USA……. Obama was born on August 4, 1961, in Hawaii, USA.”

Extracted facts:
nationState(USA)
person(BarackObama)
isLedBy(USA,BarackObama)
hasBirthPlace(BarackObama,USA)
hasCitizenship(BarackObama,USA)

Learned first-order rules:
nationState(B) ∧ isLedBy(B,A) ⇒ hasCitizenship(A,B)
nationState(B) ∧ employs(B,A) ⇒ hasCitizenship(A,B)

Bayesian clauses with noisy-or parameters:
hasCitizenship(A,B) | nationState(B), isLedBy(B,A).   0.9
hasCitizenship(A,B) | nationState(B), employs(B,A).   0.6

Test extractions:
nationState(malaysian)
person(mahathir-mohamad)
isLedBy(malaysian,mahathir-mohamad)
employs(malaysian,mahathir-mohamad)

Inference with probability:
hasCitizenship(mahathir-mohamad, malaysian)   0.75

Page 10: Learning to “Read Between the Lines” using Bayesian Logic Programs

Bayesian Logic Programs [Kersting and De Raedt, 2001]

• Set of Bayesian clauses a | a1, a2, …, an
  – Definite clauses in first-order logic, universally quantified
  – Head of the clause: a
  – Body of the clause: a1, a2, …, an
  – Associated conditional probability table (CPT): P(head | body)

• Bayesian predicates a, a1, a2, …, an have finite domains
  – Combining rule like noisy-or for mapping multiple CPTs into a single CPT

• Given a set of Bayesian clauses and a query, SLD resolution is used to construct ground Bayesian networks for probabilistic inference
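
To make the clause structure concrete, here is a minimal Python sketch (with hypothetical names, not the authors' implementation) of a Bayesian clause and the noisy-or combining rule:

    # Minimal sketch of Bayesian clauses with a noisy-or combining rule.
    # All names here are hypothetical, not from the authors' system.
    from dataclasses import dataclass
    from typing import List

    @dataclass
    class BayesianClause:
        head: str          # e.g., "hasCitizenship(A,B)"
        body: List[str]    # e.g., ["nationState(B)", "isLedBy(B,A)"]
        weight: float      # noisy-or parameter: P(head | body satisfied)

    def noisy_or(weights: List[float]) -> float:
        """Probability the head holds when the bodies of all these
        ground clauses are satisfied: 1 - prod(1 - w_i)."""
        p_none = 1.0
        for w in weights:
            p_none *= (1.0 - w)
        return 1.0 - p_none

    # Two ground rules supporting the same head (parameters from slide 9):
    clauses = [
        BayesianClause("hasCitizenship(A,B)",
                       ["nationState(B)", "isLedBy(B,A)"], 0.9),
        BayesianClause("hasCitizenship(A,B)",
                       ["nationState(B)", "employs(B,A)"], 0.6),
    ]
    print(noisy_or([c.weight for c in clauses]))  # 0.96 when both bodies hold

With both bodies satisfied, these illustrative parameters give 1 − (1 − 0.9)(1 − 0.6) = 0.96; the 0.75 shown on slide 9 reflects the deck's actual learned parameters and priors, which the transcript does not give.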

Page 11: Learning to “Read Between the Lines” using Bayesian Logic Programs

Why BLPs?

• Pure logical deduction is brittle and results in many undifferentiated inferences

• Inference in BLPs is probabilistic, i.e., inferences are assigned probabilities
  – Probabilities can be used to select only high-confidence inferences

• Efficient grounding mechanism in BLPs enables our approach to scale

Page 12: Learning to “Read Between the Lines” using Bayesian Logic Programs

Inductive Logic Programming (ILP) for learning first-order rules

Target relation: hasCitizenship(X,Y)

Positive instances:
hasCitizenship(BarackObama, USA)
hasCitizenship(GeorgeBush, USA)
hasCitizenship(IndiraGandhi, India)
…

Negative instances (generated using the closed-world assumption; see the sketch after this slide):
hasCitizenship(BarackObama, India)
hasCitizenship(GeorgeBush, India)
hasCitizenship(IndiraGandhi, USA)
…

KB:
hasBirthPlace(BarackObama,USA)
person(BarackObama)
nationState(USA)
nationState(India)
…

The ILP rule learner outputs rules such as:
nationState(Y) ∧ isLedBy(Y,X) ⇒ hasCitizenship(X,Y)
…
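
Since the slide notes the negatives come from the closed-world assumption, here is a minimal Python sketch of that generation step; the variable names are hypothetical:

    # Sketch: generate negative instances for hasCitizenship under the
    # closed-world assumption: any person-country pair not listed as a
    # positive instance is treated as a negative. Names are hypothetical.
    from itertools import product

    positives = {("BarackObama", "USA"), ("GeorgeBush", "USA"),
                 ("IndiraGandhi", "India")}
    persons = {p for p, _ in positives}
    countries = {c for _, c in positives}

    negatives = {(p, c) for p, c in product(persons, countries)
                 if (p, c) not in positives}
    # e.g., ("BarackObama", "India"), ("IndiraGandhi", "USA"), ...
    print(sorted(negatives))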

Page 13: Learning to “Read Between the Lines” using Bayesian Logic Programs

Inference using BLPs

Test document: “Malaysian Prime Minister Mahathir Mohamad Wednesday announced for the first time that he has appointed his deputy Abdullah Ahmad Badawi as his successor.”

Extracted facts:
nationState(malaysian)
person(mahathir-mohamad)
isLedBy(malaysian,mahathir-mohamad)
employs(malaysian,mahathir-mohamad)

Learned rules:
nationState(B) ∧ isLedBy(B,A) ⇒ hasCitizenship(A,B)
nationState(B) ∧ employs(B,A) ⇒ hasCitizenship(A,B)

Page 14: Learning to “Read Between the Lines” using Bayesian Logic Programs

Logical Inference in BLPs

Rule 1: nationState(B) ∧ isLedBy(B,A) ⇒ hasCitizenship(A,B)

Matching the facts nationState(malaysian) and isLedBy(malaysian,mahathir-mohamad) against Rule 1 derives hasCitizenship(mahathir-mohamad, malaysian).

Page 15: Learning to “Read Between the Lines” using Bayesian Logic Programs

Logical Inference in BLPs

Rule 2: nationState(B) ∧ employs(B,A) ⇒ hasCitizenship(A,B)

Matching the facts nationState(malaysian) and employs(malaysian,mahathir-mohamad) against Rule 2 derives the same fact, hasCitizenship(mahathir-mohamad, malaysian).
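
Here is a minimal Python sketch of the grounding step shown on these two slides: matching each rule body against the extracted facts to derive the same ground conclusion. The representation is hypothetical, not the BLP engine's actual data structures:

    # Sketch: ground the two citizenship rules against the extracted facts.
    # Facts are (predicate, args) tuples; both rules have the shape
    # nationState(B) ∧ <rel>(B,A) ⇒ hasCitizenship(A,B).
    facts = {
        ("nationState", ("malaysian",)),
        ("person", ("mahathir-mohamad",)),
        ("isLedBy", ("malaysian", "mahathir-mohamad")),
        ("employs", ("malaysian", "mahathir-mohamad")),
    }

    rules = ["isLedBy", "employs"]  # the <rel> of each rule body

    derived = set()
    for rel in rules:
        for pred, args in facts:
            if pred == rel:
                b, a = args
                if ("nationState", (b,)) in facts:
                    derived.add(("hasCitizenship", (a, b)))

    # Both rules derive the same ground fact, so the set holds one entry:
    print(derived)  # {('hasCitizenship', ('mahathir-mohamad', 'malaysian'))}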

Page 16: Learning to “Read Between the Lines” using Bayesian Logic Programs

Probabilistic Inference in BLPs

[Ground Bayesian network figure: nationState(malaysian) and isLedBy(malaysian, mahathir-mohamad) feed a logical-and node (dummy1); nationState(malaysian) and employs(malaysian, mahathir-mohamad) feed a second logical-and node (dummy2); dummy1 and dummy2 are combined by a noisy-or node into hasCitizenship(mahathir-mohamad, malaysian), whose marginal probability is the query. The CPT entries are not captured in the transcript.]
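
A minimal Python sketch of the computation this network performs, using the illustrative parameters 0.9 and 0.6 from slide 9 (the learned weights behind the deck's 0.75 are not given in the transcript):

    # Sketch: marginal of hasCitizenship in the tiny ground network above.
    # The evidence facts are observed true, so each logical-and node is
    # true, and the noisy-or node combines the two active ground rules.
    # Parameters are the illustrative values from slide 9, not learned.
    w_isLedBy, w_employs = 0.9, 0.6

    dummy1 = True  # logical-and of nationState(malaysian), isLedBy(...)
    dummy2 = True  # logical-and of nationState(malaysian), employs(...)

    active = [w for w, on in [(w_isLedBy, dummy1),
                              (w_employs, dummy2)] if on]
    p_head = 1.0
    for w in active:
        p_head *= (1.0 - w)
    p_head = 1.0 - p_head
    print(p_head)  # 0.96 with these parameters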

Page 17: Learning to “Read Between the Lines” using Bayesian Logic Programs

Sample rules learned:
governmentOrganization(A) ∧ employs(A,B) ⇒ hasMember(A,B)
eventLocation(A,B) ∧ bombing(A) ⇒ thingPhysicallyDamage(A,B)
isLedBy(A,B) ⇒ hasMemberPerson(A,B)

Page 18: Learning to “Read Between the Lines” using Bayesian Logic Programs

Experimental Evaluation

• Data
  – DARPA’s intelligence community (IC) data set from the Machine Reading Project (MRP)
  – Consists of news articles on politics, terrorism, and other international events
  – 10,000 documents in total

• Perform 10-fold cross validation

Page 19: Learning to “Read Between the Lines” using Bayesian Logic Programs

Experimental Evaluation

• Learning first-order rules using LIME [McCreath and Sharma, 1998]
  – Learn rules for 13 target relations
  – Learn rules using both positive and negative instances and using only positive instances
  – Include all unique rules learned from the different models

• Learning BLP parameters
  – Learn noisy-or parameters using Expectation Maximization (EM)
  – Set priors to maximum likelihood estimates
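
The deck does not spell out its EM variant, so the following is only a sketch of one standard EM scheme for noisy-or parameters (with a hidden per-rule "fired" variable), under hypothetical names; the authors' weight learner may differ:

    # Sketch: EM for noisy-or parameters. Each training case lists which
    # ground rules had satisfied bodies ("active") and whether the head
    # was observed true. Hidden variable Z_i = "rule i fired". E-step:
    # P(Z_i=1 | head=1) = w_i / (1 - prod_j(1 - w_j)) over active rules;
    # if head=0, every active rule's Z_i must be 0. M-step: w_i is the
    # expected firing count over the cases where rule i was active.
    def em_noisy_or(cases, n_rules, iters=50):
        w = [0.5] * n_rules                 # uniform initialization
        for _ in range(iters):
            num = [0.0] * n_rules           # expected firings
            den = [0.0] * n_rules           # times body was active
            for active, head in cases:
                p_none = 1.0
                for i in active:
                    p_none *= (1.0 - w[i])
                for i in active:
                    den[i] += 1.0
                    if head:                # head true: share credit
                        num[i] += w[i] / (1.0 - p_none)
            w = [num[i] / den[i] if den[i] > 0 else w[i]
                 for i in range(n_rules)]
        return w

    # Toy data: rule 0 fires for ~9/10 of its cases; rule 1 is weaker.
    cases = ([([0], True)] * 9 + [([0], False)] +
             [([1], True)] * 6 + [([1], False)] * 4 +
             [([0, 1], True)] * 2)
    print(em_noisy_or(cases, n_rules=2))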

Page 20: Learning to “Read Between the Lines” using Bayesian Logic Programs

Experimental Evaluation

• Performance evaluation
  – Manually evaluated inferred facts from 40 documents, randomly selected from each test set
  – Compute two precision scores (see the sketch after this slide):
    • Unadjusted (UA) – does not account for the extractor’s mistakes
    • Adjusted (AD) – accounts for the extractor’s mistakes
  – Rank inferences using marginal probabilities and evaluate the top-n
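
As a concrete check, here is a minimal Python sketch computing both scores from the logical-deduction counts on the backup slide (443 correct out of 1490 total inferences, of which 1257 rest on correct extractions); the function names are hypothetical:

    # Sketch: unadjusted vs. adjusted precision. UA divides by all
    # inferences; AD drops the inferences that were wrong only because
    # the underlying extraction was wrong. Counts from the backup slide.
    def unadjusted_precision(correct, total):
        return 100.0 * correct / total

    def adjusted_precision(correct, total, wrong_due_to_extractor):
        return 100.0 * correct / (total - wrong_due_to_extractor)

    print(unadjusted_precision(443, 1490))     # 29.73
    print(adjusted_precision(443, 1490, 233))  # 35.24 (1490 - 233 = 1257)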

Page 21: Learning to “Read Between the Lines” using Bayesian Logic Programs

Experimental Evaluation

• Systems compared
  – BLP Learned Weights
    • Noisy-or parameters learned using online EM
  – BLP Manual Weights
    • Noisy-or parameters set to 0.9
  – Logical Deduction
  – MLN Learned Weights
    • Learn weights using a generative online weight learner
  – MLN Manual Weights
    • Assign a weight of 10 to all rules and MLE priors to all predicates

Page 22: Learning to “Read Between the Lines” using Bayesian Logic Programs

Unadjusted Precision
[Results chart not captured in the transcript.]

Page 23: Learning to “Read Between the Lines” using Bayesian Logic Programs

Adjusted Precision
[Results chart not captured in the transcript.]

Page 24: Learning to “Read Between the Lines” using Bayesian Logic Programs

Future Work

• Improve the performance of weight learning for BLPs and MLNs
  – Learn parameters on larger data sets

• Improve the performance of MLNs
  – Use the open-world assumption for learning
  – Add constraints required to prevent inference of facts like employs(a,a)
  – Specialize predicates that do not have strictly defined argument types

• Develop an online rule learner that can learn rules from uncertain training data

Page 25: Learning to “Read Between the Lines” using Bayesian Logic Programs

Conclusions

• Efficient learning of probabilistic first-order rules that represent commonsense knowledge using extractions from an IE system

• Inference of implicitly stated facts with high precision using BLPs

• Superior performance of BLPs over purely logical deduction and MLNs

Page 26: Learning to “Read Between the Lines” using Bayesian Logic Programs

Questions?

Page 27: Learning to “Read Between the Lines” using Bayesian Logic Programs

Backup Slides

Page 28: Learning to “Read Between the Lines” using Bayesian Logic Programs

Results for Logical Deduction

                 UA                 AD
Precision (%)    29.73 (443/1490)   35.24 (443/1257)

Page 29: Learning to “Read Between the Lines” using Bayesian Logic Programs

Experimental Evaluation

• Learning BLP parameters
  – Use the logical-and model to combine evidence from the conjuncts in the body of a clause
  – Use the noisy-or model to combine evidence from several ground rules that have the same head
  – Learn noisy-or parameters using Expectation Maximization (EM)
  – Set priors to maximum likelihood estimates