URDF: Query-Time Reasoning in Uncertain RDF Knowledge Bases
Ndapandula Nakashole, Mauro Sozio, Fabian Suchanek, Martin Theobald

Upload: conroy

Post on 23-Feb-2016


Page 1: URDF Query-Time Reasoning in  Uncertain RDF Knowledge Bases

URDF: Query-Time Reasoning in Uncertain RDF Knowledge Bases

Ndapandula Nakashole, Mauro Sozio, Fabian Suchanek, Martin Theobald

Page 2: URDF Query-Time Reasoning in  Uncertain RDF Knowledge Bases

YAGO/DBpedia et al. (>120 M facts for YAGO2, mostly from Wikipedia infoboxes):
bornOn(Jeff, 09/22/42), gradFrom(Jeff, Columbia), hasAdvisor(Jeff, Arthur), hasAdvisor(Surajit, Jeff), knownFor(Jeff, Theory)

Information Extraction: new fact candidates (100's M additional facts from Wikipedia text):
type(Jeff, Author) [0.9], author(Jeff, Drag_Book) [0.8], author(Jeff, Cind_Book) [0.6], worksAt(Jeff, Bell_Labs) [0.7], type(Jeff, CEO) [0.4]

Page 3: URDF Query-Time Reasoning in  Uncertain RDF Knowledge Bases

Outline
• Motivation & Problem Setting
  - URDF running example: people graduating from universities
• Efficient MAP Inference
  - MaxSAT solving with soft & hard constraints
• Grounding
  - Deductive grounding of soft rules (SLD resolution)
  - Iterative grounding of hard rules (closure)
• MaxSAT Algorithm
  - MaxSAT algorithm in 3 steps
• Experiments & Future Work

Page 4: URDF Query-Time Reasoning in  Uncertain RDF Knowledge Bases

URDF: Uncertain RDF Data Model
• Extensional Layer (information extraction & integration)
  - High-confidence facts: existing knowledge base (“ground truth”)
  - New fact candidates: extracted facts with confidence values
  - Integration of different knowledge sources: ontology merging or explicit Linked Data (owl:sameAs, owl:equivalentProperty)
  - Large “Uncertain Database” of RDF facts
• Intensional Layer (query-time inference)
  - Soft rules: deductive grounding & lineage (Datalog/SLD resolution)
  - Hard rules: consistency constraints (more general FOL rules)
  - Propositional & probabilistic consistency reasoning

Page 5: URDF Query-Time Reasoning in  Uncertain RDF Knowledge Bases

Soft Rules vs. Hard Rules
(Soft) Deduction Rules vs. (Hard) Consistency Constraints

People may live in more than one place:
livesIn(x,y) ∧ marriedTo(x,z) ⇒ livesIn(z,y)   [0.8]
livesIn(x,y) ∧ hasChild(x,z) ⇒ livesIn(z,y)   [0.5]

People are not born in different places/on different dates:
bornIn(x,y) ∧ bornIn(x,z) ⇒ y=z

People are not married to more than one person (at the same time, in most countries?):
marriedTo(x,y,t1) ∧ marriedTo(x,z,t2) ∧ y≠z ⇒ disjoint(t1,t2)

Page 6: URDF Query-Time Reasoning in  Uncertain RDF Knowledge Bases


Rule-based (deductive) reasoning: Datalog, RDF/S, OWL2-RL, etc.

FOL constraints (in particular mutex): Datalog with constraints, X-tuples in probabilistic DBs, owl:FunctionalProperty, etc.
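A hard rule of the form p(x,y) ∧ p(x,z) ⇒ y=z (as for bornIn above, or an owl:FunctionalProperty) grounds into mutually exclusive sets of candidate facts. The sketch below shows one way to compute those sets, applied to the graduatedFrom hard rule of the running example; the function name `ground_functional` and the triple encoding are illustrative assumptions, and the time-dependent marriedTo rule is not covered.

```python
from collections import defaultdict


def ground_functional(facts, functional):
    """Ground p(x,y) ∧ p(x,z) ⇒ y=z over candidate facts.

    Returns mutex sets: facts sharing subject and a functional predicate but
    differing in object, of which at most one may be true.
    """
    groups = defaultdict(set)
    for subj, pred, obj in facts:
        if pred in functional:
            groups[(subj, pred)].add((subj, pred, obj))
    # A single object per (subject, predicate) yields no constraint.
    return [frozenset(g) for g in groups.values() if len(g) > 1]


# Candidate facts from the running example (gradFrom(David, Stanford) is derived).
facts = [
    ("Surajit", "graduatedFrom", "Stanford"),
    ("Surajit", "graduatedFrom", "Princeton"),
    ("David", "graduatedFrom", "Princeton"),
    ("David", "graduatedFrom", "Stanford"),
    ("Jeff", "worksAt", "Stanford"),
]
for s in ground_functional(facts, {"graduatedFrom"}):
    print(sorted(s))
```

Each printed set corresponds to one hard constraint S_t used later by the MaxSAT step.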

Page 7: URDF Query-Time Reasoning in  Uncertain RDF Knowledge Bases

URDF Running Example

[Figure: RDF graph of the base facts. Jeff, Surajit, and David have type[1.0] Computer Scientist; Stanford and Princeton have type[1.0] University; edges worksAt[0.9], hasAdvisor[0.8]/[0.7], and graduatedFrom[0.6]/[0.7]/[0.9] connect them; graduatedFrom[?] edges mark derived facts.]

KB: RDF Base Facts

Derived Facts: gradFrom(Surajit, Stanford), gradFrom(David, Stanford)

First-Order Rules:
hasAdvisor(x,y) ∧ worksAt(y,z) ⇒ graduatedFrom(x,z)   [0.4]
graduatedFrom(x,y) ∧ graduatedFrom(x,z) ⇒ y=z
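To make the two derived facts concrete, here is a minimal bottom-up (forward) application of the soft rule hasAdvisor(x,y) ∧ worksAt(y,z) ⇒ graduatedFrom(x,z) to the base facts. This is a plain nested-loop join, not URDF's top-down SLD grounding; the rule weight [0.4] is ignored, and the name `apply_advisor_rule` is hypothetical.

```python
# Base facts of the running example as (subject, predicate, object) triples.
BASE = {
    ("Surajit", "hasAdvisor", "Jeff"),
    ("David", "hasAdvisor", "Jeff"),
    ("Jeff", "worksAt", "Stanford"),
    ("Surajit", "graduatedFrom", "Princeton"),
    ("Surajit", "graduatedFrom", "Stanford"),
    ("David", "graduatedFrom", "Princeton"),
}


def apply_advisor_rule(facts):
    """One bottom-up pass of hasAdvisor(x,y) ∧ worksAt(y,z) ⇒ graduatedFrom(x,z).

    Maps each derived head to the list of rule bodies it was derived from
    (its lineage).
    """
    derived = {}
    for (x, p1, y) in facts:
        if p1 != "hasAdvisor":
            continue
        for (y2, p2, z) in facts:
            if p2 == "worksAt" and y2 == y:
                head = (x, "graduatedFrom", z)
                derived.setdefault(head, []).append([(x, "hasAdvisor", y), (y, "worksAt", z)])
    return derived


for head, bodies in sorted(apply_advisor_rule(BASE).items()):
    print(head, "<-", bodies)
```

Note that gradFrom(Surajit, Stanford) is both a base fact and derivable; keeping the lineage is what lets the grounded rule enter the CNF later.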

Page 8: URDF Query-Time Reasoning in  Uncertain RDF Knowledge Bases

Basic Types of Inference
• Maximum-A-Posteriori (MAP) Inference
  - Find the most likely assignment to query variables y under a given evidence x.
  - Compute: arg max_y P(y | x) (NP-hard for propositional formulas, e.g., MaxSAT over CNFs)
• Marginal/Success Probabilities
  - Probability that query y is true in a random world under a given evidence x.
  - Compute: ∑_y P(y | x) (#P-hard for propositional formulas)
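The two kinds of inference can give different answers for the same query. The toy sketch below assumes a tuple-independent model (each fact true independently with a given probability, empty evidence), which is an illustrative assumption and not URDF's weighted-clause semantics; the facts `a`, `b` and their probabilities are hypothetical. It enumerates all possible worlds, picks the MAP world, and sums the probabilities of the worlds in which the query a ∨ b holds.

```python
from itertools import product

# Hypothetical tuple-independent facts: each is true independently with this probability.
probs = {"a": 0.4, "b": 0.4}


def query(world):
    # The query holds if at least one of the two facts holds: a ∨ b.
    return world["a"] or world["b"]


def world_prob(world):
    p = 1.0
    for f, v in world.items():
        p *= probs[f] if v else 1.0 - probs[f]
    return p


worlds = [dict(zip(probs, vals)) for vals in product([True, False], repeat=len(probs))]

# MAP: the single most likely world; the query is then read off that world.
map_world = max(worlds, key=world_prob)
# Marginal: sum of the probabilities of all worlds in which the query holds.
marginal = sum(world_prob(w) for w in worlds if query(w))

print(map_world, query(map_world))  # {'a': False, 'b': False} False
print(round(marginal, 2))           # 0.64
```

The most likely single world (probability 0.36) makes the query false, while the query is true with marginal probability 0.64.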

Page 9: URDF Query-Time Reasoning in  Uncertain RDF Knowledge Bases

General Route: Grounding & MaxSAT Solving

Query: graduatedFrom(x, y)

CNF (clause weights on the right):
(¬graduatedFrom(Surajit, Stanford) ∨ ¬graduatedFrom(Surajit, Princeton))   1000
(¬graduatedFrom(David, Stanford) ∨ ¬graduatedFrom(David, Princeton))   1000
(¬hasAdvisor(Surajit, Jeff) ∨ ¬worksAt(Jeff, Stanford) ∨ graduatedFrom(Surajit, Stanford))   0.4
(¬hasAdvisor(David, Jeff) ∨ ¬worksAt(Jeff, Stanford) ∨ graduatedFrom(David, Stanford))   0.4
worksAt(Jeff, Stanford)   0.9
hasAdvisor(Surajit, Jeff)   0.8
hasAdvisor(David, Jeff)   0.7
graduatedFrom(Surajit, Princeton)   0.6
graduatedFrom(Surajit, Stanford)   0.7
graduatedFrom(David, Princeton)   0.9

1) Grounding: consider only facts (and rules) that are relevant for answering the query.
2) Propositional formula in CNF, consisting of
   - grounded hard & soft rules
   - uncertain base facts
3) Propositional reasoning: find a truth assignment to the facts such that the total weight of the satisfied clauses is maximized.

MAP inference: compute the “most likely” possible world
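As a reference point for step 3, the sketch below solves the weighted MaxSAT instance of the CNF above by exhaustive search over all 2^7 truth assignments, encoding the hard rules with weight 1000 as on the slide. Exhaustive search is only viable for toy inputs and is not URDF's algorithm; the short atom names are illustrative.

```python
from itertools import product

SS, SP = "gradFrom(Surajit,Stanford)", "gradFrom(Surajit,Princeton)"
DS, DP = "gradFrom(David,Stanford)", "gradFrom(David,Princeton)"
HS, HD, W = "hasAdvisor(Surajit,Jeff)", "hasAdvisor(David,Jeff)", "worksAt(Jeff,Stanford)"

# Clauses as (weight, [(atom, is_positive), ...]).
clauses = [
    (1000, [(SS, False), (SP, False)]),             # hard rule, encoded with a high weight
    (1000, [(DS, False), (DP, False)]),
    (0.4, [(HS, False), (W, False), (SS, True)]),   # grounded soft rules
    (0.4, [(HD, False), (W, False), (DS, True)]),
    (0.9, [(W, True)]), (0.8, [(HS, True)]), (0.7, [(HD, True)]),
    (0.6, [(SP, True)]), (0.7, [(SS, True)]), (0.9, [(DP, True)]),
]
atoms = sorted({a for _w, lits in clauses for a, _pos in lits})


def satisfied_weight(assignment):
    # A clause is satisfied if at least one literal agrees with the assignment.
    return sum(w for w, lits in clauses
               if any(assignment[a] == pos for a, pos in lits))


# Exhaustive search over all truth assignments.
best = max((dict(zip(atoms, bits)) for bits in product([False, True], repeat=len(atoms))),
           key=satisfied_weight)
print(round(satisfied_weight(best), 1))  # 2004.4
print(sorted(a for a, v in best.items() if v))
```

The optimum satisfies both hard clauses (2000) plus 4.4 of soft weight, with gradFrom(Surajit, Stanford) and gradFrom(David, Princeton) true.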

Page 10: URDF Query-Time Reasoning in  Uncertain RDF Knowledge Bases

Why are high weights for hard rules not enough?

Consider the following CNF (for A, B > 0, A >> B):
(¬graduatedFrom(Surajit, Stanford) ∨ ¬graduatedFrom(Surajit, Princeton))   A
graduatedFrom(Surajit, Princeton)   0
graduatedFrom(Surajit, Stanford)   B

The optimal solution has weight A+B; the next-best solution has weight A+0. Hence the ratio of the optimal over the approximate solution is (A+B)/A.

In general, any (1+ε)-approximation algorithm, with ε > 0, may set graduatedFrom(Surajit, Princeton) to true, as (A+B)/A → 1 for A → ∞.
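The limit argument can be made explicit with elementary algebra on the two solution weights (for A, B, ε > 0):

```latex
\frac{A+B}{A} \;=\; 1 + \frac{B}{A} \;\le\; 1 + \varepsilon
\quad\Longleftrightarrow\quad
A \;\ge\; \frac{B}{\varepsilon}.
```

So for any fixed ε and soft weight B, choosing the hard-rule weight A ≥ B/ε makes the weight-A solution (Princeton true, Stanford false) an acceptable (1+ε)-approximation, even though it discards the soft evidence B. The larger A is, the less such a guarantee says about the soft clauses.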

Page 11: URDF Query-Time Reasoning in  Uncertain RDF Knowledge Bases

URDF: MaxSAT Solving with Soft & Hard Rules

Find: arg max_y P(y | x), which resolves to a variant of MaxSAT for propositional formulas.

Special case: Horn clauses as soft rules & mutex constraints as hard rules

S: Mutex constraints
{ graduatedFrom(Surajit, Stanford), graduatedFrom(Surajit, Princeton) }
{ graduatedFrom(David, Stanford), graduatedFrom(David, Princeton) }

C: Weighted Horn clauses (CNF)
(¬hasAdvisor(Surajit, Jeff) ∨ ¬worksAt(Jeff, Stanford) ∨ graduatedFrom(Surajit, Stanford))   0.4
(¬hasAdvisor(David, Jeff) ∨ ¬worksAt(Jeff, Stanford) ∨ graduatedFrom(David, Stanford))   0.4
worksAt(Jeff, Stanford)   0.9
hasAdvisor(Surajit, Jeff)   0.8
hasAdvisor(David, Jeff)   0.7
graduatedFrom(Surajit, Princeton)   0.6
graduatedFrom(Surajit, Stanford)   0.7
graduatedFrom(David, Princeton)   0.9

MaxSAT Alg.:
    Compute W0 = ∑_{clauses C} w(C) · P(C is satisfied);
    For each hard constraint S_t {
        For each fact f in S_t {
            Compute W_f^{+t} = ∑_{clauses C} w(C) · P(C is sat. | f = true);
        }
        Compute W_S^{-t} = ∑_{clauses C} w(C) · P(C is sat. | S_t = false);
        Choose the truth assignment to the facts in S_t that maximizes W_f^{+t}, W_S^{-t};
        Remove satisfied clauses C; t++;
    }

• Runtime: O(|S||C|)
• Approximation guarantee of 1/2
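A compact Python sketch of the loop above, run on the running example. Assumptions (not stated on the slide): the initial p_i are taken from the Step 1 table; facts outside any mutex set are treated as singleton constraints so that every fact ends up true or false; the literals of a clause are assumed to come from different partitions, so P(C is satisfied) can be computed as a product; satisfied clauses are not removed, which changes cost but not the result; ties go to the first option.

```python
from math import prod

SS, SP = "gradFrom(Surajit,Stanford)", "gradFrom(Surajit,Princeton)"
DS, DP = "gradFrom(David,Stanford)", "gradFrom(David,Princeton)"
HS, HD, W = "hasAdvisor(Surajit,Jeff)", "hasAdvisor(David,Jeff)", "worksAt(Jeff,Stanford)"

# Soft clauses C as (weight, [(atom, is_positive), ...]).
clauses = [
    (0.4, [(HS, False), (W, False), (SS, True)]),
    (0.4, [(HD, False), (W, False), (DS, True)]),
    (0.9, [(W, True)]), (0.8, [(HS, True)]), (0.7, [(HD, True)]),
    (0.6, [(SP, True)]), (0.7, [(SS, True)]), (0.9, [(DP, True)]),
]
# Hard constraints S (mutex sets), plus singleton sets for the remaining facts.
partitions = [[SS, SP], [DS, DP], [W], [HS], [HD]]
p0 = {SS: 1.0, SP: 0.0, DS: 0.0, DP: 1.0, W: 1.0, HS: 1.0, HD: 1.0}


def p_sat(lits, p):
    # 1 - product of P(literal is false); positive literal is false w.p. 1 - p.
    return 1.0 - prod((1.0 - p[a]) if pos else p[a] for a, pos in lits)


def potential(p):
    # W = sum over clauses of w(C) * P(C is satisfied)
    return sum(w * p_sat(lits, p) for w, lits in clauses)


def mutex_maxsat(p):
    p = dict(p)
    for part in partitions:
        # Options: exactly one fact f in S_t true (W_f^+), or all false (W_S^-).
        options = [{g: float(g == f) for g in part} for f in part]
        options.append({g: 0.0 for g in part})
        p.update(max(options, key=lambda opt: potential({**p, **opt})))
    return {a: v == 1.0 for a, v in p.items()}, potential(p)


truth, weight = mutex_maxsat(p0)
print(round(potential(p0), 1), round(weight, 1))  # 4.4 4.4
print(sorted(a for a, v in truth.items() if v))
```

The resulting assignment (gradFrom(Surajit, Stanford) and gradFrom(David, Princeton) true, the other two gradFrom facts false) agrees with the exhaustive optimum on the soft clauses for this small instance. Each option re-evaluates all clauses, which loosely mirrors the O(|S||C|) bound.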

Page 12: URDF Query-Time Reasoning in  Uncertain RDF Knowledge Bases

Deductive Grounding Algorithm (SLD Resolution/Datalog)

Query: graduatedFrom(Surajit, y)

[Figure: SLD proof tree for the query, with answers graduatedFrom(Surajit, Princeton) and graduatedFrom(Surajit, Stanford); the latter is also derived from hasAdvisor(Surajit, Jeff) ∧ worksAt(Jeff, Stanford).]

First-Order Rules:
hasAdvisor(x,y) ∧ worksAt(y,z) ⇒ graduatedFrom(x,z)   [0.4]
graduatedFrom(x,y) ∧ graduatedFrom(x,z) ⇒ y=z

Base Facts:
graduatedFrom(Surajit, Princeton) [0.7]
graduatedFrom(Surajit, Stanford) [0.6]
graduatedFrom(David, Princeton) [0.9]
hasAdvisor(Surajit, Jeff) [0.8]
hasAdvisor(David, Jeff) [0.7]
worksAt(Jeff, Stanford) [0.9]
type(Princeton, University) [1.0]
type(Stanford, University) [1.0]
type(Jeff, Computer_Scientist) [1.0]
type(Surajit, Computer_Scientist) [1.0]
type(David, Computer_Scientist) [1.0]

Grounded Rules:
hasAdvisor(Surajit, Jeff) ∧ worksAt(Jeff, Stanford) ⇒ gradFrom(Surajit, Stanford)
¬gradFrom(Surajit, Stanford) ∨ ¬gradFrom(Surajit, Princeton)
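The SLD grounding of the query can be sketched as a small top-down resolver: each goal is resolved first against base facts and then against rule heads, rule variables are renamed apart, and every successful rule application is recorded as a grounded rule instance. This is a simplified illustration (no negation, no tabling, an explicit `depth` bound on nested rule applications); all names are hypothetical and only the facts relevant to graduatedFrom are included.

```python
from itertools import count

FACTS = {
    ("graduatedFrom", "Surajit", "Princeton"), ("graduatedFrom", "Surajit", "Stanford"),
    ("graduatedFrom", "David", "Princeton"), ("hasAdvisor", "Surajit", "Jeff"),
    ("hasAdvisor", "David", "Jeff"), ("worksAt", "Jeff", "Stanford"),
}
# Soft rule: hasAdvisor(x,y) ∧ worksAt(y,z) ⇒ graduatedFrom(x,z); variables start with '?'.
RULES = [(("graduatedFrom", "?x", "?z"), [("hasAdvisor", "?x", "?y"), ("worksAt", "?y", "?z")])]
_fresh = count()


def is_var(t):
    return t.startswith("?")


def walk(t, s):
    # Follow variable bindings until reaching a constant or an unbound variable.
    while is_var(t) and t in s:
        t = s[t]
    return t


def unify(a, b, s):
    # Unify two atoms under substitution s; return an extended copy or None.
    if a[0] != b[0] or len(a) != len(b):
        return None
    s = dict(s)
    for x, y in zip(a[1:], b[1:]):
        x, y = walk(x, s), walk(y, s)
        if x == y:
            continue
        if is_var(x):
            s[x] = y
        elif is_var(y):
            s[y] = x
        else:
            return None
    return s


def substitute(atom, s):
    return (atom[0],) + tuple(walk(t, s) for t in atom[1:])


def rename(atom, tag):
    # Standardize rule variables apart with a fresh suffix.
    return (atom[0],) + tuple(t + tag if is_var(t) else t for t in atom[1:])


def solve(goals, s, depth, grounded):
    if not goals:
        yield s
        return
    first, rest = goals[0], goals[1:]
    for fact in sorted(FACTS):                 # resolve against base facts
        s2 = unify(first, fact, s)
        if s2 is not None:
            yield from solve(rest, s2, depth, grounded)
    if depth == 0:                             # bound on nested rule applications
        return
    for head, body in RULES:                   # resolve against rule heads
        tag = f"_{next(_fresh)}"
        rhead, rbody = rename(head, tag), [rename(b, tag) for b in body]
        s2 = unify(first, rhead, s)
        if s2 is None:
            continue
        for s3 in solve(rbody, s2, depth - 1, grounded):
            # Record the grounded rule instance (lineage) for the CNF.
            grounded.add((tuple(substitute(b, s3) for b in rbody), substitute(rhead, s3)))
            yield from solve(rest, s3, depth, grounded)


grounded = set()
answers = sorted({walk("?y", s) for s in solve([("graduatedFrom", "Surajit", "?y")], {}, 2, grounded)})
print(answers)  # ['Princeton', 'Stanford']
for body, head in grounded:
    print(" ∧ ".join(map(str, body)), "⇒", head)
```

The recorded instance is the first grounded rule listed above; grounding the hard rule over the two answers then yields the mutex clause.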

Page 13: URDF Query-Time Reasoning in  Uncertain RDF Knowledge Bases

Dependency Graph of a Query
• SLD grounding always starts from a query literal and first proceeds over the soft deduction rules.
• Grounding is then also iterated over the hard rules in a top-down fashion, using the literals in each hard rule as new subqueries.
• Cycles (due to recursive rules) are detected and resolved via a form of tabling known from Datalog.
• Grounding terminates when a closure is reached, i.e., when no new facts can be grounded from the rules and all subgoals are either resolved or form the root of a cycle.
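The closure computation can be pictured as a worklist over ground literals, where a table of already-expanded literals plays the role of tabling: a repeated subgoal (such as the cycle between the two mutex partners) is not expanded again, and grounding stops when the worklist is empty. In this sketch the per-literal dependencies are spelled out by hand (hypothetical `SOFT` and `HARD` maps) rather than computed by unification, so it illustrates only the termination and cycle handling.

```python
from collections import deque

# SOFT: ground literal -> bodies of the soft-rule groundings that derive it.
SOFT = {
    "gradFrom(Surajit,Stanford)": [["hasAdvisor(Surajit,Jeff)", "worksAt(Jeff,Stanford)"]],
    "gradFrom(David,Stanford)": [["hasAdvisor(David,Jeff)", "worksAt(Jeff,Stanford)"]],
}
# HARD: ground literal -> literals it shares a grounded hard rule (mutex set) with.
HARD = {
    "gradFrom(Surajit,Stanford)": ["gradFrom(Surajit,Princeton)"],
    "gradFrom(Surajit,Princeton)": ["gradFrom(Surajit,Stanford)"],
    "gradFrom(David,Stanford)": ["gradFrom(David,Princeton)"],
    "gradFrom(David,Princeton)": ["gradFrom(David,Stanford)"],
}


def ground_closure(query_literals):
    table = set()                      # tabled subgoals: each is expanded at most once
    queue = deque(query_literals)
    while queue:
        lit = queue.popleft()
        if lit in table:               # repeated subgoal, e.g. the Stanford/Princeton cycle
            continue
        table.add(lit)
        for body in SOFT.get(lit, []):
            queue.extend(body)         # soft deduction rules first
        queue.extend(HARD.get(lit, []))  # hard-rule literals become new subqueries
    return table                       # closure: no new literal is reachable


print(sorted(ground_closure(["gradFrom(Surajit,Stanford)"])))
```

Starting from a Surajit literal never reaches David's facts, which is what keeps query-time grounding small.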

Page 14: URDF Query-Time Reasoning in  Uncertain RDF Knowledge Bases

Weighted MaxSAT Algorithm
General idea: iterate over all hard rules S_t while maintaining a potential function W_t; in each iteration, set the fact f ∈ S_t that maximizes W_t (or none of them) to true, and set all other facts in S_t to false.

At iteration 0, we have W0 = ∑_{clauses C} w(C) · P(C is satisfied).

At any intermediate iteration t, we compare W_f^{+t} = ∑_{clauses C} w(C) · P(C is sat. | f = true) for each f ∈ S_t with W_S^{-t} = ∑_{clauses C} w(C) · P(C is sat. | S_t = false).

At the final iteration t_max, all facts are assigned either true or false, and W_{t_max} is equal to the total weight of all clauses that are satisfied.
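Why the chosen option never lowers the potential follows the pattern of the method of conditional expectations. As a sketch under stated assumptions (not a statement from the slides): suppose the facts in S_t are mutually exclusive with current probabilities p_f (so ∑_{f∈S_t} p_f ≤ 1) and P(C is satisfied) is computed exactly under this distribution. Then the law of total expectation gives

```latex
W_{t-1} \;=\; \sum_{f \in S_t} p_f \, W_f^{+t}
\;+\; \Bigl(1 - \sum_{f \in S_t} p_f\Bigr)\, W_S^{-t}
\;\le\; \max\Bigl\{\, \max_{f \in S_t} W_f^{+t},\; W_S^{-t} \Bigr\} \;=\; W_t .
```

The middle expression is a convex combination of the candidate values, so the best candidate is at least W_{t-1}; chaining over all iterations yields W_{t_max} ≥ W0.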

Page 15: URDF Query-Time Reasoning in  Uncertain RDF Knowledge Bases

Step 1: Weights w(fi) and probabilities pi

S: Mutex constraints
{ gradFrom(Surajit, Stanford), gradFrom(Surajit, Princeton) }
{ gradFrom(David, Stanford), gradFrom(David, Princeton) }

C: Weighted Horn clauses (CNF)
(¬hasAdvisor(Surajit, Jeff) ∨ ¬worksAt(Jeff, Stanford) ∨ gradFrom(Surajit, Stanford))   0.4
(¬hasAdvisor(David, Jeff) ∨ ¬worksAt(Jeff, Stanford) ∨ gradFrom(David, Stanford))   0.4
worksAt(Jeff, Stanford)   0.9
hasAdvisor(Surajit, Jeff)   0.8
hasAdvisor(David, Jeff)   0.7
gradFrom(Surajit, Princeton)   0.6
gradFrom(Surajit, Stanford)   0.7
gradFrom(David, Princeton)   0.9

Fact fi                         w(fi)   pi
gradFrom(Surajit, Stanford)     0.7     1.0
gradFrom(Surajit, Princeton)    0.6     0.0
gradFrom(David, Stanford)       0.0     0.0
gradFrom(David, Princeton)      0.9     1.0
worksAt(Jeff, Stanford)         0.9     1.0
hasAdvisor(Surajit, Jeff)       0.8     1.0
hasAdvisor(David, Jeff)         0.7     1.0

Page 16: URDF Query-Time Reasoning in  Uncertain RDF Knowledge Bases

Step 2

Page 17: URDF Query-Time Reasoning in  Uncertain RDF Knowledge Bases


Step 2

C1: ¬hasAdvisor(Surajit, Jeff) ∨ ¬worksAt(Jeff, Stanford) ∨ gradFrom(Surajit, Stanford)

P(C1) = 1 - (1-(1-1))(1-(1-1))(1-1) = 1

In general, P(C is satisfied) = 1 - ∏ over the literals of C of (1 - P(literal is true)), where each literal ranges over a single partition:
• ¬hasAdvisor(Surajit, Jeff): single partition, negated: true with probability 1 - pi
• ¬worksAt(Jeff, Stanford): single partition, negated: true with probability 1 - pi
• gradFrom(Surajit, Stanford): single partition, positive: true with probability pi
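The clause probability formula translates directly into code. This is a minimal sketch, assuming (as on the slide) that every literal of the clause lies in its own partition, so the literals are independent; the function name `p_satisfied` is illustrative.

```python
from math import prod


def p_satisfied(literals):
    """P(clause is satisfied) = 1 - product over literals of P(literal is false).

    literals: list of (p_i, is_positive). Assumes each literal lies in a
    different partition, so the literals are independent.
    """
    # A positive literal is true with probability p_i, a negated one with 1 - p_i.
    p_true = [p if positive else 1.0 - p for p, positive in literals]
    return 1.0 - prod(1.0 - q for q in p_true)


# C1 = ¬hasAdvisor(Surajit, Jeff) ∨ ¬worksAt(Jeff, Stanford) ∨ gradFrom(Surajit, Stanford)
c1 = [(1.0, False), (1.0, False), (1.0, True)]
# C2 = ¬hasAdvisor(David, Jeff) ∨ ¬worksAt(Jeff, Stanford) ∨ gradFrom(David, Stanford)
c2 = [(1.0, False), (1.0, False), (0.0, True)]
print(p_satisfied(c1), p_satisfied(c2))  # 1.0 0.0
```

With the Step 1 probabilities, C1 is certainly satisfied and C2 certainly violated, since gradFrom(David, Stanford) starts with pi = 0.0.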

Page 18: URDF Query-Time Reasoning in  Uncertain RDF Knowledge Bases

Step 2

P(C1 is satisfied) = 1-(1-(1-1))(1-(1-1))(1-1) = 1
P(C2 is satisfied) = 1-(1-(1-1))(1-(1-1))(1-0) = 0 ...

W0 = 0.4·1 + 0.4·0 + 0.9·1 + 0.8·1 + 0.7·1 + 0.6·0 + 0.7·1 + 0.9·1 = 4.4
(C2 and the unit clause gradFrom(Surajit, Princeton), with pi = 0.0, contribute nothing.)


Page 19: URDF Query-Time Reasoning in  Uncertain RDF Knowledge Bases


Step 3: Weights w(fi), probabilities pi, truth values

First mutex set: f1 = gradFrom(Surajit, Stanford), f2 = gradFrom(Surajit, Princeton)

P(C1 is satisfied | f1 = true) = 1-(1-(1-1))(1-(1-1))(1-1) = 1
P(C1 is satisfied | f2 = true) = 1-(1-(1-1))(1-(1-1))(1-0) = 0 ...

W1 (f1 = true) = 0.4 + 0.9 + 0.8 + 0.7 + 0.7 + 0.9 = 4.4
W2 (f2 = true) = 0.9 + 0.8 + 0.7 + 0.6 + 0.9 = 3.9

Since W1 exceeds both W2 and the 3.3 obtained with both facts false, gradFrom(Surajit, Stanford) is set to true and gradFrom(Surajit, Princeton) to false.

Fact fi                         w(fi)   pi    truth value
gradFrom(Surajit, Stanford)     0.7     1.0   true
gradFrom(Surajit, Princeton)    0.6     0.0   false
gradFrom(David, Stanford)       0.0     0.0   false
gradFrom(David, Princeton)      0.9     1.0   true
worksAt(Jeff, Stanford)         0.9     1.0   true
hasAdvisor(Surajit, Jeff)       0.8     1.0   true
hasAdvisor(David, Jeff)         0.7     1.0   true

Page 20: URDF Query-Time Reasoning in  Uncertain RDF Knowledge Bases

Experiments – Setup
• YAGO Knowledge Base
  - 2 million entities, 20 million facts
• Soft Rules
  - 16 soft rules (hand-crafted deduction rules with weights)
• Hard Rules
  - 5 predicates with functional properties (bornIn, diedIn, bornOnDate, diedOnDate, marriedTo)
• Queries
  - 10 conjunctive SPARQL queries
• Markov Logic as Competitor (based on MCMC)
  - MAP inference: Alchemy employs a form of MaxWalkSAT
  - MC-SAT: iterative MaxSAT & Gibbs sampling

Page 21: URDF Query-Time Reasoning in  Uncertain RDF Knowledge Bases

YAGO Knowledge Base: URDF vs. Markov Logic
• URDF: SLD grounding & MaxSAT solving
  - |C|: # ground literals in soft rules
  - |S|: # ground literals in hard rules
• URDF vs. Markov Logic (MAP inference & MC-SAT)
  - First run: ground each query against the rules (SLD grounding + MaxSAT solving) & report sum of runtimes
  - Asymptotic runtime checks: synthetic soft rule expansions

Page 22: URDF Query-Time Reasoning in  Uncertain RDF Knowledge Bases

Recursive Rules & LUBM Benchmark
• 42 inductively learned (partly recursive) rules over 20 million facts in YAGO
  - URDF grounding with different maximum SLD levels
• URDF (SLD grounding + MaxSAT) vs. Jena (only grounding) over the LUBM benchmark
  - SF-1: 103,397 triples
  - SF-5: 646,128 triples
  - SF-10: 1,316,993 triples

Page 23: URDF Query-Time Reasoning in  Uncertain RDF Knowledge Bases

Current & Future Topics...
• Temporal consistency reasoning
  - Soft/hard rules with temporal predicates
  - Soft deduction rules: deduce confidence distribution of derived facts
• Learning soft rules & consistency constraints
  - Explore how Inductive Logic Programming can be applied to large, uncertain & incomplete knowledge bases
• More solving/sampling
  - Linear-time constrained & weighted MaxSAT solver
  - Improved Gibbs sampling with soft & hard rules
• Scale-out
  - Distributed grounding via message passing
• Updates/versioning for (linked) RDF data
  - Non-monotonic answers for rules with negation!

Page 24: URDF Query-Time Reasoning in  Uncertain RDF Knowledge Bases

Online Demo!

urdf.mpi-inf.mpg.de
