Learning Markov Logic Network Structure via Hypergraph Lifting


Page 1: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

Learning Markov Logic Network Structure
Via Hypergraph Lifting

Stanley Kok
Dept. of Computer Science and Eng.
University of Washington, Seattle, USA

Joint work with Pedro Domingos

Page 2: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

Synopsis of LHL

Input: Relational DB

Advises           TAs              Teaches
Pete   Sam        Sam    CS1       Pete   CS1
Pete   Saul       Sam    CS2       Pete   CS2
Paul   Sara       Sara   CS1       Paul   CS2
…      …          …      …         …      …

Output: Probabilistic KB

 2.7   Teaches(p, c) ∧ TAs(s, c) ⇒ Advises(p, s)
 1.4   Advises(p, s) ⇒ Teaches(p, c) ∧ TAs(s, c)
-1.1   TAs(s, c) ⇒ Advises(s, p)

Goal of LHL

[Figure: the DB viewed as a ground hypergraph over the constants Pete, Paul, Pat, Phil, Sam, Sara, Saul, Sue, and CS1–CS8, with Teaches, TAs, and Advises hyperedges, lifted into clusters labeled Professor, Student, and Course]

Page 3: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

Experimental Results

[Figure: bar charts comparing LHL, BUSL, and MSL on area under the precision-recall curve (AUC) and conditional log-likelihood (CLL)]

Page 4: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

Outline

Background | Learning via Hypergraph Lifting | Experiments | Future Work

Page 5: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

Markov Logic

A logical KB is a set of hard constraints on the set of possible worlds.
Let's make them soft constraints: when a world violates a formula, it becomes less probable, not impossible.
Give each formula a weight (higher weight ⇒ stronger constraint).

Page 6: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

Markov Logic

A Markov logic network (MLN) is a set of pairs (F, w):
- F is a formula in first-order logic
- w is a real number

P(x) = (1/Z) exp( Σ_i w_i n_i(x) )

where x is a vector of truth assignments to ground atoms, Z is the partition function, w_i is the weight of the i-th formula, and n_i(x) is the number of true groundings of the i-th formula.
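The distribution above can be checked numerically on a toy model. The sketch below enumerates all worlds over two ground atoms A and B with a single soft formula A ⇒ B; the formula and its weight 1.5 are made up for illustration:

```python
# Toy numeric check of P(x) = (1/Z) exp(sum_i w_i n_i(x)). The single formula
# "A => B" and its weight 1.5 are hypothetical.
import itertools
import math

w = 1.5

def n_true(world):
    """Number of true groundings of (A => B) in this world: 0 or 1."""
    A, B = world
    return 1 if (not A) or B else 0

worlds = list(itertools.product([False, True], repeat=2))
Z = sum(math.exp(w * n_true(x)) for x in worlds)     # partition function

def P(world):
    return math.exp(w * n_true(world)) / Z
```

The world (True, False), the only one violating A ⇒ B, gets a lower probability than the others, not probability zero.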

Page 7: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

MLN Structure Learning

A challenging task; few approaches to date
[Kok & Domingos, ICML'05; Mihalkova & Mooney, ICML'07; Biba et al., ECAI'08; Huynh & Mooney, ICML'08]

Most MLN structure learners:
- greedily and systematically enumerate formulas
- are computationally expensive (large search space)
- are susceptible to local optima

Page 8: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

MSL [Kok & Domingos, ICML'05]

While beam not empty:
    Add unit clauses to beam
    While beam has changed:
        For each clause c in beam:
            c' ← add a literal to c
            newClauses ← newClauses ∪ {c'}
        beam ← k best clauses in beam ∪ newClauses
    Add best clause in beam to MLN
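The pseudocode above can be sketched in Python. The literals, beam width K, and score() below are stand-ins of my own (MSL's real score is likelihood-based):

```python
# A minimal sketch of MSL-style beam search over clauses. Clauses are
# frozensets of literal strings; LITERALS, K, and score() are hypothetical.
LITERALS = ["Teaches(p,c)", "TAs(s,c)", "Advises(p,s)", "!Advises(p,s)"]
K = 3  # beam width

def score(clause):
    # Hypothetical score: reward mentioning TAs, penalize clause length.
    return ("TAs(s,c)" in clause) - 0.1 * len(clause)

def msl_beam_search():
    beam = [frozenset([l]) for l in LITERALS]        # start with unit clauses
    changed = True
    while changed:                                   # while beam has changed
        new_clauses = set()
        for c in beam:                               # c' <- add a literal to c
            for lit in LITERALS:
                if lit not in c:
                    new_clauses.add(c | {lit})
        candidates = set(beam) | new_clauses
        new_beam = sorted(candidates,                # keep the K best clauses
                          key=lambda c: (score(c), sorted(c)),
                          reverse=True)[:K]
        changed = new_beam != beam
        beam = new_beam
    return [beam[0]]                                 # add best clause to MLN
```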

Page 9: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

Relational Pathfinding [Richards & Mooney, AAAI'92]

Find paths of linked ground atoms → formulas
Path ≡ conjunction that is true at least once
Exponential search space of paths, so restricted to short paths

[Figure: ground hypergraph over Pete, Paul, Pat, Phil, Sam, Sara, Saul, Sue, and CS1–CS8 with Teaches, TAs, and Advises edges; the path through Pete, CS1, and Sam is highlighted]

Advises(Pete, Sam) ∧ Teaches(Pete, CS1) ∧ TAs(Sam, CS1)
→ Advises(p, s) ∧ Teaches(p, c) ∧ TAs(s, c)
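Turning a ground path into a formula is just variabilization: replace each constant with a fresh variable. A minimal sketch, using the slide's example path (the variable-naming scheme p, q, r, … is my own):

```python
# Variabilization sketch: replace each constant in a ground path with a
# fresh variable, reusing the same variable for repeated constants.
from string import ascii_lowercase

path = [("Advises", ("Pete", "Sam")),
        ("Teaches", ("Pete", "CS1")),
        ("TAs", ("Sam", "CS1"))]

def variabilize(path):
    var_of = {}          # constant -> variable
    literals = []
    for pred, args in path:
        vs = []
        for const in args:
            if const not in var_of:
                var_of[const] = ascii_lowercase[15 + len(var_of)]  # p, q, r…
            vs.append(var_of[const])
        literals.append(f"{pred}({', '.join(vs)})")
    return " ^ ".join(literals)
```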

Page 10: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

BUSL [Mihalkova & Mooney, ICML'07]

Finds short paths with a form of relational pathfinding
Path → Boolean variable → node in a Markov network
Greedily tries to link the nodes with edges
Cliques → clauses:
- forms disjunctions of the atoms in a clique's nodes
- greedily adds clauses to an empty MLN

From the clique {Advises(p,s), Teaches(p,c), TAs(s,c)}:
Advises(p,s) ∨ Teaches(p,c) ∨ TAs(s,c)
¬Advises(p,s) ∨ ¬Teaches(p,c) ∨ TAs(s,c)
…

Page 11: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

Outline

Background | Learning via Hypergraph Lifting | Experiments | Future Work

Page 12: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

Learning via Hypergraph Lifting (LHL)

Uses relational pathfinding to a fuller extent
Induces a hypergraph over clusters of constants

[Figure: the ground hypergraph over Pete, Paul, Pat, Phil, Sam, Sara, Saul, Sue, and CS1–CS8 is "lifted" into a hypergraph over the clusters {Pete, Paul, Pat, Phil}, {Sam, Sara, Saul, Sue}, and {CS1, …, CS8}, with Teaches, TAs, and Advises hyperedges]

Page 13: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

Learning via Hypergraph Lifting (LHL)

Uses a hypergraph (V, E):
- V: a set of nodes
- E: a set of labeled, non-empty, ordered subsets of V

Finds paths in the hypergraph
Path: a set of hyperedges such that for any two hyperedges e0 and en in the set, there exists a sequence of hyperedges in the set leading from e0 to en

Page 14: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

Learning via Hypergraph Lifting (LHL)

A relational DB can be viewed as a hypergraph:
- Nodes ≡ constants
- Hyperedges ≡ true ground atoms

[Figure: the DB tables Advises, TAs, and Teaches drawn as a hypergraph whose nodes are the constants and whose hyperedges are the true ground atoms]
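This view is easy to materialize. A minimal sketch, assuming ground atoms are stored as (predicate, argument-tuple) pairs:

```python
# Sketch: a relational DB as a hypergraph. Nodes are constants; each true
# ground atom is a labeled hyperedge over its arguments.
from collections import defaultdict

db = [("Advises", ("Pete", "Sam")),
      ("TAs", ("Sam", "CS1")),
      ("Teaches", ("Pete", "CS1"))]

def build_hypergraph(db):
    nodes = set()
    edges_at = defaultdict(list)        # constant -> hyperedges touching it
    for pred, args in db:
        nodes.update(args)
        for const in args:
            edges_at[const].append((pred, args))
    return nodes, edges_at

nodes, edges_at = build_hypergraph(db)
```

The `edges_at` index is what pathfinding needs: from any constant, it lists the hyperedges a path can grow through.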

Page 15: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

LHL = Clustering + Relational Pathfinding

LHL "lifts" the hypergraph into a more compact representation:
- jointly clusters the nodes into higher-level concepts
- clusters the hyperedges
It then traces paths in the lifted hypergraph.

[Figure: the ground hypergraph "lifted" into the cluster-level hypergraph]

Page 16: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

Learning via Hypergraph Lifting

LHL has three components:
- LiftGraph: lifts the hypergraph
- FindPaths: finds paths in the lifted hypergraph
- CreateMLN: creates rules from the paths and adds the good ones to an empty MLN

Page 17: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

LiftGraph

- Defined using Markov logic
- Jointly clusters constants in a bottom-up, agglomerative manner
- Allows information to propagate from one cluster to another
- Ground atoms are also clustered
- The number of clusters need not be specified in advance
- Each lifted hyperedge contains ≥ 1 true ground atom

Page 18: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

Learning Problem in LiftGraph

Find the cluster assignment C that maximizes the posterior probability
P(C | D) ∝ P(D | C) P(C)
where D is the truth values of the ground atoms, P(D | C) is defined with an MLN, and P(C) is defined with another MLN.

Page 19: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

LiftGraph's P(D|C) MLN

For each predicate r and each cluster combination containing a true ground atom of r, we have an atom prediction rule:

∀x1, …, xn  x1 ∈ γ1 ∧ … ∧ xn ∈ γn ⇒ r(x1, …, xn)

[Figure: clusters Professor = {Pete, Paul, Pat, Phil}, Student = {Sam, Sara, Saul, Sue}, and Course = {CS1, …, CS8}, with lifted Teaches, TAs, and Advises hyperedges]

Page 20: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

LiftGraph's P(D|C) MLN

For example, for Teaches and the (Professor, Course) cluster combination:

p ∈ Professor ∧ c ∈ Course ⇒ Teaches(p, c)

Page 21: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

LiftGraph's P(D|C) MLN

For each predicate r, we also have a default atom prediction rule, covering the remaining (default) cluster combinations, e.g.:

x ∈ γ ∧ y ∈ γ′, for (γ, γ′) in the default cluster combination ⇒ Teaches(x, y)

Page 22: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

LiftGraph's P(C) MLN

- Each symbol belongs to exactly one cluster (infinite weight)
- Exponential prior on the number of cluster combinations (negative weight -λ)

Page 23: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

LiftGraph

- Hard assignments of constants to clusters
- Weights and log-posterior computed in closed form
- Searches for the cluster assignment with the highest log-posterior
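As a rough illustration of closed-form weights (an assumption of mine, not the paper's exact derivation): a rule's maximum-likelihood weight can be taken as the smoothed empirical log-odds that its groundings are true, which needs no iterative optimization.

```python
# Illustrative only (my assumption, not the paper's derivation): compute a
# rule's weight in closed form as the smoothed log-odds of true groundings.
import math

def rule_weight(n_true, n_total):
    p = (n_true + 1) / (n_total + 2)    # add-one smoothed true fraction
    return math.log(p / (1 - p))
```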

Page 24: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

LiftGraph's Search Algorithm

[Figure: agglomerative search over the constants Pete, Paul, CS1, CS2, CS3, Sam, and Sara, linked by Teaches and Advises hyperedges; Pete and Paul are merged into a single cluster]

Page 25: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

LiftGraph's Search Algorithm

[Figure: the merging continues — CS1, CS2, and CS3 form one cluster and Sam and Sara another, with the Teaches and Advises hyperedges now connecting clusters]
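The agglomerative search can be sketched as greedy merging under a score. The score() below is a stand-in that rewards clusters whose constants appear in the same predicates; it is not LiftGraph's actual log-posterior:

```python
# Greedy agglomerative clustering sketch: repeatedly make the merge that
# most improves a score, stopping when no merge helps. preds_of and score()
# are hypothetical stand-ins.
from itertools import combinations

preds_of = {"Pete": {"Teaches"}, "Paul": {"Teaches"},
            "Sam": {"TAs"}, "Sara": {"TAs"}, "CS1": {"Teaches", "TAs"}}

def score(clusters):
    # Reward size of predicate-homogeneous clusters; penalize cluster count.
    homog = sum(len(c) for c in clusters
                if len({frozenset(preds_of[x]) for x in c}) == 1)
    return homog - 0.5 * len(clusters)

def lift(constants):
    clusters = [frozenset([c]) for c in constants]
    while True:
        best, best_gain = None, 0.0
        for a, b in combinations(clusters, 2):
            merged = [c for c in clusters if c not in (a, b)] + [a | b]
            gain = score(merged) - score(clusters)
            if gain > best_gain:
                best, best_gain = merged, gain
        if best is None:
            return clusters
        clusters = best
```

On this toy input the greedy merges recover the clusters from the slide: {Pete, Paul}, {Sam, Sara}, and {CS1} on its own.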

Page 26: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

FindPaths

[Figure: paths traced in the lifted hypergraph over the Professor, Student, and Course clusters]

Paths found:
Advises(Professor, Student)
Advises(Professor, Student), Teaches(Professor, Course)
Advises(Professor, Student), Teaches(Professor, Course), TAs(Student, Course)
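Pathfinding over the lifted hypergraph can be sketched as a bounded depth-first enumeration of connected sets of hyperedges, grown one hyperedge at a time through shared nodes (the cluster names are illustrative):

```python
# FindPaths sketch: depth-first enumeration of connected hyperedge sets in
# the lifted hypergraph, up to a length bound.
MAX_LEN = 3
lifted_edges = [("Advises", ("Professor", "Student")),
                ("Teaches", ("Professor", "Course")),
                ("TAs", ("Student", "Course"))]

def find_paths():
    paths = []
    def grow(path, nodes):
        paths.append(list(path))
        if len(path) == MAX_LEN:
            return
        for edge in lifted_edges:
            pred, args = edge
            if edge not in path and nodes & set(args):   # shares a node
                grow(path + [edge], nodes | set(args))
    for edge in lifted_edges:
        grow([edge], set(edge[1]))
    return paths
```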

Page 27: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

Clause Creation

Each path has its cluster arguments replaced by variables:

Advises(Professor, Student), Teaches(Professor, Course), TAs(Student, Course)
→ Advises(p, s) ∧ Teaches(p, c) ∧ TAs(s, c)

Clauses are then formed from the conjunction:
¬Advises(p, s) ∨ ¬Teaches(p, c) ∨ ¬TAs(s, c)
Advises(p, s) ∨ ¬Teaches(p, c) ∨ ¬TAs(s, c)
Advises(p, s) ∨ Teaches(p, c) ∨ ¬TAs(s, c)
…
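Generating the candidate clauses from a path's conjunction is an enumeration over negation signs, one clause per sign combination:

```python
# Sketch: candidate clauses from a path's conjunction, one per combination
# of negation signs over the atoms.
from itertools import product

atoms = ["Advises(p,s)", "Teaches(p,c)", "TAs(s,c)"]

def clauses_from_path(atoms):
    out = []
    for signs in product([True, False], repeat=len(atoms)):
        lits = [a if pos else "!" + a for a, pos in zip(atoms, signs)]
        out.append(" v ".join(lits))
    return out
```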

Page 28: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

Clause Pruning

Clause                                          Score
¬Advises(p, s) ∨ ¬Teaches(p, c) ∨ TAs(s, c)     -1.15
Advises(p, s) ∨ ¬Teaches(p, c) ∨ TAs(s, c)      -1.17
…
¬Advises(p, s) ∨ ¬Teaches(p, c)                 -2.21
¬Advises(p, s) ∨ TAs(s, c)                      -2.23
¬Teaches(p, c) ∨ TAs(s, c)                      -2.03
…
¬Advises(p, s)                                  -3.13
¬Teaches(p, c)                                  -2.93
TAs(s, c)                                       -3.93

Page 29: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

Clause Pruning

Compare each clause against its sub-clauses (taken individually).
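One plausible reading of this comparison (an assumption on my part): keep a clause only if its score beats every sub-clause taken individually. A sketch using the scores from the table, with abbreviated literals:

```python
# Pruning sketch (one plausible reading of the slide's rule): keep a clause
# only if its score beats every sub-clause, each taken individually.
from itertools import combinations

scores = {
    frozenset({"!Advises", "!Teaches", "TAs"}): -1.15,
    frozenset({"!Advises", "!Teaches"}): -2.21,
    frozenset({"!Advises", "TAs"}): -2.23,
    frozenset({"!Teaches", "TAs"}): -2.03,
    frozenset({"!Advises"}): -3.13,
    frozenset({"!Teaches"}): -2.93,
    frozenset({"TAs"}): -3.93,
}

def keep(clause):
    subs = [frozenset(s) for r in range(1, len(clause))
            for s in combinations(clause, r)]
    return all(scores[clause] > scores[s] for s in subs if s in scores)
```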

Page 30: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

MLN Creation

- Add clauses to an empty MLN in order of decreasing score
- Retrain the weights of the clauses each time a clause is added
- Retain a clause in the MLN only if the overall score improves
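The greedy construction can be sketched as follows; overall_score() and the input clause scores are hypothetical stand-ins for the real score computed after weight retraining:

```python
# Greedy MLN construction sketch: add clauses in decreasing-score order and
# retain each only if the overall score improves. overall_score() is a
# hypothetical stand-in.
def overall_score(mln):
    # Hypothetical: reward distinct clauses, penalize total clause count.
    return len(set(mln)) - 0.3 * len(mln)

def create_mln(scored_clauses):
    mln = []
    for clause, _ in sorted(scored_clauses, key=lambda x: -x[1]):
        candidate = mln + [clause]
        # (the real system retrains all clause weights here)
        if overall_score(candidate) > overall_score(mln):
            mln = candidate
    return mln
```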

Page 31: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

Outline

Background | Learning via Hypergraph Lifting | Experiments | Future Work

Page 32: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

Datasets

IMDB
- Created from the IMDB.com DB
- Movies, actors, etc., and their relationships
- 17,793 ground atoms; 1,224 true ones

UW-CSE
- Describes an academic department
- Students, faculty, etc., and their relationships
- 260,254 ground atoms; 2,112 true ones

Page 33: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

Datasets

Cora
- Citations to computer science papers
- Papers, authors, titles, etc., and their relationships
- 687,422 ground atoms; 42,558 true ones

Page 34: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

Methodology

- Five-fold cross-validation
- Inferred the probability of truth for the groundings of each predicate, with the groundings of all other predicates as evidence
- Evaluation measures: area under the precision-recall curve (AUC) and average conditional log-likelihood (CLL)

Page 35: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

Methodology

- MCMC inference algorithms in Alchemy to evaluate the test atoms
- 1 million samples
- 24 hours

Page 36: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

Methodology

Compared with:
- MSL [Kok & Domingos, ICML'05]
- BUSL [Mihalkova & Mooney, ICML'07]

Lesion study:
- NoLiftGraph: LHL with no hypergraph lifting; finds paths directly in the unlifted hypergraph
- NoPathFinding: LHL with no pathfinding; uses the MLN representing LiftGraph

Page 37: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

LHL vs. BUSL vs. MSL: Area under Prec-Recall Curve

[Figure: AUC bar charts for LHL, BUSL, and MSL on IMDB, UW-CSE, and Cora]

Page 38: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

LHL vs. BUSL vs. MSL: Conditional Log-likelihood

[Figure: CLL bar charts for LHL, BUSL, and MSL on IMDB, UW-CSE, and Cora]

Page 39: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

LHL vs. BUSL vs. MSL: Runtime

[Figure: runtime bar charts for LHL, BUSL, and MSL — IMDB in minutes, UW-CSE and Cora in hours]

Page 40: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

LHL vs. NoLiftGraph: Area under Prec-Recall Curve

[Figure: AUC bar charts for LHL and NoLiftGraph on IMDB, UW-CSE, and Cora]

Page 41: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

LHL vs. NoLiftGraph: Conditional Log-likelihood

[Figure: CLL bar charts for LHL and NoLiftGraph on IMDB, UW-CSE, and Cora]

Page 42: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

LHL vs. NoLiftGraph: Runtime

[Figure: runtime bar charts for LHL and NoLiftGraph — IMDB in minutes, UW-CSE and Cora in hours]

Page 43: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

LHL vs. NoPathFinding

[Figure: AUC and CLL bar charts for LHL and NoPathFinding on IMDB and UW-CSE]

Page 44: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

Examples of Rules Learned

- If a is an actor and d is a director, and they both worked on the same movie, then a probably worked under d.
- If p is a professor and p co-authored a paper with s, then s is likely a student.
- If papers x and y have the same author, then x and y are likely the same paper.

Page 45: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

Outline

Motivation | Background | Learning via Hypergraph Lifting | Experiments | Future Work

Page 46: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

Future Work

- Integrate the components of LHL
- Integrate LHL with lifted inference [Singla & Domingos, AAAI'08]
- Construct an ontology simultaneously with the probabilistic KB
- Further scale up LHL
- Apply LHL to larger, richer domains, e.g., the Web

Page 47: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

Conclusion

- LHL = Clustering + Relational Pathfinding
- "Lifts" the data into a more compact form, which is essential for speeding up relational pathfinding
- LHL outperforms state-of-the-art structure learners