Learning Markov Logic Network Structure via Hypergraph Lifting


Page 1: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

Learning Markov Logic Network Structure
Via Hypergraph Lifting

Stanley Kok
Dept. of Computer Science and Eng.
University of Washington, Seattle, USA

Joint work with Pedro Domingos

Page 2: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

Synopsis of LHL

Input: Relational DB

Advises           TAs              Teaches
Pete   Sam        Sam    CS1       Pete   CS1
Pete   Saul       Sam    CS2       Pete   CS2
Paul   Sara       Sara   CS1       Paul   CS2
…      …          …      …         …      …

Output: Probabilistic KB

 2.7   Teaches(p, c) ∧ TAs(s, c) ⇒ Advises(p, s)
 1.4   Advises(p, s) ⇒ Teaches(p, c) ∧ TAs(s, c)
-1.1   TAs(s, c) ⇒ Advises(s, p)

Goal of LHL

[Figure: the DB viewed as a ground hypergraph over the constants Pete, Paul, Pat, Phil, Sam, Sara, Saul, Sue, and CS1–CS8, with Teaches, TAs, and Advises hyperedges, lifted into clusters labeled Professor, Student, and Course]

Page 3: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

Experimental Results

[Figure: bar charts comparing LHL, BUSL, and MSL on area under the precision-recall curve (AUC) and conditional log-likelihood (CLL)]

Page 4: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

Outline

Background | Learning via Hypergraph Lifting | Experiments | Future Work

Page 5: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

Markov Logic

A logical KB is a set of hard constraints on the set of possible worlds.
Let's make them soft constraints: when a world violates a formula, it becomes less probable, not impossible.
Give each formula a weight (higher weight ⇒ stronger constraint).

Page 6: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

Markov Logic

A Markov logic network (MLN) is a set of pairs (F, w):
- F is a formula in first-order logic
- w is a real number

P(x) = (1/Z) exp( Σ_i w_i n_i(x) )

where x is a vector of truth assignments to ground atoms, Z is the partition function, w_i is the weight of the i-th formula, and n_i(x) is the number of true groundings of the i-th formula.
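The distribution above can be checked numerically on a toy model. The sketch below enumerates all worlds over two ground atoms A and B with a single soft formula A ⇒ B; the formula and its weight 1.5 are made up for illustration:

```python
# Toy numeric check of P(x) = (1/Z) exp(sum_i w_i n_i(x)). The single formula
# "A => B" and its weight 1.5 are hypothetical.
import itertools
import math

w = 1.5

def n_true(world):
    """Number of true groundings of (A => B) in this world: 0 or 1."""
    A, B = world
    return 1 if (not A) or B else 0

worlds = list(itertools.product([False, True], repeat=2))
Z = sum(math.exp(w * n_true(x)) for x in worlds)     # partition function

def P(world):
    return math.exp(w * n_true(world)) / Z
```

The world (True, False), the only one violating A ⇒ B, gets a lower probability than the others, not probability zero.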

Page 7: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

MLN Structure Learning

A challenging task; few approaches to date
[Kok & Domingos, ICML'05; Mihalkova & Mooney, ICML'07; Biba et al., ECAI'08; Huynh & Mooney, ICML'08]

Most MLN structure learners:
- greedily and systematically enumerate formulas
- are computationally expensive (large search space)
- are susceptible to local optima

Page 8: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

MSL [Kok & Domingos, ICML'05]

While beam not empty:
    Add unit clauses to beam
    While beam has changed:
        For each clause c in beam:
            c' ← add a literal to c
            newClauses ← newClauses ∪ {c'}
        beam ← k best clauses in beam ∪ newClauses
    Add best clause in beam to MLN
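The pseudocode above can be sketched in Python. The literals, beam width K, and score() below are stand-ins of my own (MSL's real score is likelihood-based):

```python
# A minimal sketch of MSL-style beam search over clauses. Clauses are
# frozensets of literal strings; LITERALS, K, and score() are hypothetical.
LITERALS = ["Teaches(p,c)", "TAs(s,c)", "Advises(p,s)", "!Advises(p,s)"]
K = 3  # beam width

def score(clause):
    # Hypothetical score: reward mentioning TAs, penalize clause length.
    return ("TAs(s,c)" in clause) - 0.1 * len(clause)

def msl_beam_search():
    beam = [frozenset([l]) for l in LITERALS]        # start with unit clauses
    changed = True
    while changed:                                   # while beam has changed
        new_clauses = set()
        for c in beam:                               # c' <- add a literal to c
            for lit in LITERALS:
                if lit not in c:
                    new_clauses.add(c | {lit})
        candidates = set(beam) | new_clauses
        new_beam = sorted(candidates,                # keep the K best clauses
                          key=lambda c: (score(c), sorted(c)),
                          reverse=True)[:K]
        changed = new_beam != beam
        beam = new_beam
    return [beam[0]]                                 # add best clause to MLN
```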

Page 9: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

Relational Pathfinding [Richards & Mooney, AAAI'92]

Find paths of linked ground atoms → formulas
Path ≡ conjunction that is true at least once
Exponential search space of paths, so restricted to short paths

[Figure: ground hypergraph over Pete, Paul, Pat, Phil, Sam, Sara, Saul, Sue, and CS1–CS8 with Teaches, TAs, and Advises edges; the path through Pete, CS1, and Sam is highlighted]

Advises(Pete, Sam) ∧ Teaches(Pete, CS1) ∧ TAs(Sam, CS1)
→ Advises(p, s) ∧ Teaches(p, c) ∧ TAs(s, c)
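Turning a ground path into a formula is just variabilization: replace each constant with a fresh variable. A minimal sketch, using the slide's example path (the variable-naming scheme p, q, r, … is my own):

```python
# Variabilization sketch: replace each constant in a ground path with a
# fresh variable, reusing the same variable for repeated constants.
from string import ascii_lowercase

path = [("Advises", ("Pete", "Sam")),
        ("Teaches", ("Pete", "CS1")),
        ("TAs", ("Sam", "CS1"))]

def variabilize(path):
    var_of = {}          # constant -> variable
    literals = []
    for pred, args in path:
        vs = []
        for const in args:
            if const not in var_of:
                var_of[const] = ascii_lowercase[15 + len(var_of)]  # p, q, r…
            vs.append(var_of[const])
        literals.append(f"{pred}({', '.join(vs)})")
    return " ^ ".join(literals)
```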

Page 10: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

BUSL [Mihalkova & Mooney, ICML'07]

Finds short paths with a form of relational pathfinding
Path → Boolean variable → node in a Markov network
Greedily tries to link the nodes with edges
Cliques → clauses:
- forms disjunctions of the atoms in a clique's nodes
- greedily adds clauses to an empty MLN

From the clique {Advises(p,s), Teaches(p,c), TAs(s,c)}:
Advises(p,s) ∨ Teaches(p,c) ∨ TAs(s,c)
¬Advises(p,s) ∨ ¬Teaches(p,c) ∨ TAs(s,c)
…

Page 11: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

Outline

Background | Learning via Hypergraph Lifting | Experiments | Future Work

Page 12: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

Learning via Hypergraph Lifting (LHL)

Uses relational pathfinding to a fuller extent
Induces a hypergraph over clusters of constants

[Figure: the ground hypergraph over Pete, Paul, Pat, Phil, Sam, Sara, Saul, Sue, and CS1–CS8 is "lifted" into a hypergraph over the clusters {Pete, Paul, Pat, Phil}, {Sam, Sara, Saul, Sue}, and {CS1, …, CS8}, with Teaches, TAs, and Advises hyperedges]

Page 13: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

Learning via Hypergraph Lifting (LHL)

Uses a hypergraph (V, E):
- V: a set of nodes
- E: a set of labeled, non-empty, ordered subsets of V

Finds paths in the hypergraph
Path: a set of hyperedges such that for any two hyperedges e0 and en in the set, there exists a sequence of hyperedges in the set leading from e0 to en

Page 14: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

Learning via Hypergraph Lifting (LHL)

A relational DB can be viewed as a hypergraph:
- Nodes ≡ constants
- Hyperedges ≡ true ground atoms

[Figure: the DB tables Advises, TAs, and Teaches drawn as a hypergraph whose nodes are the constants and whose hyperedges are the true ground atoms]
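This view is easy to materialize. A minimal sketch, assuming ground atoms are stored as (predicate, argument-tuple) pairs:

```python
# Sketch: a relational DB as a hypergraph. Nodes are constants; each true
# ground atom is a labeled hyperedge over its arguments.
from collections import defaultdict

db = [("Advises", ("Pete", "Sam")),
      ("TAs", ("Sam", "CS1")),
      ("Teaches", ("Pete", "CS1"))]

def build_hypergraph(db):
    nodes = set()
    edges_at = defaultdict(list)        # constant -> hyperedges touching it
    for pred, args in db:
        nodes.update(args)
        for const in args:
            edges_at[const].append((pred, args))
    return nodes, edges_at

nodes, edges_at = build_hypergraph(db)
```

The `edges_at` index is what pathfinding needs: from any constant, it lists the hyperedges a path can grow through.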

Page 15: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

LHL = Clustering + Relational Pathfinding

LHL "lifts" the hypergraph into a more compact representation:
- jointly clusters the nodes into higher-level concepts
- clusters the hyperedges
It then traces paths in the lifted hypergraph.

[Figure: the ground hypergraph "lifted" into the cluster-level hypergraph]

Page 16: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

Learning via Hypergraph Lifting

LHL has three components:
- LiftGraph: lifts the hypergraph
- FindPaths: finds paths in the lifted hypergraph
- CreateMLN: creates rules from the paths and adds the good ones to an empty MLN

Page 17: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

LiftGraph

- Defined using Markov logic
- Jointly clusters constants in a bottom-up, agglomerative manner
- Allows information to propagate from one cluster to another
- Ground atoms are also clustered
- The number of clusters need not be specified in advance
- Each lifted hyperedge contains ≥ 1 true ground atom

Page 18: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

Learning Problem in LiftGraph

Find the cluster assignment C that maximizes the posterior probability
P(C | D) ∝ P(D | C) P(C)
where D is the truth values of the ground atoms, P(D | C) is defined with an MLN, and P(C) is defined with another MLN.

Page 19: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

LiftGraph's P(D|C) MLN

For each predicate r and each cluster combination containing a true ground atom of r, we have an atom prediction rule:

∀x1, …, xn  x1 ∈ γ1 ∧ … ∧ xn ∈ γn ⇒ r(x1, …, xn)

[Figure: clusters Professor = {Pete, Paul, Pat, Phil}, Student = {Sam, Sara, Saul, Sue}, and Course = {CS1, …, CS8}, with lifted Teaches, TAs, and Advises hyperedges]

Page 20: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

LiftGraph's P(D|C) MLN

For example, for Teaches and the (Professor, Course) cluster combination:

p ∈ Professor ∧ c ∈ Course ⇒ Teaches(p, c)

Page 21: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

LiftGraph's P(D|C) MLN

For each predicate r, we also have a default atom prediction rule, covering the remaining (default) cluster combinations, e.g.:

x ∈ γ ∧ y ∈ γ′, for (γ, γ′) in the default cluster combination ⇒ Teaches(x, y)

Page 22: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

LiftGraph's P(C) MLN

- Each symbol belongs to exactly one cluster (infinite weight)
- Exponential prior on the number of cluster combinations (negative weight -λ)

Page 23: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

LiftGraph

- Hard assignments of constants to clusters
- Weights and log-posterior computed in closed form
- Searches for the cluster assignment with the highest log-posterior
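As a rough illustration of closed-form weights (an assumption of mine, not the paper's exact derivation): a rule's maximum-likelihood weight can be taken as the smoothed empirical log-odds that its groundings are true, which needs no iterative optimization.

```python
# Illustrative only (my assumption, not the paper's derivation): compute a
# rule's weight in closed form as the smoothed log-odds of true groundings.
import math

def rule_weight(n_true, n_total):
    p = (n_true + 1) / (n_total + 2)    # add-one smoothed true fraction
    return math.log(p / (1 - p))
```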

Page 24: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

LiftGraph's Search Algorithm

[Figure: agglomerative search over the constants Pete, Paul, CS1, CS2, CS3, Sam, and Sara, linked by Teaches and Advises hyperedges; Pete and Paul are merged into a single cluster]

Page 25: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

LiftGraph's Search Algorithm

[Figure: the merging continues — CS1, CS2, and CS3 form one cluster and Sam and Sara another, with the Teaches and Advises hyperedges now connecting clusters]
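The agglomerative search can be sketched as greedy merging under a score. The score() below is a stand-in that rewards clusters whose constants appear in the same predicates; it is not LiftGraph's actual log-posterior:

```python
# Greedy agglomerative clustering sketch: repeatedly make the merge that
# most improves a score, stopping when no merge helps. preds_of and score()
# are hypothetical stand-ins.
from itertools import combinations

preds_of = {"Pete": {"Teaches"}, "Paul": {"Teaches"},
            "Sam": {"TAs"}, "Sara": {"TAs"}, "CS1": {"Teaches", "TAs"}}

def score(clusters):
    # Reward size of predicate-homogeneous clusters; penalize cluster count.
    homog = sum(len(c) for c in clusters
                if len({frozenset(preds_of[x]) for x in c}) == 1)
    return homog - 0.5 * len(clusters)

def lift(constants):
    clusters = [frozenset([c]) for c in constants]
    while True:
        best, best_gain = None, 0.0
        for a, b in combinations(clusters, 2):
            merged = [c for c in clusters if c not in (a, b)] + [a | b]
            gain = score(merged) - score(clusters)
            if gain > best_gain:
                best, best_gain = merged, gain
        if best is None:
            return clusters
        clusters = best
```

On this toy input the greedy merges recover the clusters from the slide: {Pete, Paul}, {Sam, Sara}, and {CS1} on its own.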

Page 26: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

FindPaths

[Figure: paths traced in the lifted hypergraph over the Professor, Student, and Course clusters]

Paths found:
Advises(Professor, Student)
Advises(Professor, Student), Teaches(Professor, Course)
Advises(Professor, Student), Teaches(Professor, Course), TAs(Student, Course)
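Pathfinding over the lifted hypergraph can be sketched as a bounded depth-first enumeration of connected sets of hyperedges, grown one hyperedge at a time through shared nodes (the cluster names are illustrative):

```python
# FindPaths sketch: depth-first enumeration of connected hyperedge sets in
# the lifted hypergraph, up to a length bound.
MAX_LEN = 3
lifted_edges = [("Advises", ("Professor", "Student")),
                ("Teaches", ("Professor", "Course")),
                ("TAs", ("Student", "Course"))]

def find_paths():
    paths = []
    def grow(path, nodes):
        paths.append(list(path))
        if len(path) == MAX_LEN:
            return
        for edge in lifted_edges:
            pred, args = edge
            if edge not in path and nodes & set(args):   # shares a node
                grow(path + [edge], nodes | set(args))
    for edge in lifted_edges:
        grow([edge], set(edge[1]))
    return paths
```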

Page 27: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

Clause Creation

Each path has its cluster arguments replaced by variables:

Advises(Professor, Student), Teaches(Professor, Course), TAs(Student, Course)
→ Advises(p, s) ∧ Teaches(p, c) ∧ TAs(s, c)

Clauses are then formed from the conjunction:
¬Advises(p, s) ∨ ¬Teaches(p, c) ∨ ¬TAs(s, c)
Advises(p, s) ∨ ¬Teaches(p, c) ∨ ¬TAs(s, c)
Advises(p, s) ∨ Teaches(p, c) ∨ ¬TAs(s, c)
…
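Generating the candidate clauses from a path's conjunction is an enumeration over negation signs, one clause per sign combination:

```python
# Sketch: candidate clauses from a path's conjunction, one per combination
# of negation signs over the atoms.
from itertools import product

atoms = ["Advises(p,s)", "Teaches(p,c)", "TAs(s,c)"]

def clauses_from_path(atoms):
    out = []
    for signs in product([True, False], repeat=len(atoms)):
        lits = [a if pos else "!" + a for a, pos in zip(atoms, signs)]
        out.append(" v ".join(lits))
    return out
```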

Page 28: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

Clause Pruning

Clause                                          Score
¬Advises(p, s) ∨ ¬Teaches(p, c) ∨ TAs(s, c)     -1.15
Advises(p, s) ∨ ¬Teaches(p, c) ∨ TAs(s, c)      -1.17
…
¬Advises(p, s) ∨ ¬Teaches(p, c)                 -2.21
¬Advises(p, s) ∨ TAs(s, c)                      -2.23
¬Teaches(p, c) ∨ TAs(s, c)                      -2.03
…
¬Advises(p, s)                                  -3.13
¬Teaches(p, c)                                  -2.93
TAs(s, c)                                       -3.93

Page 29: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

Clause Pruning

Compare each clause against its sub-clauses (taken individually).
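One plausible reading of this comparison (an assumption on my part): keep a clause only if its score beats every sub-clause taken individually. A sketch using the scores from the table, with abbreviated literals:

```python
# Pruning sketch (one plausible reading of the slide's rule): keep a clause
# only if its score beats every sub-clause, each taken individually.
from itertools import combinations

scores = {
    frozenset({"!Advises", "!Teaches", "TAs"}): -1.15,
    frozenset({"!Advises", "!Teaches"}): -2.21,
    frozenset({"!Advises", "TAs"}): -2.23,
    frozenset({"!Teaches", "TAs"}): -2.03,
    frozenset({"!Advises"}): -3.13,
    frozenset({"!Teaches"}): -2.93,
    frozenset({"TAs"}): -3.93,
}

def keep(clause):
    subs = [frozenset(s) for r in range(1, len(clause))
            for s in combinations(clause, r)]
    return all(scores[clause] > scores[s] for s in subs if s in scores)
```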

Page 30: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

MLN Creation

- Add clauses to an empty MLN in order of decreasing score
- Retrain the weights of the clauses each time a clause is added
- Retain a clause in the MLN only if the overall score improves
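The greedy construction can be sketched as follows; overall_score() and the input clause scores are hypothetical stand-ins for the real score computed after weight retraining:

```python
# Greedy MLN construction sketch: add clauses in decreasing-score order and
# retain each only if the overall score improves. overall_score() is a
# hypothetical stand-in.
def overall_score(mln):
    # Hypothetical: reward distinct clauses, penalize total clause count.
    return len(set(mln)) - 0.3 * len(mln)

def create_mln(scored_clauses):
    mln = []
    for clause, _ in sorted(scored_clauses, key=lambda x: -x[1]):
        candidate = mln + [clause]
        # (the real system retrains all clause weights here)
        if overall_score(candidate) > overall_score(mln):
            mln = candidate
    return mln
```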

Page 31: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

Outline

Background | Learning via Hypergraph Lifting | Experiments | Future Work

Page 32: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

Datasets

IMDB
- Created from the IMDB.com DB
- Movies, actors, etc., and their relationships
- 17,793 ground atoms; 1,224 true ones

UW-CSE
- Describes an academic department
- Students, faculty, etc., and their relationships
- 260,254 ground atoms; 2,112 true ones

Page 33: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

Datasets

Cora
- Citations to computer science papers
- Papers, authors, titles, etc., and their relationships
- 687,422 ground atoms; 42,558 true ones

Page 34: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

Methodology

- Five-fold cross-validation
- Inferred the probability of truth for the groundings of each predicate, with the groundings of all other predicates as evidence
- Evaluation measures: area under the precision-recall curve (AUC) and average conditional log-likelihood (CLL)

Page 35: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

Methodology

- MCMC inference algorithms in Alchemy to evaluate the test atoms
- 1 million samples
- 24 hours

Page 36: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

Methodology

Compared with:
- MSL [Kok & Domingos, ICML'05]
- BUSL [Mihalkova & Mooney, ICML'07]

Lesion study:
- NoLiftGraph: LHL with no hypergraph lifting; finds paths directly in the unlifted hypergraph
- NoPathFinding: LHL with no pathfinding; uses the MLN representing LiftGraph

Page 37: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

LHL vs. BUSL vs. MSL: Area under Prec-Recall Curve

[Figure: AUC bar charts for LHL, BUSL, and MSL on IMDB, UW-CSE, and Cora]

Page 38: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

LHL vs. BUSL vs. MSL: Conditional Log-likelihood

[Figure: CLL bar charts for LHL, BUSL, and MSL on IMDB, UW-CSE, and Cora]

Page 39: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

LHL vs. BUSL vs. MSL: Runtime

[Figure: runtime bar charts for LHL, BUSL, and MSL — IMDB in minutes, UW-CSE and Cora in hours]

Page 40: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

LHL vs. NoLiftGraph: Area under Prec-Recall Curve

[Figure: AUC bar charts for LHL and NoLiftGraph on IMDB, UW-CSE, and Cora]

Page 41: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

LHL vs. NoLiftGraph: Conditional Log-likelihood

[Figure: CLL bar charts for LHL and NoLiftGraph on IMDB, UW-CSE, and Cora]

Page 42: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

LHL vs. NoLiftGraph: Runtime

[Figure: runtime bar charts for LHL and NoLiftGraph — IMDB in minutes, UW-CSE and Cora in hours]

Page 43: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

LHL vs. NoPathFinding

[Figure: AUC and CLL bar charts for LHL and NoPathFinding on IMDB and UW-CSE]

Page 44: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

Examples of Rules Learned

- If a is an actor and d is a director, and they both worked on the same movie, then a probably worked under d.
- If p is a professor and p co-authored a paper with s, then s is likely a student.
- If papers x and y have the same author, then x and y are likely the same paper.

Page 45: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

Outline

Motivation | Background | Learning via Hypergraph Lifting | Experiments | Future Work

Page 46: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

Future Work

- Integrate the components of LHL
- Integrate LHL with lifted inference [Singla & Domingos, AAAI'08]
- Construct an ontology simultaneously with the probabilistic KB
- Further scale up LHL
- Apply LHL to larger, richer domains, e.g., the Web

Page 47: Learning Markov Logic Network Structure Via  Hypergraph  Lifting

Conclusion

- LHL = Clustering + Relational Pathfinding
- "Lifts" the data into a more compact form, which is essential for speeding up relational pathfinding
- LHL outperforms state-of-the-art structure learners