

1

Learning the Structure of Markov Logic Networks

Stanley Kok & Pedro Domingos

Dept. of Computer Science and Eng.

University of Washington

2

Overview: Motivation, Background, Structure Learning Algorithm, Experiments, Future Work & Conclusion

3

Motivation
Statistical Relational Learning (SRL) combines the benefits of:
- Statistical Learning: uses probability to handle uncertainty in a robust and principled way
- Relational Learning: models domains with multiple relations

4

Motivation
Many SRL approaches combine a logical language and Bayesian networks, e.g. Probabilistic Relational Models [Friedman et al., 1999]
The need to avoid cycles in Bayesian networks causes many difficulties [Taskar et al., 2002]
Researchers started using Markov networks instead

5–6

Motivation
Relational Markov Networks [Taskar et al., 2002]
- Conjunctive database queries + Markov networks
- Require space exponential in the size of the cliques
Markov Logic Networks [Richardson & Domingos, 2004]
- First-order logic + Markov networks
- Compactly represent large cliques
- Did not learn structure (used an external ILP system)
This paper develops a fast algorithm that learns MLN structure: the most powerful SRL learner to date

7

Overview: Motivation, Background, Structure Learning Algorithm, Experiments, Future Work & Conclusion

8

Markov Logic Networks
A first-order KB is a set of hard constraints: if a world violates even one formula, it has zero probability
MLNs soften the constraints: it is OK to violate formulas; the fewer formulas a world violates, the more probable it is
Each formula is given a weight that reflects how strong a constraint it is

9

MLN Definition
A Markov Logic Network (MLN) is a set of pairs (F, w), where
- F is a formula in first-order logic
- w is a real number
Together with a finite set of constants, it defines a Markov network with
- One node for each grounding of each predicate in the MLN
- One feature for each grounding of each formula F in the MLN, with the corresponding weight w

10

Ground Markov Network

Formula (weight 2.7): AdvisedBy(S,P) ⇒ Student(S) ∧ Professor(P)
Constants: STAN, PEDRO

Ground predicates (one node each): Student(STAN), Student(PEDRO), Professor(STAN), Professor(PEDRO), AdvisedBy(STAN,PEDRO), AdvisedBy(PEDRO,STAN), AdvisedBy(STAN,STAN), AdvisedBy(PEDRO,PEDRO)
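To make the grounding step concrete, here is a minimal sketch (not the authors' code) that enumerates the ground predicates becoming nodes of the ground network for this example's predicates and constants:

```python
from itertools import product

# Minimal sketch: enumerate the ground atoms that become nodes of the
# ground Markov network for the slide's example.
predicates = {            # predicate name -> arity
    "Student": 1,
    "Professor": 1,
    "AdvisedBy": 2,
}
constants = ["STAN", "PEDRO"]

ground_atoms = [
    f"{pred}({','.join(args)})"
    for pred, arity in predicates.items()
    for args in product(constants, repeat=arity)
]
print(ground_atoms)
# ['Student(STAN)', 'Student(PEDRO)', 'Professor(STAN)', 'Professor(PEDRO)',
#  'AdvisedBy(STAN,STAN)', 'AdvisedBy(STAN,PEDRO)', ...]
```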

11–15

MLN Model

$$P(X = x) \;=\; \frac{1}{Z}\,\exp\Big(\sum_i w_i\, n_i(x)\Big)$$

- x: vector of value assignments to the ground predicates
- Z: partition function; sums over all possible value assignments to the ground predicates
- w_i: weight of the ith formula
- n_i(x): number of true groundings of the ith formula
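A tiny brute-force sketch of this model (illustration only; real MLNs are far too large to enumerate): it computes the probability of one world over three ground atoms, using the single example formula from the previous slide.

```python
import itertools
import math

# Brute-force illustration of P(X = x) = exp(sum_i w_i * n_i(x)) / Z
# over a tiny set of ground atoms.
atoms = ["Student(STAN)", "Professor(PEDRO)", "AdvisedBy(STAN,PEDRO)"]

def n_1(world):
    # True groundings of AdvisedBy(S,P) => Student(S) ^ Professor(P),
    # restricted here to the single grounding S=STAN, P=PEDRO.
    advised = world["AdvisedBy(STAN,PEDRO)"]
    head = world["Student(STAN)"] and world["Professor(PEDRO)"]
    return int((not advised) or head)

weights = [2.7]           # weight of the single formula
features = [n_1]

def score(world):
    return math.exp(sum(w * f(world) for w, f in zip(weights, features)))

worlds = [dict(zip(atoms, vals))
          for vals in itertools.product([False, True], repeat=len(atoms))]
Z = sum(score(w) for w in worlds)                    # partition function
x = {"Student(STAN)": True, "Professor(PEDRO)": True, "AdvisedBy(STAN,PEDRO)": True}
print(score(x) / Z)                                  # probability of this world
```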

16–18

MLN Weight Learning
- The likelihood is a concave function of the weights
- Quasi-Newton methods, e.g. L-BFGS [Liu & Nocedal, 1989], can find the optimal weights
- But this is SLOW: computing the likelihood and its gradient requires inference over the ground network, a #P-complete problem
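For reference, the gradient of the log-likelihood with respect to each weight (a standard result for log-linear models) makes the bottleneck explicit; the expectation term requires summing over all possible worlds:

$$\frac{\partial}{\partial w_i}\log P_w(X = x) \;=\; n_i(x) \;-\; \mathbb{E}_w\big[n_i(X)\big] \;=\; n_i(x) \;-\; \sum_{x'} P_w(X = x')\, n_i(x')$$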

19–20

MLN Weight Learning
R&D instead optimized pseudo-likelihood [Besag, 1975], which conditions each ground predicate only on its Markov blanket and so avoids inference over the full ground network
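The (log) pseudo-likelihood being optimized, in Besag's standard form, where MB_x(X_l) is the state of X_l's Markov blanket in the data:

$$\log P^{*}_w(X = x) \;=\; \sum_{l=1}^{n} \log P_w\big(X_l = x_l \mid MB_x(X_l)\big)$$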

21

MLN Structure Learning
R&D “learned” MLN structure in two disjoint steps:
- Learn first-order clauses with an off-the-shelf ILP system (CLAUDIEN [De Raedt & Dehaspe, 1997])
- Learn clause weights by optimizing pseudo-likelihood
This is unlikely to give the best results, because CLAUDIEN
- finds clauses that hold with some accuracy/frequency in the data
- does not find clauses that maximize the data's (pseudo-)likelihood

22

Overview: Motivation, Background, Structure Learning Algorithm, Experiments, Future Work & Conclusion

23

This paper develops an algorithm that: Learns first-order clauses by directly optimizing

pseudo-likelihood Is fast enough Performs better than R&D, pure ILP,

purely KB and purely probabilistic approaches

MLN Structure Learning

24

Structure Learning Algorithm

High-level algorithm:
REPEAT
  MLN ← MLN ∪ FindBestClauses(MLN)
UNTIL FindBestClauses(MLN) returns NULL

FindBestClauses(MLN)
  Create candidate clauses
  FOR EACH candidate clause c
    Compute increase in evaluation measure of adding c to MLN
  RETURN k clauses with greatest increase
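A minimal Python sketch of this loop (create_candidate_clauses and score_gain are hypothetical placeholders, not the authors' implementation):

```python
# Sketch of the high-level structure learning loop.
def learn_structure(mln, k=1):
    while True:
        best = find_best_clauses(mln, k)
        if not best:                       # no clause improves the evaluation measure
            return mln
        mln.extend(best)                   # MLN <- MLN union FindBestClauses(MLN)

def find_best_clauses(mln, k):
    candidates = create_candidate_clauses(mln)         # apply the clause operators
    scored = [(score_gain(mln, c), c) for c in candidates]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [c for gain, c in scored[:k] if gain > 0]   # k clauses with greatest increase
```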

25

Structure Learning
- Evaluation measure
- Clause construction operators
- Search strategies
- Speedup techniques

26

Evaluation Measure
R&D used pseudo-log-likelihood
This gives undue weight to predicates with a large number of groundings

27–30

Evaluation Measure
Weighted pseudo-log-likelihood (WPLL):

$$\log P^{\bullet}_w(X = x) \;=\; \sum_{r \in R} c_r \sum_{k=1}^{g_r} \log P_w\big(X_{r,k} = x_{r,k} \mid MB_x(X_{r,k})\big)$$

- the outer sum ranges over predicates r; c_r is the weight given to predicate r
- the inner sum ranges over the g_r groundings of predicate r
- each term is the CLL (conditional log-likelihood) of a grounding given its Markov blanket
Plus a Gaussian weight prior and a structure prior
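A small sketch of the WPLL computation, assuming the per-grounding conditional probabilities have already been computed elsewhere; choosing c_r = 1/g_r gives every predicate equal total weight regardless of its number of groundings, which counteracts the problem noted on the previous slide.

```python
import math

# Sketch of weighted pseudo-log-likelihood (WPLL) with c_r = 1/g_r.
def wpll(cond_probs_by_predicate):
    """cond_probs_by_predicate: dict mapping predicate name ->
    list of P(grounding takes its database value | its Markov blanket)."""
    total = 0.0
    for r, probs in cond_probs_by_predicate.items():
        c_r = 1.0 / len(probs)                       # predicate weight
        total += c_r * sum(math.log(p) for p in probs)
    return total

example = {"Student": [0.9, 0.8], "AdvisedBy": [0.7, 0.99, 0.95, 0.6]}
print(wpll(example))
```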

31

Clause Construction Operators
- Add a literal (negative/positive)
- Remove a literal
- Flip signs of literals
- Limit # of distinct variables to restrict the search space
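A toy sketch of these operators over a simple clause representation (a clause as a list of signed literals; purely illustrative, not the paper's data structures):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Literal:
    predicate: str
    args: tuple           # variable names, e.g. ("S", "P")
    positive: bool = True

def add_literal(clause, lit, max_vars=4):
    new_clause = clause + [lit]
    n_vars = len({v for l in new_clause for v in l.args})
    return new_clause if n_vars <= max_vars else None   # limit distinct variables

def remove_literal(clause, i):
    return clause[:i] + clause[i + 1:]

def flip_sign(clause, i):
    l = clause[i]
    return clause[:i] + [Literal(l.predicate, l.args, not l.positive)] + clause[i + 1:]

clause = [Literal("AdvisedBy", ("S", "P"), False), Literal("Student", ("S",))]
print(flip_sign(clause, 1))
```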

32

Beam Search
- Same as that used in ILP & rule induction
- Repeatedly find the single best clause

33

Shortest-First Search (SFS)
1. Start from an empty or hand-coded MLN
2. FOR L ← 1 TO MAX_LENGTH
3.   Apply each literal addition & deletion to each clause, to create clauses of length L
4.   Repeatedly add the K best clauses of length L to the MLN, until no clause of length L improves WPLL
Similar to Della Pietra et al. (1997), McCallum (2003)
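A compact Python sketch of SFS (expand_to_length and wpll_gain are hypothetical stand-ins for the real candidate generation and WPLL scoring):

```python
# Sketch of shortest-first search over clause lengths.
def shortest_first_search(mln, max_length, k):
    for length in range(1, max_length + 1):
        while True:
            candidates = expand_to_length(mln, length)   # literal additions/deletions
            scored = sorted(((wpll_gain(mln, c), c) for c in candidates),
                            key=lambda t: t[0], reverse=True)
            best = [c for gain, c in scored[:k] if gain > 0]
            if not best:                 # no clause of this length improves WPLL
                break
            mln.extend(best)
    return mln
```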

34–37

Speedup Techniques
FindBestClauses(MLN)
  Create candidate clauses
  FOR EACH candidate clause c
    Compute increase in WPLL (using L-BFGS) of adding c to MLN
  RETURN k clauses with greatest increase

Bottlenecks:
- Many candidate clauses (SLOW)
- Many CLLs to compute (SLOW)
- Each CLL involves a #P-complete problem (SLOW)
- L-BFGS itself is not that fast

38–43

Speedup Techniques
- Clause Sampling
- Predicate Sampling
- Avoid Redundancy
- Loose Convergence Thresholds
- Ignore Unrelated Clauses
- Weight Thresholding
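A rough sketch of the two sampling ideas, clause sampling and predicate sampling (sample sizes and helper names are illustrative assumptions, not values from the paper):

```python
import random

def sampled_true_grounding_count(clause, groundings, sample_size=10000):
    """Clause sampling: estimate the number of true groundings of a clause
    from a uniform sample of its groundings (clause.is_true is hypothetical)."""
    sample = groundings if len(groundings) <= sample_size else random.sample(groundings, sample_size)
    frac_true = sum(clause.is_true(g) for g in sample) / len(sample)
    return frac_true * len(groundings)

def sampled_wpll(groundings_by_predicate, cll, sample_size=500):
    """Predicate sampling: estimate WPLL from a subsample of each predicate's
    groundings (cll(g) returns a grounding's conditional log-likelihood)."""
    total = 0.0
    for r, groundings in groundings_by_predicate.items():
        sample = random.sample(groundings, min(sample_size, len(groundings)))
        total += sum(cll(g) for g in sample) / len(sample)   # mean CLL per predicate
    return total
```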

44

Overview: Motivation, Background, Structure Learning Algorithm, Experiments, Future Work & Conclusion

45

Experiments
UW-CSE domain
- 22 predicates, e.g. AdvisedBy(X,Y), Student(X), etc.
- 10 types, e.g. Person, Course, Quarter, etc.
- # ground predicates ≈ 4 million
- # true ground predicates ≈ 3000
- Hand-coded KB with 94 formulas, e.g.:
  - Each student has at most one advisor
  - If a student is an author of a paper, so is her advisor
Cora domain
- Computer science research papers
- Collective deduplication of author, venue, title

46–49

Systems
- MLN(SLB): structure learning with beam search
- MLN(SLS): structure learning with SFS
- KB: hand-coded KB; CL: CLAUDIEN; FO: FOIL; AL: Aleph
- MLN(KB), MLN(CL), MLN(FO), MLN(AL): the above clause sets with MLN weight learning
- NB: Naïve Bayes; BN: Bayesian networks

50

Methodology
UW-CSE domain
- DB divided into 5 areas: AI, Graphics, Languages, Systems, Theory
- Leave-one-out testing by area
Measured:
- average CLL of the ground predicates
- average area under the precision-recall curve of the ground predicates (AUC)
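One way to compute these two metrics from predicted probabilities of the query ground predicates (a sketch; not necessarily the exact procedure used in the paper):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, auc

def average_cll(y_true, p_pred, eps=1e-6):
    # Mean log-probability assigned to each ground predicate's true value.
    p = np.clip(p_pred, eps, 1 - eps)
    return np.mean(np.where(y_true == 1, np.log(p), np.log(1 - p)))

def pr_auc(y_true, p_pred):
    # Area under the precision-recall curve.
    precision, recall, _ = precision_recall_curve(y_true, p_pred)
    return auc(recall, precision)

y_true = np.array([1, 0, 0, 1, 0])
p_pred = np.array([0.9, 0.2, 0.4, 0.7, 0.1])
print(average_cll(y_true, p_pred), pr_auc(y_true, p_pred))
```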

51–54

Results: UW-CSE (leave-one-area-out averages)

System      CLL       AUC
MLN(SLS)    -0.061    0.533
MLN(SLB)    -0.088    0.472
MLN(CL)     -0.151    0.306
MLN(FO)     -0.208    0.140
MLN(AL)     -0.223    0.148
MLN(KB)     -0.142    0.429
CL          -0.574    0.170
FO          -0.661    0.131
AL          -0.579    0.117
KB          -0.812    0.266

55

Results: UW-CSE, MLN structure learning vs. purely probabilistic learners

System      CLL       AUC
MLN(SLS)    -0.061    0.533
MLN(SLB)    -0.088    0.472
NB          -0.370    0.390
BN          -0.166    0.397

56

Timing
MLN(SLS) on UW-CSE
- Cluster of 15 dual-CPU 2.8 GHz Pentium 4 machines
- Without speedups: did not finish in 24 hrs
- With speedups: 5.3 hrs

57

Lesion Study
Disable one speedup technique at a time; SFS on UW-CSE (one fold). Runtimes in hours:

all speedups: 4.0
no clause sampling: 21.6
no predicate sampling: 8.4
don't avoid redundancy: 6.5
no loose convergence threshold: 4.1
no weight thresholding: 24.8

58

Overview: Motivation, Background, Structure Learning Algorithm, Experiments, Future Work & Conclusion

59

Future Work
- Speed up counting of # true groundings of a clause
- Probabilistically bound the loss in accuracy due to subsampling
- Probabilistic predicate discovery

60

Conclusion
- Markov logic networks: a powerful combination of first-order logic and probability
- Richardson & Domingos (2004) did not learn MLN structure
- We develop an algorithm that automatically learns both first-order clauses and their weights
- We develop speedup techniques to make our algorithm fast enough to be practical
- We show experimentally that our algorithm outperforms Richardson & Domingos, pure ILP, purely KB approaches, and purely probabilistic approaches

(For software, email: koks@cs.washington.edu)
