# Learning the Structure of Markov Logic Networks

Stanley Kok
## Overview

- Introduction
- CLAUDIEN, CRFs
- Algorithm
  - Evaluation Measure
  - Clause Construction
  - Search Strategies
- Speedup Techniques
- Experiments
## Introduction

- Richardson & Domingos (2004) learned MLN structure in two disjoint steps:
  - Learn first-order clauses with an off-the-shelf ILP system (CLAUDIEN)
  - Learn clause weights by optimizing pseudo-likelihood
- This work develops an algorithm that:
  - Learns first-order clauses by directly optimizing pseudo-likelihood
  - Is fast enough to be practical
  - Learns better structure than R&D, pure ILP, purely probabilistic, and purely knowledge-based approaches
## CLAUDIEN

- CLAUsal DIscovery ENgine
- Starts with the trivially false clause (true ⇒ false)
- Repeatedly refines the current clauses by adding literals
- Adds clauses that satisfy minimum accuracy and coverage to the KB

Refinement lattice (example over literals m, f, h):

- true ⇒ false
- m ⇒ false; f ⇒ false; h ⇒ false
- m ∧ f ⇒ false; m ⇒ h; m ⇒ f; m ∧ h ⇒ false; f ⇒ h; f ⇒ m; f ∧ h ⇒ false; h ⇒ f; h ⇒ m
- h ⇒ m ∨ f
## CLAUDIEN

- Language bias ≡ clause template
- Can refine a handcrafted KB. Example:
  - KB contains: Professor(P) ⇐ AdvisedBy(S,P)
  - Template: dlab_template('1-2:[Professor(P),Student(S)] <- AdvisedBy(S,P)')
  - Yields: Professor(P) ∨ Student(S) ⇐ AdvisedBy(S,P)
## Conditional Random Fields

- Markov networks used to compute P(y|x) (McCallum, 2003)
- Model (linear chain):

  $P(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z_\mathbf{x}} \exp\Big(\sum_t \sum_k \lambda_k f_k(y_{t-1}, y_t, \mathbf{x}, t)\Big)$

- Features $f_k$, e.g. "current word is capitalized and next word is Inc"

[Figure: linear-chain CRF with labels y1, y2, ..., yn over observations x1, x2, ..., xn; e.g. "IBM hired Alice ..." with IBM labeled Org, Alice labeled Person, other tokens Misc]
## CRF Feature Induction

- Set of atomic features (word=the, capitalized, etc.)
- Start from an empty CRF
- While the convergence criterion is not met:
  - Create a list of new features consisting of:
    - Atomic features
    - Binary conjunctions of atomic features
    - Conjunctions of atomic features with features already in the model
  - Evaluate the gain in P(y|x) of adding each feature to the model
  - Add the best K features to the model (100s-1000s of features); see the sketch below
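As a concrete illustration, here is a minimal Python sketch of this induction loop. The `gain` callable (which would retrain and evaluate the CRF) is a hypothetical stand-in, not McCallum's implementation, and `rounds` stands in for the real convergence test.

```python
from itertools import combinations

def induce_features(atomic, gain, K=500, rounds=10):
    """Minimal sketch of CRF feature induction.

    atomic: list of atomic features (e.g. "word=the", "capitalized")
    gain:   hypothetical callable gain(model, feature) -> float, the
            improvement in P(y|x) from adding `feature` to `model`
    """
    model = []
    for _ in range(rounds):  # stand-in for the real convergence test
        candidates = list(atomic)
        # binary conjunctions of atomic features
        candidates += [frozenset(pair) for pair in combinations(atomic, 2)]
        # conjunctions of atomic features with features already in the model
        candidates += [(a, f) for a in atomic for f in model]
        # add the best K features by gain in P(y|x)
        candidates.sort(key=lambda c: gain(model, c), reverse=True)
        model.extend(candidates[:K])
    return model
```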
## Algorithm

High-level algorithm:

```
Repeat
  Clauses <- FindBestClauses(MLN)
  Add Clauses to MLN
Until Clauses = ∅

FindBestClauses(MLN)
  Search for, and create, candidate clauses
  For each candidate clause c
    Compute gain in evaluation measure of adding c to MLN
  Return k clauses with highest gain
```
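In code, the outer loop might look like the following sketch. `propose_candidates`, `wpll_gain`, and `learn_weights` are hypothetical helpers standing in for the subroutines described in the rest of the talk, not the authors' actual implementation.

```python
def find_best_clauses(mln, db, k, propose_candidates, wpll_gain):
    """Score every candidate clause; return the k best with positive gain."""
    candidates = propose_candidates(mln)             # search for / create candidates
    scored = [(wpll_gain(mln, c, db), c) for c in candidates]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [c for g, c in scored[:k] if g > 0]

def learn_structure(mln, db, k, propose_candidates, wpll_gain, learn_weights):
    """Repeat: add the k best clauses, retrain weights; stop when none help."""
    while True:
        clauses = find_best_clauses(mln, db, k, propose_candidates, wpll_gain)
        if not clauses:                              # Clauses = empty set
            return mln
        mln.extend(clauses)
        learn_weights(mln, db)                       # re-optimize all clause weights
```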
## Evaluation Measure

- Ideally, use the log-likelihood, but it is too slow to compute
- Recall:
  - Value: $\log P_w(X{=}x) = \sum_i w_i\, n_i(x) - \log Z_w$
  - Gradient: $\frac{\partial}{\partial w_i} \log P_w(X{=}x) = n_i(x) - E_w[n_i(X)]$
- Both require inference over the whole model (the partition function $Z_w$ and expected counts), hence slow
## Evaluation Measure

- Use pseudo-log-likelihood (R&D, 2004):

  $\log P^{*}_w(X{=}x) = \sum_{l=1}^{n} \log P_w(X_l{=}x_l \mid MB_x(X_l))$

- But this gives undue weight to predicates with large numbers of groundings: a predicate with millions of groundings dominates the sum over one with only hundreds
## Evaluation Measure

- Use the weighted pseudo-log-likelihood (WPLL):

  $\log P^{\bullet}_w(X{=}x) = \sum_{r \in R} c_r \sum_{k=1}^{g_r} \log P_w(X_{r,k}{=}x_{r,k} \mid MB_x(X_{r,k}))$

  where $R$ is the set of first-order predicates, $g_r$ is the number of groundings of predicate $r$, and $c_r = 1/g_r$, so every predicate contributes equally regardless of its number of groundings
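A small sketch of how the $c_r = 1/g_r$ weighting plays out in code; `ground_cll` is a hypothetical callable returning a ground predicate's CLL given its Markov blanket.

```python
def wpll(groundings_by_pred, ground_cll):
    """Weighted pseudo-log-likelihood: each first-order predicate contributes
    the *mean* CLL of its groundings (c_r = 1/g_r), so a predicate with
    millions of groundings counts no more than one with a handful."""
    total = 0.0
    for pred, groundings in groundings_by_pred.items():
        g_r = len(groundings)
        total += sum(ground_cll(pred, g) for g in groundings) / g_r
    return total
```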
## Algorithm (recap)

Recall the high-level algorithm above; with the evaluation measure fixed, the next question is how candidate clauses are constructed.
## Clause Construction

- Add a literal (negative or positive)
  - Consider all possible ways the variables of the new literal can be shared with those of the clause
  - e.g. !Student(S) ∨ AdvBy(S,P)
- Remove a literal (when refining an existing MLN)
  - Removes spurious conditions from rules
  - e.g. !Student(S) ∨ !YrInPgm(S,5) ∨ TA(S,C) ∨ TmpAdvBy(S,P)
## Clause Construction

- Flip the signs of literals (when refining an existing MLN)
  - Moves literals that are on the wrong side of an implication
  - e.g. !CseQtr(C1,Q1) ∨ !CseQtr(C2,Q2) ∨ !SameCse(C1,C2) ∨ !SameQtr(Q1,Q2)
  - Done at the beginning of the algorithm; expensive, so optional
- Limit the number of distinct variables to restrict the search space; a toy sketch of these operators follows
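Below is a toy sketch of the three operators, over clauses represented as tuples of (sign, predicate, args) literals. The variable-sharing enumeration is deliberately simplified and is an assumption, not the paper's implementation.

```python
from itertools import product

def add_literal(clause, pred, arity):
    """Yield clauses extended by pred/arity, in all ways that share at
    least one variable with the existing clause, with either sign."""
    old_vars = {a for _, _, args in clause for a in args}
    slots = sorted(old_vars) + [f"V{len(old_vars)}"]   # old vars + one fresh var
    for args in product(slots, repeat=arity):
        if set(args) & old_vars:                       # must connect to the clause
            for sign in (True, False):                 # positive / negated literal
                yield clause + ((sign, pred, tuple(args)),)

def remove_literal(clause):
    """Yield clauses with one (possibly spurious) literal dropped."""
    for i in range(len(clause)):
        yield clause[:i] + clause[i + 1:]

def flip_sign(clause, i):
    """Flip literal i, i.e. move it across the implication."""
    sign, pred, args = clause[i]
    return clause[:i] + ((not sign, pred, args),) + clause[i + 1:]
```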
## Algorithm (recap)

With the evaluation measure and clause construction in place, the remaining piece of FindBestClauses is the search strategy.
## Search Strategies: Shortest-First Search (SFS)

Starting from a candidate set of length-2 clauses (e.g. !AdvBy(S,P) ∨ Stu(S)):

1. Find the gain of each clause
2. Sort the clauses by gain
3. Return the top 5 with positive gain
4. Add the 5 clauses to the MLN (wt1, !AdvBy(S,P); wt2, clause2; ...)
5. Retrain the weights of the MLN

Repeating steps 1-2, eventually... (Yikes! All length-2 clauses have gains ≤ 0.)
## Shortest-First Search

When no length-2 clause has positive gain:

a. Extend the 20 length-2 clauses with highest gains, e.g. !AdvBy(S,P) ∨ Stu(S) becomes !AdvBy(S,P) ∨ Stu(S) ∨ Prof(P)
b. Form a new candidate set
c. Keep the 1000 clauses with highest gains
## Shortest-First Search

- Repeat the process
- Extend all length-2 clauses before length-3 ones

But how do you refine a non-empty MLN?
## SFS: MLN Refinement

Given an MLN that already contains clauses (wt1, !AdvBy(S,P); wt2, clause2; ...; wtA, clauseA; wtB, clauseB; ...):

a. Extend the 20 length-2 candidate clauses with highest gains
b. Extend the length-2 clauses in the MLN
c. Remove a predicate from length-4 clauses in the MLN
d. Flip the signs of literals in length-3 clauses in the MLN (optional)
e. The clauses produced by b, c, and d replace the original clauses in the MLN

A sketch of the full SFS loop is given below.
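The sketch below is a hedged reading of the SFS loop under the parameters above (add the best 5, extend the top 20, keep a 1000-clause pool); `gain`, `extend`, and `learn_weights` are hypothetical stand-ins for the paper's subroutines, and the control flow is inferred, not the authors' code.

```python
def shortest_first_search(mln, db, candidates, gain, extend, learn_weights,
                          add=5, top=20, pool=1000, max_len=4):
    """Exhaust clauses of each length before moving on to longer ones."""
    length = 2
    while length <= max_len:
        scored = sorted(candidates, key=lambda c: gain(mln, c, db), reverse=True)
        best = [c for c in scored[:add] if gain(mln, c, db) > 0]
        if best:                                     # some clauses still help:
            mln.extend(best)                         # add them and retrain weights
            learn_weights(mln, db)
            candidates = [c for c in scored if c not in best]
        else:                                        # all gains <= 0 at this length:
            longer = [e for c in scored[:top] for e in extend(c)]
            candidates = sorted(longer, key=lambda c: gain(mln, c, db),
                                reverse=True)[:pool]  # keep the 1000 best
            length += 1
    return mln
```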
## Search Strategies: Beam Search

1. Keep a beam of the 5 clauses with highest gains
2. Track the best clause found so far
3. Stop when the best clause does not change for two consecutive iterations

The same question arises: how do you refine a non-empty MLN? A minimal sketch follows.
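A minimal beam-search sketch under the same assumptions (hypothetical `gain` and `extend` helpers):

```python
def beam_search(mln, db, initial, gain, extend, beam_size=5):
    """Track the best clause; stop when it is unchanged for two
    consecutive iterations."""
    beam, best, unchanged = list(initial), None, 0
    while beam and unchanged < 2:
        beam.sort(key=lambda c: gain(mln, c, db), reverse=True)
        beam = beam[:beam_size]                      # keep the 5 highest-gain clauses
        if beam[0] == best:
            unchanged += 1                           # best clause did not change
        else:
            best, unchanged = beam[0], 0
        beam = beam + [e for c in beam for e in extend(c)]  # refine the beam
    return best
```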
## Algorithm (recap)

That completes FindBestClauses: search for and create candidate clauses, compute each clause's gain in the evaluation measure, and return the k clauses with the highest gain.
## Differences from CRF Feature Induction

Recall CRF feature induction: from a set of atomic features (word=the, capitalized, etc.), start with an empty CRF and, until convergence, score candidate features (atomic features, binary conjunctions of them, and conjunctions with features already in the model) by their gain in P(y|x), adding the best K (100s-1000s of features). Our algorithm differs:

- We can refine a non-empty MLN
- We use pseudo-likelihood, with different optimizations
- Applicable to arbitrary Markov networks (not only linear chains)
- We maintain a separate candidate set
- We add only the best ≈10s of clauses to the model
- Flexible enough to fit into different search algorithms
## Speedup Techniques

Recall:

```
FindBestClauses(MLN)
  Search for, and create, candidate clauses
  For each candidate clause c
    Compute gain in WPLL of adding c to MLN
  Return k clauses with highest gain
```

- LearnWeights(MLN + c) optimizes the WPLL with L-BFGS
- L-BFGS computes the value and gradient of the WPLL
- There are many candidate clauses, so the WPLL and its gradient must be computed efficiently; a SciPy-based sketch of the weight-learning step follows
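For instance, weight learning could be wrapped around SciPy's L-BFGS implementation as below. This is a sketch, not the authors' code; `neg_wpll` and `neg_wpll_grad` are hypothetical functions returning −WPLL and its gradient for a weight vector.

```python
import numpy as np
from scipy.optimize import minimize

def learn_weights(w0, neg_wpll, neg_wpll_grad, max_iter=100, tol=1e-6):
    """Minimize -WPLL with L-BFGS. When merely scoring a candidate clause,
    pass a smaller max_iter and a looser tol for faster termination."""
    result = minimize(neg_wpll, np.asarray(w0, dtype=float),
                      jac=neg_wpll_grad, method="L-BFGS-B",
                      options={"maxiter": max_iter, "ftol": tol})
    return result.x  # learned clause weights
```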
## Speedup Techniques

- The WPLL is a sum of per-ground-predicate CLLs (conditional log-likelihoods given the Markov blanket)
- When computing a ground predicate's CLL, ignore clauses in which the predicate does not appear (e.g. if predicate l does not appear in clause 1, clause 1 cannot affect l's CLL)
## Speedup Techniques

- A ground predicate's CLL is affected only by the clauses that contain it
- Between candidates, most clause weights do not change significantly, so most CLLs do not change much
- So don't recompute all CLLs:
  - Store the WPLL and the CLLs
  - Recompute a CLL only if a weight affecting it changes beyond some threshold
  - Subtract the old CLLs from the WPLL and add the new ones (sketched below)
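A sketch of the bookkeeping, assuming a map from each ground predicate to the clauses whose weights affect it; all names and the threshold value are illustrative, and the per-predicate 1/g_r weighting is omitted for brevity.

```python
def update_wpll(wpll, cll_cache, affected_clauses, compute_cll,
                old_w, new_w, threshold=1e-4):
    """Recompute a ground predicate's CLL only when a weight affecting it
    moved by more than `threshold`; patch the stored WPLL incrementally."""
    for g, clause_ids in affected_clauses.items():
        if any(abs(new_w[i] - old_w[i]) > threshold for i in clause_ids):
            new_cll = compute_cll(g, new_w)
            wpll += new_cll - cll_cache[g]     # subtract old CLL, add new CLL
            cll_cache[g] = new_cll
    return wpll
```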
## Speedup Techniques

- The WPLL is a sum over all ground predicates, so estimate it by subsampling:
  - Uniformly sample groundings of each first-order predicate
  - Sample x% of the groundings, subject to a minimum and a maximum
  - Extrapolate from the sample average (see the sketch below)
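Because the $c_r = 1/g_r$ weighting makes each predicate's term an average, extrapolation reduces to a sample mean. A sketch, with illustrative (not the paper's) values for the sampling fraction and bounds:

```python
import random

def sampled_pred_term(groundings, ground_cll, frac=0.05,
                      min_n=100, max_n=10000):
    """Estimate one predicate's WPLL term from a uniform subsample of
    its groundings (frac of them, clamped to [min_n, max_n])."""
    n = int(frac * len(groundings))
    n = min(max(n, min_n), max_n)          # subject to min and max
    n = min(n, len(groundings))            # can't sample more than exist
    sample = random.sample(groundings, n)
    return sum(ground_cll(g) for g in sample) / n   # mean = c_r-weighted term
```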
## Speedup Techniques

- The WPLL and its gradient require the number of true groundings of a clause: a #P-complete problem
- Use Karp & Luby's (1983) Monte Carlo algorithm
  - Gives an estimate within ε of the true value with probability 1 − δ
  - Draws random samples of the clause's groundings
- We found that the estimate converges faster than the algorithm's bound specifies
  - So apply a convergence test (DeGroot & Schervish, 2002) after every 100 samples
  - Gives earlier termination (simplified sketch below)
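A simplified sketch of the idea; a plain confidence-interval check stands in for the DeGroot & Schervish test, and `sample_grounding` / `is_true` are hypothetical helpers.

```python
def estimate_true_groundings(sample_grounding, is_true, total_groundings,
                             eps=0.05, batch=100, max_samples=100_000):
    """Monte Carlo estimate of a clause's number of true groundings,
    with a convergence test after every 100 samples for early exit."""
    hits = n = 0
    p = 0.0
    while n < max_samples:
        for _ in range(batch):                     # draw 100 more samples
            hits += is_true(sample_grounding())
        n += batch
        p = hits / n
        half_width = 1.96 * (p * (1 - p) / n) ** 0.5   # ~95% CI half-width
        if half_width < eps:                       # converged: stop early
            break
    return p * total_groundings                    # scale up to all groundings
```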
## Speedup Techniques

- L-BFGS is used to learn the clause weights that optimize the WPLL
- It has two parameters:
  - Maximum number of iterations
  - Convergence threshold
- When evaluating a candidate clause's gain, use a smaller maximum number of iterations and a looser convergence threshold, for faster termination
## Speedup Techniques

- Impose a lexicographic ordering on clauses
  - Avoids redundant computation for clauses that are syntactically identical
  - Does not detect semantically identical but syntactically different clauses (an NP-complete problem)
- Cache new clauses to avoid recomputation (sketch below)
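One way to realize this, as a sketch: sort the literals lexicographically, rename variables in order of first appearance, and use the result as a cache key. This catches syntactic duplicates only; the representation and helper names are assumptions.

```python
def canonical_key(clause):
    """clause: iterable of (sign, predicate, args) literals."""
    lits = sorted(clause, key=lambda l: (l[1], l[0], l[2]))  # lexicographic order
    renaming, renamed = {}, []
    for sign, pred, args in lits:
        new_args = tuple(renaming.setdefault(a, f"V{len(renaming)}")
                         for a in args)            # rename vars by first appearance
        renamed.append((sign, pred, new_args))
    return tuple(sorted(renamed))                  # re-sort after renaming

gain_cache = {}                                    # canonical key -> cached gain

def cached_gain(clause, compute_gain):
    key = canonical_key(clause)
    if key not in gain_cache:                      # avoid recomputation
        gain_cache[key] = compute_gain(clause)
    return gain_cache[key]
```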
## Speedup Techniques

Also used R&D (2004)'s techniques for the WPLL gradient:

- Ignore predicates that do not appear in the ith formula
- Ignore ground formulas whose truth value is unaffected by changing the truth value of any literal
- Compute the number of true groundings of a clause once and cache it
## Experiments: UW-CSE Domain

- 22 predicates, e.g. AdvisedBy, Professor, etc.
- 10 types, e.g. Person, Course, Quarter, etc.
- About 4 million ground predicates in total
- 3212 true ground predicates in the database
- Handcrafted KB with 94 formulas, e.g.:
  - Each student has at most one advisor
  - If a student is an author of a paper, so is her advisor
## Experiments: Cora Domain

- 1295 citations to 112 CS research papers
- Author, Venue, Title, Year fields
- 5 predicates, viz. SameCitation, SameAuthor, SameVenue, SameTitle, SameYear
- Evidence predicates, e.g. WordsInCommonInTitle20%(title1, title2)
- About 5 million ground predicates in total
- 378,589 true ground predicates in the database
- Handcrafted KB with 26 clauses, e.g.:
  - If two citations are the same, then they have the same authors, titles, etc., and vice versa
  - If two titles have many words in common, then they are the same, etc.
## Systems

- MLN(KB): weight learning applied to the handcrafted KB
- MLN(CL): structure learning with CLAUDIEN, then weight learning
- MLN(KB+CL): structure learning with CLAUDIEN using the handcrafted KB as its language bias, then weight learning
- MLN(SLB): structure learning with beam search, starting from an empty MLN
- MLN(KB+SLB): ditto, starting from the handcrafted KB
- MLN(SLB+KB): structure learning with beam search, starting from an empty MLN, allowing handcrafted clauses to be added in a first search step
- MLN(SLS): structure learning with SFS, starting from an empty MLN
## Systems

- CL: CLAUDIEN alone
- KB: handcrafted KB alone
- KB+CL: CLAUDIEN with the KB as its language bias
- NB: naïve Bayes
- BN: Bayesian networks
## Methodology

- UW-CSE domain:
  - Database divided into 5 areas: AI, graphics, languages, systems, theory
  - Leave-one-out testing by area
- Cora domain:
  - 5 different train-test splits
- Measured:
  - Average CLL of the predicates
  - Average area under the precision-recall curve of the predicates (AUC)
## Results

MLN(SLS) and MLN(SLB) are better than MLN(CL), MLN(KB), CL, KB, NB, and BN.

[Bar charts: AUC and CLL (negated) per system on UW-CSE]
## Results

MLN(SLS) and MLN(SLB) are better than MLN(CL), MLN(KB), CL, KB, NB, and BN.

[Bar charts: AUC and CLL (negated) per system on Cora]
## Results

MLN(SLB+KB) is better than MLN(KB+CL) and KB+CL.

[Bar charts: AUC and CLL (negated) per system on UW-CSE]
## Results

MLN(SLB+KB) is better than MLN(KB+CL) and KB+CL.

[Bar charts: AUC and CLL (negated) per system on Cora]
## Results

MLN(&lt;system&gt;) does better than the corresponding &lt;system&gt;.

[Bar charts: AUC and CLL (negated) per system on UW-CSE]
## Results

MLN(&lt;system&gt;) does better than the corresponding &lt;system&gt;.

[Bar charts: AUC and CLL (negated) per system on Cora]
## Results

- MLN(SLS) on UW-CSE, on a cluster of 15 dual-CPU 2.8 GHz Pentium 4 machines:
  - With speedups: 5.3 hrs
  - Without speedups: did not finish running in 24 hrs
- MLN(SLB) on UW-CSE, on a single 2.8 GHz Pentium 4 machine:
  - With speedups: 8.8 hrs
  - Without speedups: 13.7 hrs
## Future Work

- Speeding up the counting of the number of true groundings of a clause
- Probabilistically bounding the loss in accuracy due to subsampling
- Probabilistic predicate discovery
## Conclusion

We developed an algorithm that:

- Learns first-order clauses by directly optimizing pseudo-likelihood
- Is fast enough to be practical
- Learns better structure than R&D, pure ILP, purely probabilistic, and purely KB approaches