an introduction to statistical relational learningluc.deraedt/salvador.pdf · an introduction to...
TRANSCRIPT
![Page 1: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/1.jpg)
An Introduction toStatistical Relational Learning
Luc De RaedtKatholieke Universiteit Leuven
joint work with Kristian Kersting
![Page 2: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/2.jpg)
Probabilistic Logic Learning*One of the key open questions of artificial intelligence
concerns
"probabilistic logic learning", i.e. the integration of probabilistic reasoning
with
machine learning.
logical or relational representations and
*In the US, sometimes called Statistical Relational Learning
![Page 3: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/3.jpg)
Probabilistic Logic Learning*
The knowledge representation framework par excellence
![Page 4: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/4.jpg)
Logic: Syntaxclass(Book,Topic) :-
author(Book,Author), favorite-topic(Author,Topic).
the class of Book is Topic IF
the Book is written by Author AND
Topic is the favorite of the Author
favorite-topic(rowlings,fantasy)
the favorite topic of rowlings is fantasy
Variableconstant
predicate/relation
Fact(s)
Clause(s)
![Page 5: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/5.jpg)
Logic: three viewsentailment/logical consequence
H |= class(harrypotter,fantasy)
class(harrypotter,fantasy) follows from H
interpretation {class(hp,fan), author(row,hp), fav(row,fan)}
is a model for H
is a possible world given H
proofs class(hp,fan)
author(row,hp) fav(row,fan)
![Page 6: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/6.jpg)
Probabilistic Logic Learning*
Reasoning about uncertainty
Graphical models, such as Bayesian and Markov
Networks
Probabilistic grammars and databases.
![Page 7: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/7.jpg)
Bayesian Networks
0.9 0.1e
b
e0.2 0.8
0.01 0.990.9 0.1
bebb
e
BE P(A | B,E)Earthquake
JohnCalls
Alarm
MaryCalls
Burglary
P(E,B,A,J,M) = P(E).P(B).P(A|E). P(A|B).P(J|A).P(M|A)
INTERPRETATIONSTATE/DESCRIPTION
{A, ¬E,¬B, J, M}
Structure &Parameters
![Page 8: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/8.jpg)
Probabilistic Logic Learning*
Learning the model from data
Structure &Parameters
Logical &Statistical
![Page 9: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/9.jpg)
Some PLL formalisms
LPAD: Bruynooghe
Vennekens,VerbaetenMarkov Logic: Domingos,
RichardsonCLP(BN): Cussens,Page,
Qazi,Santos Costa
Present
PRMs: Friedman,Getoor,Koller,Pfeffer,Segal,Taskar
´03
SLPs: Cussens,Muggleton
´90 ´95 96
First KBMC approaches:
Breese,
Bacchus,Charniak,
Glesner,Goldman,
Koller,Poole, Wellmann
´00
BLPs: Kersting, De Raedt
RMMs: Anderson,Domingos,Weld
LOHMMs: De Raedt, Kersting,Raiko
[names in alphabetical order]
Future
Prob. CLP: Eisele, Riezler
´02
PRISM: Kameya, Sato
´94
PLP: Haddawy, Ngo
´97´93
Prob. Horn Abduction: Poole
´99
1BC(2): Flach,Lachiche
Logical Bayesian Networks: Blockeel,Bruynooghe,
Fierens,Ramon,
![Page 10: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/10.jpg)
What I am not going to do
• Introduce yet another “lingua franca” or probabilistic logic
• analogy with programming languages & knowledge representation
• Talk a lot about logic
![Page 11: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/11.jpg)
Rather
• I will start from logic + learning and extend it with probabilistic reasoning
• I will argue that traditional logic + learning settings apply to probabilistic logic learning as well
• I will argue that this is probably useful for applications and that learning is essential
![Page 12: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/12.jpg)
Books
Also, survey paper De Raedt, Kersting, SIGKDD Explorations 03
![Page 13: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/13.jpg)
Overview
A. Motivation for Probabilistic Logic Learning
B. Logic + Learning
C. Adding probability
![Page 14: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/14.jpg)
A. Motivation
![Page 15: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/15.jpg)
A. MotivationA.1 Some cases
![Page 16: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/16.jpg)
Case 1: Structure Activity Relationship Prediction
O CH=N-NH-C-NH 2O=N
O - O
nitrofurazone
N O
O
+
-
4-nitropenta[cd]pyrene
N
O O-
6-nitro-7,8,9,10-tetrahydrobenzo[a]pyrene
NH
NO O-
+
4-nitroindole
Y=Z
Active
Inactive
Structural alert:
[Srinivasan et al. AIJ 96]
Data = Set of Small Graphs
General PurposeLogic Learning System
Uses and ProducesKnowledge
![Page 17: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/17.jpg)
Using and Producing Knowledge
Logical and relational learning systems can use and produce knowledge (ILP)
Result of learning task is understandable and interpretable
Logical and relational learning algorithms can use background knowledge, e.g. ring structures
![Page 18: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/18.jpg)
gene
protein
pathway
cellularcomponent
homologgroup
phenotype
biologicalprocess
locus
molecularfunction has
is homologous to
participates in
participates inis located in
is related to
refers tobelongs to
is found in
codes for
subsumes,interacts with
is found in
participates in
refers to
Case 2: Biological NetworksBiomine Database @ Helsinki
Data = Large (Probabilistic) Network
![Page 19: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/19.jpg)
Network around Alzheimer Disease
![Page 20: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/20.jpg)
presenilin 2Gene
EntrezGene:81751
Notch receptor processingBiologicalProcessGO:GO:0007220
![Page 21: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/21.jpg)
-participates_in0.220
BiologicalProcess
Gene
![Page 22: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/22.jpg)
Questions to askHow to support the life scientist in using and discovering new knowledge in the network ?
• Is gene X involved in disease Y ?
• Should there be a link between gene X and disease Y ? If so, what type of link ?
• What is the probability that gene X is connected to disease Y ?
• Which genes are similar to X w.r.t. disease Y?
• Which part of the network provides the most information (network extraction) ?
• ...
See Invited Talk on Tuesday
![Page 23: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/23.jpg)
Case 3: Evolving Networks
• Travian: A massively multiplayer real-time strategy game
• Commercial game run by TravianGames GmbH
• ~3.000.000 players spread over different “worlds”
• ~25.000 players in one world[Thon et al. ECML 08]
![Page 24: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/24.jpg)
World Dynamics
border
border
border
border
Alliance 2
Alliance 3
Alliance 4
Alliance 6
P 2
1081
8951090
1090
1093
1084
1090
915
1081
1040
770
1077
955
1073
8041054
830
9421087
786
621
P 3
744
748559
P 5
861
P 6
950
644
985
932
837871
777
P 7
946
878
864 913
P 9
Fragment of world with
~10 alliances~200 players~600 cities
alliances color-coded
Can we build a modelof this world ?
Can we use it for playingbetter ?
[Thon, Landwehr, De Raedt, ECML08]
![Page 25: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/25.jpg)
World Dynamics
border
border
border
border
Alliance 2
Alliance 4
Alliance 6
P 2
9041090
917
770
959
1073
820
762
9461087
794
632
P 3
761
961
1061
607
988
771
924
583
P 5
951
935
948
938
867
P 6
950
644
985
888
844875
783
P 7
946
878
864 913
Fragment of world with
~10 alliances~200 players~600 cities
alliances color-coded
Can we build a modelof this world ?
Can we use it for playingbetter ?
[Thon, Landwehr, De Raedt, ECML08]
![Page 26: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/26.jpg)
World Dynamics
border
border
border
border
Alliance 2
Alliance 4
Alliance 6
P 2
9181090
931
779
977
835
781
9581087
808
701
P 3
838
947
1026
1081
833
1002987
827
994
663
P 5
1032
1026
1024
1049
905
926
P 6
986
712
985
920
877
807
P 7
895
959
P 10
824
Fragment of world with
~10 alliances~200 players~600 cities
alliances color-coded
Can we build a modelof this world ?
Can we use it for playingbetter ?
[Thon, Landwehr, De Raedt, ECML08]
![Page 27: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/27.jpg)
World Dynamics
border
border
border
border
Alliance 2
Alliance 4
Alliance 6
P 2
9231090
941
784
983
844
786
9661087
815
711
P 3
864
986
842
1032
1083
868
712
10021000
858
996
696
P 5
1039
1037
1030
1053
826
933
P 6
985
807
P 7
894
963
P 10
829
781828
Fragment of world with
~10 alliances~200 players~600 cities
alliances color-coded
Can we build a modelof this world ?
Can we use it for playingbetter ?
[Thon, Landwehr, De Raedt, ECML08]
![Page 28: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/28.jpg)
World Dynamics
border
border
border
border
Alliance 2Alliance 4
Alliance 6
P 2
9381090
949
785
987
849
789
9761087
821
724
P 3
888
863
868
1040
1083
896
667
1005994
883
1002
742
P 5
1046
1046
1040
985
894
1058
879
938
921
807
P 6P 7
P 10
830
782829
Fragment of world with
~10 alliances~200 players~600 cities
alliances color-coded
Can we build a modelof this world ?
Can we use it for playingbetter ?
[Thon, Landwehr, De Raedt, ECML08]
![Page 29: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/29.jpg)
World Dynamics
border
border
border
border
Alliance 2
Alliance 4
P 2
948
951
786
990
856
795
980
828
730
P 3898
803
860
964
1037
1085
925
689
10051007
899
1005
760
P 5
1051
1051
1040
860
774
1061
886
944
844
945
713
P 10
839
796838
Fragment of world with
~10 alliances~200 players~600 cities
alliances color-coded
Can we build a modelof this world ?
Can we use it for playingbetter ?
[Thon, Landwehr, De Raedt, ECML08]
![Page 30: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/30.jpg)
Emerging Data Sets
In many application areas :
• vision, surveillance, activity recognition, robotics, ...
• data in relational format are becoming available
• use of knowledge and reasoning is essential
• in Travian -- ako STRIPS representation
![Page 31: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/31.jpg)
GerHome Example
Action and Activity Learning
(courtesy of Francois Bremond, INRIA-Sophia-Antipolis)
http://www-sop.inria.fr/orion/personnel/Francois.Bremond/topicsText/gerhomeProject.html
![Page 32: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/32.jpg)
“Relational” Robotics
Robotics
• several lower level problems have been (more or less) solved
• time to bridge the gap between lower and higher level representations ?
[Kersting et al. Adv. Robotics 07]
![Page 33: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/33.jpg)
Motivation for PLL
• Representational limitations of (standard)
• probabilistic representations & logic
• We: computational logic
• An example
• graphs & networks
• Essential : objects & relationships !! In many applications.
![Page 34: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/34.jpg)
A. MotivationA.2 Representational Issues
![Page 35: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/35.jpg)
Represent the dataHierarchy
att
att
att
att att example
exampleexampleexampleexampleexampleexampleexampleexample
single-table single-tuple
attribute-value
att
att
att
att
att exampl
eexampl
e
example
single-table multiple-tuplemulti-instance
att
att
att
att
att
att
att
example
example
example
example
example
example
example
exampexa
att
att
att
att
att
att
att
examp
examp
exam
exam
exa
exa
exexe
multi-table multiple-tuple
relational
att
att
att
att
att
att
att
examp
examp
exam
exam
exa
exa
exexe
att
att
att
att
att
att
att
examp
examp
exam
exam
exa
exa
exexe
2 relationsedge / vertex
graphs & networks
![Page 36: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/36.jpg)
Attribute-Valueatt
att
att
att att example
exampleexampleexampleexampleexampleexampleexampleexample
single-table single-tuple
attribute-value
Traditional Setting in Machine Learning(cf. standard tools like Weka)
![Page 37: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/37.jpg)
Multi-Instanceatt
att
att
att
att exampl
eexampl
e
example
single-table multiple-tuplemulti-instance
[Dietterich et al. AIJ 96]
An example is positive if there exists a tuple in the example that satisfies particular properties
Boundary case between relational and propositional learning.
A lot of interest in past 10 years
Applications: vision, chemo-informatics, ...
![Page 38: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/38.jpg)
Encoding Graphs
![Page 39: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/39.jpg)
Encoding Graphs
atom(1,cl).atom(2,c).atom(3,c).atom(4,c).atom(5,c).atom(6,c).atom(7,c).atom(8,o)....
bond(3,4,s).bond(1,2,s).bond(2,3,d)....
12
34
76
5
9
8
14 10
1312
11
1615
17
![Page 40: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/40.jpg)
Encoding Graphs
12
34
76
5
9
8
14 10
1312
11
1615
17
att
att
att
att
att
att
att
examp
examp
exam
exam
exa
exa
exexe
att
att
att
att
att
att
att
examp
examp
exam
exam
exa
exa
exexe
2 relationsedge / vertex
graphs & networks
![Page 41: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/41.jpg)
Encoding Graphsatom(1,cl,21,0.297)
atom(2,c,21, 0187)
atom(3,c,21,-0.143)
atom(4,c,21,-0.143)
atom(5,c,21,-0.143)
atom(6,c,21,-0.143)
atom(7,c,21,-0.143)
atom(8,o,52,0.98)
...
bond(3,4,s).
bond(1,2,s).
bond(2,3,d).
...
12
34
76
5
9
8
14 10
1312
11
1615
17
Note: add identifier for molecule
![Page 42: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/42.jpg)
Encoding KnowledgeUse background knowledge in form of rules
• encode hierarchies
halogen(A):- atom(X,f)
halogen(A):- atom(X,cl)
halogen(A):- atom(X,br)
halogen(A):- atom(X,i)
halogen(A):- atom(X,as)
•encode functional group
benzene-ring :- ...
intentional versus extentional encodings
![Page 43: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/43.jpg)
Relational Representation
12
34
76
5
9
8
14 10
1312
11
1615
17
att
att
att
att
att
att
att
example
example
example
example
example
example
example
exampexa
att
att
att
att
att
att
att
examp
examp
exam
exam
exa
exa
exexe
multi-table multiple-tuple
relational
![Page 44: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/44.jpg)
Relational versus Graphs
Advantages Relational
• background knowledge in the form of rules, ontologies, features, ...
• relations of arity > 2 (but hypergraphs)
• graphs capture structure but annotations with many features/labels is non-trivial
Advantages Graphs
• efficiency and scalability
• full relational is more complex
• matrix operations
![Page 45: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/45.jpg)
The Hierarchy att
att
att
att att example
exampleexampleexampleexampleexampleexampleexampleexample
single-table single-tuple
attribute-value
att
att
att
att
att exampl
eexampl
e
example
single-table multiple-tuplemulti-instance
att
att
att
att
att
att
att
example
example
example
example
example
example
example
exampexa
att
att
att
att
att
att
att
examp
examp
exam
exam
exa
exa
exexe
multi-table multiple-tuple
relational
att
att
att
att
att
att
att
examp
examp
exam
exam
exa
exa
exexe
att
att
att
att
att
att
att
examp
examp
exam
exam
exa
exa
exexe
2 relationsedge / vertex
graphs & networks
![Page 46: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/46.jpg)
Two questions
UPGRADING : Can we develop systems that work with richer representations (starting from systems for simpler representations)?
PROPOSITIONALISATION: Can we change the representation from richer representations to simpler ones ? (So we can use systems working with simpler representations)
![Page 47: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/47.jpg)
B. Logic + Learning
![Page 48: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/48.jpg)
B. Logic + LearningB.1. Problem Settings
![Page 49: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/49.jpg)
Traditional predictive ILP Problem
• Given
•a set of positive and negative examples (Pos, Neg)
•a background theory B
•a logic language Le to represent examples
•a logic language Lh to represent hypotheses
•a covers relation on Le x Lh
• Find
•A hypothesis h over Lh that covers all positive Pos and no negative Neg examples taking B into account
Concept-learning in a
logical/relational representation
![Page 50: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/50.jpg)
Three possible settings• Learning from entailment
• covers(H,e) iff H |= e
• Learning from interpretations
• covers(H,e) iff e is a model for H
• Learning from proofs or traces.
• covers(H,e) iff e is proof given H
![Page 51: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/51.jpg)
Three possible settings
• Learning from entailment
• covers(H,e) iff H |= e
• examples are facts / clauses
![Page 52: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/52.jpg)
The Mutagenicity dataset
Example : muta(225)
hypothesis
Background theory
Applications System
s:
e.g. FOIL, PROGOL, TILDE, ALEPH, G
OLEM, ...
B !H |= e
![Page 53: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/53.jpg)
Three possible settings
• Learning from interpretations
• covers(H,e) iff e is a model for H
• examples are Herbrand interpretations / state descriptions / possible worlds
![Page 54: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/54.jpg)
Learning from interpretations
• Examples
• Positive: { human(luc), human(lieve), male(luc), female(lieve)}
• Negative: { bat(dracula), male(dracula), vampire(dracula)}
Hypothesis
•human(X) :- male(X)
![Page 55: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/55.jpg)
Learning from interpretations
Examples
• Positive: { human(luc), human(lieve), male(luc), female(lieve)}
Hypothesis (positives only)
(maximally specific that covers example)
• human(X) :- female(X)
• human(X) :- male(X)
• false :- male(X), female(X)
ApplicationsFinding integrity constraints /
frequent patterns in relational databases
System
s:
e.g. LO
GAN-H, C
laudie
n, ICL, W
armr, ..
.
![Page 56: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/56.jpg)
Three possible settings
• Learning from proofs or traces.
• covers(H,e) iff e is proof given H
![Page 57: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/57.jpg)
An example
Applications– Tree bank grammar learning
– Program synthesis• Shapiro’s MIS poses queries in order
to reconstruct trace or proof
System
s:
e.g. Tr
eeba
nk gr
ammars
, Tracy
, MIS,
...
![Page 58: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/58.jpg)
Use of different Settings
Learning from interpretations– Typically used for description
– E.g., inducing integrity constraints
Learning from entailment– The most popular setting
– Typically used for prediction– E.g., predicting activity of compounds
Learning from traces/proofs– Typically used for hard problems, when
other settings seem to fail or fail to scale up– E.g., program synthesis from examples,
grammar induction, multiple predicate learning
-
+
Info
rmat
ion Different settings
provide different levels of information about
target program (cf. De Raedt, AIJ 97)
![Page 59: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/59.jpg)
B. Logic + LearningB.2. Systems & Methodology
![Page 60: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/60.jpg)
Representational Hierarchy -- Systems
att
att
att
att att example
exampleexampleexampleexampleexampleexampleexampleexample
single-table single-tuple
attribute-value
att
att
att
att
att exampl
eexampl
e
example
single-table multiple-tuplemulti-instance
att
att
att
att
att
att
att
example
example
example
example
example
example
example
exampexa
att
att
att
att
att
att
att
examp
examp
exam
exam
exa
exa
exexe
multi-table multiple-tuple
relational
att
att
att
att
att
att
att
examp
examp
exam
exam
exa
exa
exexe
att
att
att
att
att
att
att
examp
examp
exam
exam
exa
exa
exexe
2 relationsedge / vertex
graphs & networks
UPGRADING
![Page 61: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/61.jpg)
Two messages
Logical and Relational Learning applies essentially to any machine learning and data mining task, not just concept-learning
• distance based learning, clustering, descriptive learning, reinforcement learning, bayesian approaches
there is a recipe that is being used to derive new LRL algorithms on the basis of propositional ones
• not the only way to LRL
![Page 62: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/62.jpg)
Learning Tasks
• rule-learning & decision trees [Quinlan 90], [Blockeel 96]
• frequent and local pattern mining [Dehaspe 98]
• distance-based learning (clustering & instance-based learning) [Horvath, 01], [Ramon 00]
• probabilistic modeling (cf. statistical relational learning)
• reinforcement learning [Dzeroski et al. 01]
• kernel and support vector methods
Logical and relational representations can (and have been) used for all learning tasks and techniques
![Page 63: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/63.jpg)
The RECIPE
Start from well-known propositional learning system
Modify representation and operators
• e.g. generalization/specialization operator, similarity measure, …
• often use theta-subsumption as framework for generality
Build new system, retain as much as possible from propositional one
![Page 64: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/64.jpg)
LRL Systems and techniques
FOIL ~ CN2 – Rule Learning (Quinlan MLJ 90)
Tilde ~ C4.5 – Decision Tree Learning (Blockeel & DR AIJ 98)
Warmr ~ Apriori – Association rule learning (Dehaspe 98)
Progol ~~ AQ – Rule learning (Muggleton NGC 95)
Graph miners ...
![Page 65: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/65.jpg)
A case : FOILLearning from entailment -- the setting
Given
mutagenic(225), ...
Find
background
examples
rules
B ∪ H ⊧ e
B ∪ H ⊧ e
![Page 66: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/66.jpg)
Searching for a rule
:- true
Coverage = 0.65
Coverage = 0.7
Coverage = 0.6
:- atom(X,A,c)
:- atom(X,A,n)
:- atom(X,A,f)
Coverage = 0.6
Coverage = 0.75
:- atom(X,A,n),bond(A,B)
:- atom(X,A,n),charge(A,0.82)
Greedy separate-and-conquer for rule setGreedy general-to-specific search for single rule
![Page 67: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/67.jpg)
Searching for a rule
:- true
Coverage = 0.65
Coverage = 0.7
Coverage = 0.6
:- atom(X,A,c)
:- atom(X,A,n)
:- atom(X,A,f)
Coverage = 0.6
Coverage = 0.75
:- atom(X,A,n),bond(A,B)
:- atom(X,A,n),charge(A,0.82)
mutagenic(X) :- atom(X,A,n),charge(A,0.82)
Greedy separate-and-conquer for rule setGreedy general-to-specific search for single rule
![Page 68: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/68.jpg)
FOIL
mutagenic(X) :- atom(X,A,n),charge(A,0.82)
:- true
Coverage = 0.7
Coverage = 0.6
Coverage = 0.6
:- atom(X,A,c)
:- atom(X,A,n)
:- atom(X,A,f)
Coverage = 0.8
Coverage = 0.6
:- atom(X,A,c),bond(A,B)
:- atom(X,A,c),charge(A,0.82)
![Page 69: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/69.jpg)
FOIL
mutagenic(X) :- atom(X,A,n),charge(A,0.82)
:- true
Coverage = 0.7
Coverage = 0.6
Coverage = 0.6
:- atom(X,A,c)
:- atom(X,A,n)
:- atom(X,A,f)
Coverage = 0.8
Coverage = 0.6
:- atom(X,A,c),bond(A,B)
:- atom(X,A,c),charge(A,0.82)
mutagenic(X) :- atom(X,A,c),bond(A,B)
![Page 70: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/70.jpg)
mutagenic(X) :- atom(X,A,n),charge(A,0.82)
mutagenic(X) :- atom(X,A,c),bond(A,B)
mutagenic(X) :- atom(X,A,c),charge(A,0.45)
![Page 71: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/71.jpg)
TildeLogical Decision Trees (Blockeel & De Raedt AIJ 98)
![Page 72: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/72.jpg)
A logical decision tree
![Page 73: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/73.jpg)
Search in unknown environment
Popular Algorithms
• Uninformed LRTA* [Koenig]
• Depth-first
Kersting et al. Advanced Rob. 03
![Page 74: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/74.jpg)
74
Buildings
![Page 75: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/75.jpg)
75
Abstract states:
Abstract actions:
Abstract reward model:
Relational MDPs
![Page 76: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/76.jpg)
Policy
logical decision trees [Blockeel & De Raedt AIJ 98]
![Page 77: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/77.jpg)
77
Using a Relational Polichy
1. Start in State
2. Evaluate logical decision tree
3. Execute Action
4. Observe New State
5. Repeat if goal not yet reached.
![Page 78: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/78.jpg)
WARMR
![Page 79: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/79.jpg)
What to count ? Keys.
![Page 80: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/80.jpg)
Coverage
company(jvt,commercial).company(scuf,university).company(ucro,university).
course(cso,2,introductory).course(erm,3,introductory).course(so2,4,introductory).
course(srw,3,advanced).
H: “there is a subscription for a course of length less than 3”
participant(blake,president,jvt,yes,5).subscription(blake,cso).subscription(blake,erm).
?- subscription(K,C), course(K,C,L,_), (L < 3).
participant(turner,researcher,scuf,no,23).subscription(turner,erm).subscription(turner,so2).subscription(turner,srw).
K=blake C = cso L = 2
Yes
K=turnerNo
EBLAKE
KING
TURNERSCOTT
MILLER ADAMS
![Page 81: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/81.jpg)
Descriptive Data Mining
Selected,Preprocessed,
and Transformed Data
multi-relational database
association rules over multiple relations
“IF participant follows an advanced course THEN she skips the welcome party”
support: 20 %confidence: 75 %
“IF ?- participant(P,C,PA,X), course(P,Y,advanced)
THEN ?-PA = no.
support: 20 %confidence: 75 %
![Page 82: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/82.jpg)
WarmrFirst order association rule :
• IF Query1 THEN Query2
• Shorthand for
• IF Query1 THEN Query1 and Query2
• to obtain variable bindings
IF ?- participant(P,C,PA,X), course(P,Y,advanced) THEN PA=no
IF ?- participant(P,C,PA,X), course(P,Y,advanced) succeeds for P THEN ?- participant(P,C,PA,X), course(P,Y,advanced), PA=no
![Page 83: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/83.jpg)
Warmr ~ Apriori
Works as Apriori :
• keeping track of frequent and infrequent queries
• order queries by theta-subsumption
• using special mechanism (bias) to declare type of queries searched for
Generalizes many of the specific variants of Apriori : item-sets, episodes, hierarchies, intervals, ...
![Page 84: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/84.jpg)
Upgrading
Key ideas / contributions FOIL + Tilde + Warmr
• determine the representation of examples and hypotheses
• select the right type of coverage and generality (subsumption)
• keep existing algorithm (CN2 or C4.5) but replace operators (often theta-subsumption)
• keep search strategy
• fast implementation
![Page 85: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/85.jpg)
Distance Based Learning
![Page 86: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/86.jpg)
Distances
| h | = the size of a hypothesis h
| g | ≤ | s | if g is more general than s
mgg(t,u) = set of minimally general generalizations
| mgg(t,u) | = max size of element in mgg(t,u)
under mild conditions the following is a metric
![Page 87: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/87.jpg)
The RECIPERelevant for ALL levels of the hierarchy
Still being applied across data mining,
• mining from graphs, trees, and sequences
Works in both directions
• upgrading and downgrading !!!
Mining from graphs or trees as downgraded Relational Learning
Many of the same problems / solutions apply to graphs as to relational representations
![Page 88: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/88.jpg)
From Upgrading to Downgrading
Work at the right level of representation
• trade-off between expressivity & efficiency
The old challenge: upgrade learning techniques for simpler representations to richer ones.
The new challenge: downgrade more expressive ones to simpler ones for efficiency and scalability; e.g. graph miners.
Note: systems using rich representations form a baseline, and can be used to test out ideas.
Relevant also for ALL machine learning and data mining tasks
![Page 89: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/89.jpg)
Learning Tasks
Logical and relational representations can (and have been) used for all learning tasks and techniques
• rule-learning & decision trees
• frequent and local pattern mining
• distance-based learning (clustering & instance-based learning)
• probabilistic modeling (cf. statistical relational learning)
• reinforcement learning
• kernel and support vector methods
![Page 90: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/90.jpg)
C. Probabilistic Logic Learning
![Page 91: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/91.jpg)
PLL: What Changes ?•Clauses annotated with probability labels
• E.g. in Sato’s Prism, Muggleton’s SLPs, Kersting and De Raedt’s BLPs, …
• Prob. covers relation covers(e,H U B) = P(e | H,B)
• Likelihood of example given background and hypothesis
• Probability distribution P over the different values e can take; so far only (true,false)
• impossible examples have probability 0
• Knowledge representation issue
•Define probability distribution on examples / individuals
•What are these examples / individuals ?
![Page 92: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/92.jpg)
Probabilistic generative ILP Problem
Given
• a set of examples E
• a background theory B
• a language Le to represent examples
• a language Lh to represent hypotheses
• a probabilistic covers P relation on Le x Lh
Find
• hypothesis h* maximizing some score based on the probabilistic covers relation; often some kind of maximum likelihood
![Page 93: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/93.jpg)
PLL: Three Settings
• Probabilistic learning from interpretations
•Bayesian logic programs, Koller’s PRMs, Domingos’ MLNs, Vennekens’ LPADs
• Probabilistic learning from entailment
•Muggleton’s Stochastic Logic Programs, Sato’s Prism, Poole’s ICL, De Raedt et al.’s ProbLog
• Probabilistic learning from proofs
•Learning the structure of SLPs; a tree-bank grammar based approach, Anderson et al.’s RMMs, Kersting et al.
![Page 94: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/94.jpg)
C. Probabilistic Logic LearningC.1. Learning from Interpretations
![Page 95: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/95.jpg)
Graphical Model
• Defines a joint probability distribution in a compact way; attractive from a computational perspective
• The structure encodes some conditional independency assumptions.
• Two types :
• Directed -- Bayesian Network
• Undirected -- Markov Network
![Page 96: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/96.jpg)
Bayesian Networks
0.9 0.1e
b
e0.2 0.8
0.01 0.990.9 0.1
bebb
e
BE P(A | B,E)Earthquake
JohnCalls
Alarm
MaryCalls
Burglary
P(E,B,A,J,M) = P(E).P(B).P(A|E,B).P(J|A).P(M|A)
Using the Joint Prob. Distribution, any query P(Q |evidence) can be answered
![Page 97: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/97.jpg)
Learning
Two aspects:
• parameters, e.g. max likelihood
• structure learning
![Page 98: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/98.jpg)
Toy Example
![Page 99: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/99.jpg)
Parameter Estimation
A1 A2 A3 A4 A5 A6
true true false true false false
false true true true false false
... ... ... ... ... ...
true false false false true true
complete data setsimply counting
X1
X2
XM
...
![Page 100: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/100.jpg)
? missing value
hidd
en/
late
nt
Parameter Estimation
A1 A2 A3 A4 A5 A6
true true ? true false false
? true ? ? false false
... ... ... ... ... ...
true false ? false true ?
incomplete data set
Real-world data: states of some random variables are missing
– E.g. medical diagnose: not all
patient are subjects to all test
– Parameter reduction, e.g. clustering, ...
![Page 101: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/101.jpg)
On the Alarm Net
maximize likelihood:max P(E | M)
![Page 102: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/102.jpg)
Derivation (fully obs)
P (a,¬m) = !0(1! !2)
Generalize :
P (E|M) = !|a|0 .(1! !0)|¬a|.!|a!m|
1 (1! !1)|a!¬m|.!|¬a!m|2 .(1! !2)|¬a!¬m|
Use log likelihood:
L = log P (E|M) = |a|. log !0 + |¬a|. log(1! !0) +|a "m|. log !1 + |a " ¬m|. log(1! !1) +|¬a "m|. log !2 + |¬a " ¬m|. log(1! !2)
![Page 103: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/103.jpg)
Derivation (fully obs) (2)Derivatives are :
!L
!"0=
|a|"0! |¬a|
1! "0
!L
!"1=
|a "m|"1
! |a " ¬m|1! "1
!L
!"2=
|¬a "m|"2
! |¬a " ¬m|1! "2
Yielding as solutions
"0 =|a|
|a| + |¬a| (1)
"1 =|a "m|
|a "m| + |a " ¬m| (2)
"2 =|¬a "m|
|¬a "m| + |¬a " ¬m| (3)
So, simply counting !
![Page 104: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/104.jpg)
Parameter Estimation (part. obs)
Idea Expectation Maximization
Incomplete data ?
•Complete data (Imputation)
• most probable?, average?, ... value
•Count
•Iterate
![Page 105: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/105.jpg)
EM Idea: Complete the dataincomplete dataA M
true true
true ?
false true
true false
false ?
complete data
0.8
1.2
1.4
1.6
N
falsefalse
truefalse
falsetrue
truetrue
MA
expected counts
complete
iterate
P (m|a) = 0.6
P (m|¬a) = 0.2
!2 = P (m|¬a) =1.2
1.2 + 0.8= 0.6
!1 = P (m|a) =1.6
1.6 + 0.4= 0.54
P (m|¬a) = 0.2
P (m|a) = 0.6
![Page 106: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/106.jpg)
Parameter Tying
P (E|M) = !|a|0 .(1! !0)|¬a|.!|a!m|
1 (1! !1)|a!¬m|.!|¬a!m|2 .(1! !2)|¬a!¬m|.
!|a!j|3 .(1! !3)|a!¬j|.!|¬a!j|
4 (1! !4)|¬a!¬j|
Assume !1 = !3 and !2 = !4. The likelihood simplifies to
P (E|M) = !|a|0 .(1! !0)|¬a|.!|a!m|+|a!j|
1 (1! !1)|a!¬m|+|a!¬j|.
!|¬a!m|+|¬a!j|2 .(1! !2)|¬a!¬m|+|¬a!¬m|
![Page 107: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/107.jpg)
Structure Learning
Of Bayesian networks
• one approach greedily searches through the space of possible networks:
• iterate
• perform operation on graph (add, delete, reverse arcs)
• estimate parameters
• evaluate quality
![Page 108: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/108.jpg)
Bayesian Networks
0.9 0.1e
b
e0.2 0.8
0.01 0.990.9 0.1
bebb
e
BE P(A | B,E)Earthquake
JohnCalls
Alarm
MaryCalls
Burglary
P(E,B,A,J,M) = P(E).P(B).P(A|E). P(A|B).P(J|A).P(M|A)
earthquake.
burglary.
alarm :- earthquake, burglary.
marycalls :- alarm.
johncalls:- alarm.
![Page 109: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/109.jpg)
Probabilistic Relational Models (PRMs)
PersonBloodtype
M-chromosomeP-chromosome
Person
Bloodtype M-chromosome
P-chromosome
(Father)
Person
Bloodtype M-chromosome
P-chromosome
(Mother)
Table
[Getoor,Koller, Pfeffer]
[Getoor,Koller, Pfeffer]
![Page 110: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/110.jpg)
Probabilistic Relational Models (PRMs)
bt(Person,BT).
pc(Person,PC).
mc(Person,MC).
bt(Person,BT) :- pc(Person,PC), mc(Person,MC).
pc(Person,PC) :- pc_father(Father,PCf), mc_father(Father,MCf).
pc_father(Person,PCf) | father(Father,Person),pc(Father,PC)....
father(Father,Person).
PersonBloodtype
M-chromosomeP-chromosome
Person
Bloodtype M-chromosome
P-chromosome
(Father)
Person
Bloodtype M-chromosome
P-chromosome
(Mother)
View :
Dependencies (CPDs associated with):
mother(Mother,Person).[Getoor,Koller, Pfeffer]
![Page 111: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/111.jpg)
Probabilistic Relational Models (PRMs)Bayesian Logic Programs (BLPs)
father(rex,fred). mother(ann,fred). father(brian,doro). mother(utta, doro). father(fred,henry). mother(doro,henry).
bt(Person,BT) | pc(Person,PC), mc(Person,MC).pc(Person,PC) | pc_father(Person,PCf), mc_father(Person,MCf).mc(Person,MC) | pc_mother(Person,PCm), pc_mother(Person,MCm).
mc(rex)
bt(rex)
pc(rex)mc(ann) pc(ann)
bt(ann)
mc(fred) pc(fred)
bt(fred)
mc(brian)
bt(brian)
pc(brian)mc(utta) pc(utta)
bt(utta)
mc(doro) pc(doro)
bt(doro)
mc(henry)pc(henry)
bt(henry)
RV State
pc_father(Person,PCf) | father(Father,Person),pc(Father,PC)....
Extension
Intension
![Page 112: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/112.jpg)
Knowledge Based Model Construction
Extension + Intension =>Probabilistic Model
Advantages
same intension used for multiple extensions
parameters are being shared / tied together
unification is essential
•learning becomes feasible
• Typical use includes
•prob. inference P(Q | E), P(bt(mary) | bt(john) =o-)
•max. likelihood parameter estimation & structure learning
![Page 113: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/113.jpg)
Answering Queries
mc(rex)
bt(rex)
pc(rex)mc(ann) pc(ann)
bt(ann)
mc(fred) pc(fred)
bt(fred)
mc(brian)
bt(brian)
pc(brian)mc(utta) pc(utta)
bt(utta)
mc(doro) pc(doro)
bt(doro)
mc(henry)pc(henry)
bt(henry)
P(bt(ann)) ?
![Page 114: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/114.jpg)
Answering Queries
mc(rex)
bt(rex)
pc(rex)mc(ann) pc(ann)
bt(ann)
mc(fred) pc(fred)
bt(fred)
mc(brian)
bt(brian)
pc(brian)mc(utta) pc(utta)
bt(utta)
mc(doro) pc(doro)
bt(doro)
mc(henry)pc(henry)
bt(henry)
P(bt(ann), bt(fred)) ?
P(bt(ann)| bt(fred)) =P(bt(ann),bt(fred))
P(bt(fred))
Bayes‘ rule
![Page 115: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/115.jpg)
Combining Partial Knowledge
passes/1
read/1
prepared/2
discusses/2...
bn
passes
Studentprepared
logic
prepared
passes(Student) | prepared(Student,bn),
prepared(Student,logic).
prepared
Studentread
discussesBook
Topic
prepared(Student,Topic) | read(Student,Book),
discusses(Book,Topic).
![Page 116: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/116.jpg)
Combining Partial Knowledge
prepared(s1,bn)
discusses(b1,bn)
prepared(s2,bn)
discusses(b2,bn)
• variable # of parents for prepared/2 due to read/2
• whether a student prepared a topic depends on the books she read
• CPD only for one book-topic pair
prepared(Student,Topic) | read(Student,Book),
discusses(Book,Topic).
prepared
Studentread
discussesBook
Topic
![Page 117: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/117.jpg)
Combining Rules
P(A|B,C)
P(A|B) and P(A|C)
CR
Any algorithm which has an empty output if and only if the input is empty
combines a set of CPDs into a single (combined) CPDE.g. noisy-or, regression, ...
prepared
Studentread
discussesBook
Topic
prepared(Student,Topic) | read(Student,Book),
discusses(Book,Topic).
![Page 118: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/118.jpg)
Aggregates
student_ranking/1
registration_grade/2
...registered/2
Map multisets of values to summary values (e.g., sum, average, max, cardinality)
![Page 119: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/119.jpg)
AggregatesMap multisets of values to summary values (e.g., sum, average, max, cardinality)
student_ranking
Student
grade_avg
Probabilistic Dependency
(CPD)student_ranking/1
grade_avg/1
registration_grade/2
...registered/2
grade_avg
Student
registration_grade
FunctionalDependency
(average)
registered/2Course
Deterministic grade_avg/1
![Page 120: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/120.jpg)
Bayesian Logic Programs
% apriori nodesnat(0).
% aposteriori nodesnat(s(X)) | nat(X).
nat(0) nat(s(0)) nat(s(s(0)) ...MC
% apriori nodesstate(0).
% aposteriori nodesstate(s(Time)) | state(Time).output(Time) | state(Time)
state(0)
output(0)
state(s(0))
output(s(0))
...HMM
% apriori nodesn1(0).
% aposteriori nodesn1(s(TimeSlice) | n2(TimeSlice).n2(TimeSlice) | n1(TimeSlice).n3(TimeSlice) | n1(TimeSlice), n2(TimeSlice).
n1(0)
n2(0)
n3(0)
n1(s(0))
n2(s(0))
n3(s(0))
...DBN
pure P
rolog
Prolog and Bayesian Nets as Special Case
![Page 121: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/121.jpg)
RVs + States = (partial) Herbrand interpretationProbabilistic learning from interpretations
Learning: the data
Family(1)pc(brian)=b,
bt(ann)=a,
bt(brian)=?,
bt(dorothy)=a
Family(2)
bt(cecily)=ab,
pc(henry)=a,
mc(fred)=?,
bt(kim)=a,
pc(bob)=b
Background
m(ann,dorothy),
f(brian,dorothy),
m(cecily,fred),
f(henry,fred),
f(fred,bob),
m(kim,bob),
...Family(3)
pc(rex)=b,
bt(doro)=a,
bt(brian)=?
![Page 122: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/122.jpg)
Parameter Estimation
+
![Page 123: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/123.jpg)
Parameter Estimation
+
Parameter tying
![Page 124: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/124.jpg)
Expectation Maximization
Initial Parameters q0
Logic Program L
Expected counts of a clause
Update parameters (ML, MAP)
Maximization
EM-algorithm:iterate until convergence
Current Model(M,qk)
P( head(GI), body(GI) | DC )MM
DataCaseDC
Ground InstanceGIP( head(GI), body(GI) | DC )MM
DataCaseDC
Ground InstanceGI
P( body(GI) | DC )MM
DataCaseDC
Ground InstanceGI
![Page 125: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/125.jpg)
Structure Learning
mc(X) | m(M,X), mc(M), pc(M).pc(X) | f(F,X), mc(F), pc(F).
bt(X) | mc(X), pc(X).
Original program
mc(john) pc(john)
bc(john)
m(ann,john) f(eric,john)
pc(ann)
mc(ann) mc(eric)
pc(eric)
mc(X) | m(M,X),pc(X). pc(X) | f(F,X).
bt(X) | mc(X), pc(X).
Refinement
mc(X) | m(M,X). pc(X) | f(F,X). bt(X) | mc(X).
Initial hypothesis
mc(X) | m(M,X). pc(X) | f(F,X).
bt(X) | mc(X), pc(X).
Refinement
![Page 126: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/126.jpg)
Bayesian Logic Programs (BLPs)
• Unique probability distribution over Herbrand interpretations
• Finite branching factor, finite proofs, no self-dependency
• Highlight
• Separation of qualitative and quantitative parts
• Functors
• Graphical Representation
• Discrete and continuous RV
• BNs, DBNs, HMMs, SCFGs, Prolog ...
• Turing-complete programming language
• Learning
![Page 127: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/127.jpg)
Balios Tool
![Page 128: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/128.jpg)
Undirected Probabilistic Relational
So far, directed graphical models only
Impose acyclicity constraint
Undirected graphical models do not impose the acyclicity constraint
![Page 129: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/129.jpg)
Undirected Probabilistic Relational
Two approaches
• Relational Markov Networks (RMNs) (Taskar et al.)
• Markov Logic Networks (MLNs) (Anderson et al.)
Idea
• Semantics defined in terms of Markov Networks (undirected graphical models)
• More natural for certain applications
RMNs ~ undirected PRM
MLNs ~ undirected BLP
![Page 130: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/130.jpg)
A Markov Network/Random Field
![Page 131: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/131.jpg)
Markov Logic
(slide and work by Domingos et al.)
![Page 132: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/132.jpg)
Markov Logic
(slide and work by Domingos et al.)
![Page 133: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/133.jpg)
Markov Logic
(slide and work by Domingos et al.)
![Page 134: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/134.jpg)
Markov Logic
(slide and work by Domingos et al.)
![Page 135: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/135.jpg)
Some interesting problems
• Inference is generally intractable
• Most Probable State Estimation
• Lazy Max Walk Sat algorithm
![Page 136: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/136.jpg)
MLNs
• Probably the best known and most elaborated approach to SRL
• Many different inference & learning algorithms known; discriminative, generative, structure learning, ...
• See website Pedro Domingos and Alchemy system
![Page 137: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/137.jpg)
C. Probabilistic Logic LearningC.2. Learning from proofs & entailment
![Page 138: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/138.jpg)
Probabilistic Context Free Grammars
1.0 : S -> NP, VP1.0 : NP -> Art, Noun0.6 : Art -> a0.4 : Art -> the0.1 : Noun -> turtle0.1 : Noun -> turtles…0.5 : VP -> Verb0.5 : VP -> Verb, NP0.05 : Verb -> sleep0.05 : Verb -> sleeps….
The turtle sleeps
Art Noun Verb
NP VP
S
1
1 0.5
0.4 0.1 0.05
P(parse tree) = 1x1x.5x.1x.4x.05
![Page 139: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/139.jpg)
PCFGsP (parse tree) =
!i pici
where pi is the probability of rule iand ci the number of timesit is used in the parse tree
P (sentence) ="
p:parsetree P (p)
Observe that derivations always succeed, that isS ! T,Q and T ! R,Ualways yieldsS ! R,U, Q
![Page 140: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/140.jpg)
Probabilistic DCG
1.0 S -> NP(Num), VP(Num)1.0 NP(Num) -> Art(Num), Noun(Num)0.6 Art(sing) -> a0.2 Art(sing) -> the0.2 Art(plur) -> the0.1 Noun(sing) -> turtle0.1 Noun(plur) -> turtles…0.5 VP(Num) -> Verb(Num)0.5 VP(Num) -> Verb(Num), NP(Num)0.05 Verb(sing) -> sleep0.05 Verb(plur) -> sleeps….
The turtle sleeps
Art(s) Noun(s) Verb(s)
NP(s) VP(s)
S
1
1 0.5
0.2 0.1 0.05
P(derivation tree) = 1x1x.5x.1x .2 x.05
![Page 141: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/141.jpg)
As a Stochastic Logic Program
11
1/2
P(s([the,turtles,sleep],[])=1/6
1
1/3
1/2
![Page 142: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/142.jpg)
Probabilistic DCG
1.0 S -> NP(Num), VP(Num)1.0 NP(Num) -> Art(Num), Noun(Num)0.6 Art(sing) -> a0.2 Art(sing) -> the0.2 Art(plur) -> the0.1 Noun(sing) -> turtle0.1 Noun(plur) -> turtles…0.5 VP(Num) -> Verb(Num)0.5 VP(Num) -> Verb(Num), NP(Num)0.05 Verb(sing) -> sleep0.05 Verb(plur) -> sleeps….
The turtle sleeps
Art(s) Noun(s) Verb(s)
NP(s) VP(s)
S
1
1 0.5
0.2 0.1 0.05
P(derivation tree) = 1x1x.5x.1x .2 x.05
What about “A turtles sleeps” ?
![Page 143: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/143.jpg)
SLPsPd(derivation) =
!i pci
iwhere pi is the probability of rule iand ci the number of timesit is used in the parse tree
Observe that some derivations now fail due to unification, that isnp(Num)! art(Num), noun(Num) and art(sing)! anoun(plural)! turtles
Normalization necessaryPs(proof) = Pd(proof)P
i Pd(proofi)
![Page 144: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/144.jpg)
Example Application• Consider traversing a university website• Pages are characterized by predicate
department(cs,nebel) denotes the page of cs following the link to nebel
• Rules applied would be of the form department(cs,nebel) :- prof(nebel), in(cs), co(ai), lecturer(nebel,ai). pagetype1(t1,t2) :- type1(t1), type2(t2), type3(t3), pagetype2(t2,t3)
• SLP models probabilities over traces / proofs / web logs department(cs,nebel), lecturer(nebel,ai007),
course(ai007,burgard), …This is actually a Logical Markov Model
![Page 145: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/145.jpg)
0.05
0.1
Lect.
Dept
Course
0.7
0.5
0.4
0.1
0.25
0.5
0.4
0.4 department(D,L) :-prof(L), in(D),co(C), lecturer(L,C).
0.1 prof(nebel).0.05 prof(burgard).…
Logical Markov Model
![Page 146: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/146.jpg)
Learning SLPs from entailment
Entai
lmen
t Given• a set of ground facts
• s([the, boy, rides]) • structure of the logic program Find • the parameters of the logic program that maximize the log likelihood
Parameter estimation• Failure adjusted maximization algorithm (Cussens)
• Only facts observed• Proofs and Failures not observed !
• Inside-outside algorithm for Prism (Sato) • EM based techniques
CFG
sentences
PCFG
no failures
![Page 147: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/147.jpg)
Learning SLPs from entailment
Entai
lmen
t
CFG
sentences
PCFG
no failures
Given• a set of ground facts• the structure of the logic programFind • the logic program and its parameters that maximizes the log
likelihood
Structure learning• requires one to solve the underlying relational learning problem• as well as the parameter estimation problem• Very hard because the information contained in the examples is very
limited.• What about using richer examples ? proofs ?
![Page 148: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/148.jpg)
Learning SLPs from entailment
Entai
lmen
t
CFG
sentences
PCFG
no failures
Given• a set of proof trees• the structure of the logic programFind • the logic program and its parameters that maximizes the log
likelihood
Structure learning• analogy with learning from tree banks, cf. De Raedt et al. AAAI 05.
![Page 149: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/149.jpg)
Learning from parse tree
S -> NP(s), VP(s)NP(s) -> Art(s),
Noun(s)VP(s) -> Verb(s) Art(s) -> theNoun(s) -> turtleVerb(s) -> sleeps
The turtle sleeps
Art(s) Noun(s) Verb(s)
NP(s) VP(s)
S
How to get the variables back ?
![Page 150: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/150.jpg)
Generalizing Rules in SLPs
Generalization in ILP
• Take two clauses for same predicate and replace them by the lgg under theta-subsumption (Plotkin), e.g
department(cs,nebel) :-
prof(nebel), in(cs), course(ai), lect(nebel,ai).
department(cs,burgard) :-
prof(burgard), in(cs),course(ai), lect(burgard,ai)
• Induce
department(cs,P) :-
prof(P), in(cs),course(ai), lect(P,ai)
![Page 151: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/151.jpg)
Strong logical constraints
Replacing the rules r1 and r2 by the lgg should preserve the proofs !
So, two rules r1 and r2 should only be generalized when
• There is a one to one mapping (with corresponding substitutions) between literals in r1, r2 and lgg(r1,r2)
Exclude
• father(j,a) :- m(j),f(a),parent(j,a)
• father(j,t) :- m(j),m(t), parent(j,t)
![Page 152: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/152.jpg)
Protein Fold Recognition
• If one of the two similar proteins has a known structure, can build a rough model of the protein of
unknown structure.
[Kersting et al.; Kersting, Gaertner]
[helix(h(right,3to10),5),
helix(h(right,alpha),13), strand(null,7), strand(minus,7), strand(minus,5),
helix(h(right,3to10),5),…]
![Page 153: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/153.jpg)
LoHMM
~120 parameters
vs.
over 62000 parameters
Secondary structure of domains of proteins
(from PDB and SCOP)fold1: TIM beta/alpha barrel fold, fold2: NAD(P)-binding Rossman-fold fold23: Ribosomal
protein L4, fold37: glucosamine 6-phosphate deaminase/isomerase old fold55: leucine aminopeptidas fold. 3187 logical sequences (> 30000 ground atoms)
[Kersting et al., JAIR 2006]
![Page 154: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/154.jpg)
Results
• Accuracy: 74% vs. 82.7% (1622 vs. 1809 / 2187)
• Majority vote: 43%
fold1 fold2 fold23 fold37 fold55
precision 0.86 / 0.89 0.69 / 0.86 0.56 / 0.82 0.72 / 0.70 0.66 / 0.74
recall 0.78 / 0.87 0.67 / 0.81 0.71 / 0.85 0.66 / 0.72 0.96 / 0.86
[Kersting et al.; Kersting, Gaertner; Landwehr et al.]
![Page 155: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/155.jpg)
Web Log Data
Relational Markov Models
Log data of web sides
KDDCup 200 (www.gazelle.com)
RMM
[Anderson et al., KDD 03]
![Page 156: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/156.jpg)
Probabilities on Proofs
Two views
• stochastic logic programs define a prob. distribution over atoms for a given predicate.
• The sum of the probabilities = 1.
• Sampling. Like in probabilistic grammars.
• distribution semantics define a prob. distribution over possible worlds/interpretations. Degree of belief.
![Page 157: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/157.jpg)
• Sato’s PRISM and Poole’s ICL and PHA use disjoint statements (or multi-valued switches)
disjoint(p1:a1, ... , pk:ak)
disjoint(0.5:head(C); 0.5:tail(C))
head(C) and tail(C) are mutually exclusive
false :- head(C), tail(C)
• for such statements: one of the alternatives is true, instead of true or false for a single fact.
Probabilities on Facts
![Page 158: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/158.jpg)
PRISM (Sato) / ICL (Poole)
A logic program in which probability labels are attached to facts;
Clauses carry no probability label (or equiv. P =1)
Probability distributions can be defined in a related/similar fashion on proofs / on explanations / on atoms - though some differences
Abductive reasoning with probabilities !
PRISM and ICL more expressive than SLPs
• SLPs can be easily transformed into ICL or PRISM
![Page 159: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/159.jpg)
ICL / PRISM examplebtype(‘A’) : - (gtype(a,a); gtype(a,o); gtype(o,a)).
btype(‘B’) : - (gtype(b,b); gtype(b,o); gtype(o,b)).
btype(‘O’) : - gtype(o,o).
btype(‘AB’) : - (gtype(a,b); gtype(b, a)).
gtype(X,Y) :- gene(father,X), gene(mother,Y).
gene(P,G) :- msw(gene,P,G). probabilistic switch
disjoint( p1: msw(gene,P,o),
p2: msw(gene,P,a),
p3: msw(gene,P,b)) outcomes + probabilities switch
Infer e.g. P(btype(‘A’))
![Page 160: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/160.jpg)
Abductionbtype(‘A’) iff gtype(a,a) or gtype(a,o) or gtype(o,a)
gtype(a,a) iff msw(gene,father,a) and msw(gene,mother,a)
gtype(a,o) iff msw(gene,father,a) and msw(gene,mother,o)
gtype(o,a) iff msw(gene,father,o) and msw(gene,mother,a)
So, btype(‘A’) = p1xp1 + p1Xp2+p2Xp1
Can be used for sampling as well.
![Page 161: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/161.jpg)
• The Exclusive Explanation Assumption :
• avoid the NP-complete problem by assuming that all explanations (that is proofs, or sets S ⊆ F, are mutually exclusive)
• under this condition (which can be enforced in different ways)
• P(A v B v C) = P(A) + P(B) + P(C)
ICL/PRISM Assumption
![Page 162: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/162.jpg)
Learning from entailment
Given
• examples of the form btype(x)
• structure of program
Estimate parameters of switches
Nice results -- HMMs, PCFGs, some type of BNs as special case
Sato’s Prism one of the most advanced PLL systems
Has been used by Cussens to learn SLPs
![Page 163: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/163.jpg)
Learning from Entailment and Proofs
SLPs extend PCFGs as a representation
Proof-trees for SLPs correspond to parse-trees in PCFGs
Upgrading the learning from tree-banks setting for use in SLPs
Allows one also to elegantly model and study RMMs and LOHMMs
• Sequential relational / logical traces.
Defining probabilities on facts provides interesting perspective
• PRISM and ICL
• Probabilistic Abduction
• Also learning ...
![Page 164: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/164.jpg)
C. Probabilistic Logic LearningC.3. Discriminative models &
propositionalization
![Page 165: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/165.jpg)
Probabilistic discriminative ILP Problem
Given
• a set of examples E, positive and negatives.
• a background theory B
• a language Le to represent examples
• a language Lh to represent hypotheses
• a probabilistic covers P relation on Le x Lh
Find
• hypothesis h* maximizing some score based on the probabilistic covers relation; often some kind of conditional likelihood
![Page 166: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/166.jpg)
Propositionalization
att
att
att
att att example
exampleexampleexampleexampleexampleexampleexampleexample
single-table single-tuple
attribute-value
att
att
att
att
att exampl
eexampl
e
example
single-table multiple-tuplemulti-instance
att
att
att
att
att
att
att
example
example
example
example
example
example
example
exampexa
att
att
att
att
att
att
att
examp
examp
exam
exam
exa
exa
exexe
multi-table multiple-tuple
relational
att
att
att
att
att
att
att
examp
examp
exam
exam
exa
exa
exexe
att
att
att
att
att
att
att
examp
examp
exam
exam
exa
exa
exexe
2 relationsedge / vertex
graphs & networks
Downgrading the data ?
![Page 167: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/167.jpg)
Two questions
UPGRADING : Can we develop systems that work with richer representations (starting from systems for simpler representations)?
PROPOSITIONALISATION: Can we change the representation from richer representations to simpler ones ? (So we can use systems working with simpler representations)
![Page 168: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/168.jpg)
Relational Data and Clausal Theories
![Page 169: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/169.jpg)
Relational Data and Clausal Theories
Clauses match on structural features of
e.g. moleculesSet of clauses is called theory or hypothesis
![Page 170: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/170.jpg)
Clauses , ,
Propositionalization
![Page 171: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/171.jpg)
Logical TheoriesTheory can be seen as a set of features
Disjunction: example positive if one of the clauses matches (hypothesis c1 v ... v ck)
![Page 172: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/172.jpg)
How to learn H?• Traditional static propositionalization approaches
a. ILP + SVM 1. Run ILP system to generate small set of clauses2. Train SVM based on these features
c. Pattern Mining + SVM 1. Generate large set of clauses, e.g., all clauses that are frequent in the
training data2. Train SVM based on these features
• Problem: Two steps are independent– we will not find the hypothesis maximizing our score– Theory that yields optimal similarity measure different from
best ILP theory!– Information loss
![Page 173: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/173.jpg)
nFOIL: Clauses + Probabilistic Model
• nFOIL / tFOIL: combine clauses using a (tree augmented) naive Bayes model
• Defines distribution over class variable for a given example ( substitution, instantiates to example)
Instantiated class predicate Instantiated clauses
![Page 174: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/174.jpg)
kFOIL: Clauses + Kernel• kFOIL: combine clauses using a kernel machine • Idea: Theory H measures similarity
– e.g., standard dot product kernel
– polynomial kernel
– Gaussian kernel
• Classification can be obtained from similarity by kernel machine
„number of features examples have in common“
„conjunctions of up to p features“
„symmetric difference: number of features which hold on only one example“
![Page 175: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/175.jpg)
How to learn H?• Looking for a hypothesis which is
– the best feature set in a naive Bayes model (nFOIL)– defines the best similarity measure (kFOIL)
• The optimization criterion is different from standard ILP• Use a score for clause search that takes into account the
way the clauses are combined– For nFOIL / tFOIL: Conditional likelihood (probabilistic coverage)
– For kFOIL: Performance of the kernel machine on the training data
![Page 176: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/176.jpg)
The Score Function• Score of a theory
– Theory defines kernel – Parameters are obtained by standard SVM training – Given parameters, examples can be labeled according to
– Most simple choice for : Accuracy/RMSE
(classification)
(regression)
![Page 177: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/177.jpg)
How to learn H?• Dynamic propositionalization [Landwehr et. al. JMLR 06]
– Integrate the two steps– Modify ILP algorithm: search for clauses guided by performance
of the combination method (NB/SVM)
• FOIL-style search – Greedy general-to-specific search for clauses employing refinement
operator
– Greedy incremental search for clause set
![Page 178: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/178.jpg)
FOIL
mutagenic(X) :- atom(X,A,n),charge(A,0.82)
:- true
Coverage = 0.7
Coverage = 0.6
Coverage = 0.6
:- atom(X,A,c)
:- atom(X,A,n)
:- atom(X,A,f)
Coverage = 0.8
Coverage = 0.6
:- atom(X,A,c),bond(A,B)
:- atom(X,A,c),charge(A,0.82)
mutagenic(X) :- atom(X,A,c),bond(A,B)
![Page 179: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/179.jpg)
Experimental Results: Classification
40
55
70
85
100
Alz
heim
er a
min
e
Alz
heim
er to
xic
Alz
heim
er a
cety
l
Alz
heim
er m
emor
y
NC
TRER
Mut
agen
esis
r.f.
Mut
agen
esis
r.u.
Acc
urac
y
Dataset
kFOIL nFOIL Alephc-ARMR+SVM
Comparison against Aleph (ILP) and c-ARMR+SVM (static propositionalization)
on several ILP benchmark datasets
![Page 180: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/180.jpg)
Propositionalization• Dynamic propositionalization can yield more accurate
models than ILP or static propositionalization– key idea: guide clause search by score related to combination method– typically yields clause sets which are better suited to the particular
combination method• Added computational complexity if a costly clause
combination method (such as SVM) is used– However, complexity can be reduced by incrementally refining the
combination model when the clause set changes– This brings dynamic propositionalization computationally close to ILP
(on the datasets we looked at)
![Page 181: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/181.jpg)
A flavor of Probabilistic ILP or Statistical Relational Learning from a logical perspective
A definition of Probabilistic ILP
• based on a probabilistic coverage notion and annotated logic programs
Different Probabilistic ILP setting as for traditional ILP
• Learning from entailment (Parameter Est. SLPs/Prism)
• Learning from interpretations (BLPs, PRMs & Co)
• Learning from traces or proofs (SLPs, RMMs, LOHMMs)
• Discriminative ILP (FOIL + Naïve Bayes or Kernels)
Conclusions
![Page 182: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/182.jpg)
Probabilistic Logic Learning
Excellent Area for Research & Ph.D. topics !
![Page 183: An Introduction to Statistical Relational Learningluc.deraedt/salvador.pdf · An Introduction to Statistical Relational Learning Luc De Raedt ... *In the US, sometimes called Statistical](https://reader031.vdocuments.site/reader031/viewer/2022013022/5f43c739f080e96d64277da5/html5/thumbnails/183.jpg)
THANK YOU !