Relational Learning with Gaussian Processes
By Wei Chu, Vikas Sindhwani, Zoubin Ghahramani, S. Sathiya Keerthi (Columbia, Chicago, Cambridge, Yahoo!)
Presented by Nesreen Ahmed, Nguyen Cao, Sebastian Moreno, Philip Schatz
Outline
• Introduction
• Relational Gaussian Processes
• Application
  – Linkage prediction
  – Semi-supervised learning
• Experiments & Results
• Conclusion & Discussion
12/02/08, CS590M: Statistical Machine Learning - Fall 2008
Introduction
• Many domains involve relational data
  – Web: document links
  – Document categorization: citations
  – Computational biology: protein interactions
• Inter-relationships between instances can be informative for learning tasks
• Relations reflect the network structure and enrich how instances are correlated
Introduction
• Relational information is represented by a graph G = (V, E)
• Supervised learning:
  – The graph provides structural knowledge
• Semi-supervised learning:
  – The graph can also be derived from the input attributes
  – The graph then estimates the global geometric structure of the data
Gaussian Processes
• A Gaussian process is a joint Gaussian distribution over sets of function values {f_x} of any arbitrary set of n instances x:
$$P(\mathbf{f}) = \frac{1}{(2\pi)^{n/2}\,\det(\Sigma)^{1/2}}\exp\!\left(-\frac{1}{2}\mathbf{f}^{T}\Sigma^{-1}\mathbf{f}\right)$$
where $\mathbf{f} = [f_{x_1}, \ldots, f_{x_n}]^{T}$ and $\Sigma = [K(x_i, x_j)]_{n \times n}$ for a covariance (kernel) function K.
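Here is a minimal NumPy sketch of the joint Gaussian density above. The Gaussian covariance function and its width parameter `kappa` are assumptions for illustration, since the slide does not fix a kernel at this point. The helper names `gaussian_kernel`, `covariance_matrix` and `gp_log_prior` are hypothetical. The log-density uses a log-determinant and a linear solve rather than an explicit inverse, which is the usual numerically safer route.

```python
import numpy as np

def gaussian_kernel(x, z, kappa=1.0):
    """Assumed Gaussian covariance K(x, z) = exp(-kappa/2 * ||x - z||^2)."""
    d = np.atleast_1d(np.asarray(x, dtype=float) - np.asarray(z, dtype=float))
    return float(np.exp(-0.5 * kappa * np.sum(d * d)))

def covariance_matrix(X, kernel=gaussian_kernel):
    """Sigma with Sigma[i, j] = K(x_i, x_j) for the n instances in X."""
    n = len(X)
    return np.array([[kernel(X[i], X[j]) for j in range(n)] for i in range(n)])

def gp_log_prior(f, Sigma):
    """log P(f) for f ~ N(0, Sigma)."""
    f = np.asarray(f, dtype=float)
    n = f.shape[0]
    sign, logdet = np.linalg.slogdet(Sigma)  # log det(Sigma) without overflow
    if sign <= 0:
        raise ValueError("Sigma must be positive definite")
    quad = f @ np.linalg.solve(Sigma, f)     # f^T Sigma^{-1} f without forming the inverse
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + quad)

# Usage: three 1-D instances and one candidate vector of function values.
X = [0.0, 0.5, 2.0]
Sigma = covariance_matrix(X)
log_p = gp_log_prior([0.1, 0.0, -0.3], Sigma)
```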
Relational Gaussian Processes
• Linkages: an observed linkage ε_ij between instances x_i and x_j takes the value ε_ij = +1 if x_i and x_j are "positively" tied, and ε_ij = −1 if x_i and x_j are "negatively" tied
• The uncertainty in observing ε_ij induces Gaussian noise N(0, σ²) in the corresponding instances' function values. The likelihood of a positive linkage is then the probability that both noisy function values have the same sign:
$$P(\varepsilon_{ij} = +1 \mid f_{x_i}, f_{x_j}) = \Phi\!\left(\frac{f_{x_i}}{\sigma}\right)\Phi\!\left(\frac{f_{x_j}}{\sigma}\right) + \left(1 - \Phi\!\left(\frac{f_{x_i}}{\sigma}\right)\right)\left(1 - \Phi\!\left(\frac{f_{x_j}}{\sigma}\right)\right), \qquad \Phi(z) = \int_{-\infty}^{z}\mathcal{N}(\gamma; 0, 1)\,d\gamma$$
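The likelihood can be checked numerically with the standard library alone. In this sketch the helper names are hypothetical. Φ is computed from `math.erf`. A negative linkage (ε_ij = −1) is assumed to have probability 1 − P(ε_ij = +1 | f_xi, f_xj), which is the probability that the noisy values have opposite signs.

```python
import math

def std_normal_cdf(z):
    """Phi(z): integral of N(gamma; 0, 1) from -infinity to z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def linkage_likelihood(f_i, f_j, eps, sigma):
    """P(eps_ij | f_xi, f_xj) for eps in {+1, -1}, with N(0, sigma^2) noise on each function value."""
    p_i = std_normal_cdf(f_i / sigma)  # P(noisy f_xi > 0)
    p_j = std_normal_cdf(f_j / sigma)  # P(noisy f_xj > 0)
    p_same = p_i * p_j + (1.0 - p_i) * (1.0 - p_j)  # both positive or both negative
    if eps == +1:
        return p_same
    if eps == -1:
        return 1.0 - p_same
    raise ValueError("eps must be +1 or -1")
```

For example, if either function value is 0, the probability of a positive linkage is exactly 1/2, regardless of the other value.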
Relational Gaussian Processes
• Approximate inference: the posterior over f given the observed linkages is
$$P(\mathbf{f} \mid \boldsymbol{\varepsilon}) = \frac{1}{P(\boldsymbol{\varepsilon})}\,P(\mathbf{f})\prod_{ij}P(\varepsilon_{ij} \mid f_{x_i}, f_{x_j})$$
where ij runs over the set of observed undirected linkages.
• The EP algorithm approximates $P(\mathbf{f})\prod_{ij}P(\varepsilon_{ij} \mid f_{x_i}, f_{x_j})$ as:
$$Q(\mathbf{f}) = P(\mathbf{f})\prod_{ij}s_{ij}\exp\!\left(-\frac{1}{2}\mathbf{f}_{ij}^{T}\Pi_{ij}\mathbf{f}_{ij}\right)$$
where $\mathbf{f}_{ij} = [f_{x_i}, f_{x_j}]^{T}$ and $\Pi_{ij}$ is a 2×2 symmetric matrix.
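Each 2×2 site matrix Π_ij can be treated as an n×n matrix. The following sketch embeds Π_ij at rows and columns i, j and evaluates the log of one site term s_ij exp(−½ f_ijᵀ Π_ij f_ij). The helper names are hypothetical and the site values are made up, because fitting them is the job of the EP updates, which are not shown. The embedding leaves the quadratic form unchanged, which is why the site terms combine with the Gaussian prior into a single Gaussian.

```python
import numpy as np

def augment_site(Pi_ij, i, j, n):
    """Place the 2x2 site matrix Pi_ij at rows/cols (i, j) of an n x n zero matrix (four non-zero entries)."""
    if i == j:
        raise ValueError("a linkage joins two distinct instances")
    Pi_tilde = np.zeros((n, n))
    Pi_tilde[np.ix_([i, j], [i, j])] = np.asarray(Pi_ij, dtype=float)
    return Pi_tilde

def log_site_term(f, Pi_ij, s_ij, i, j):
    """log of s_ij * exp(-1/2 * f_ij^T Pi_ij f_ij), with f_ij = [f_xi, f_xj]^T."""
    f_ij = np.array([f[i], f[j]], dtype=float)
    return float(np.log(s_ij) - 0.5 * f_ij @ np.asarray(Pi_ij, dtype=float) @ f_ij)

# Hypothetical site parameters for a linkage between instances 0 and 2, with n = 4.
Pi_02 = [[0.8, -0.5], [-0.5, 0.8]]
f = np.array([0.3, -1.0, 0.6, 2.0])
Pi_tilde = augment_site(Pi_02, 0, 2, 4)
quad_full = f @ Pi_tilde @ f  # equals f_02^T Pi_02 f_02
```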
Relational Gaussian Processes
$$P(\mathbf{f} \mid \boldsymbol{\varepsilon}) \approx \mathcal{N}\!\left(\mathbf{0}, \left(\Sigma^{-1} + \sum_{ij}\tilde{\Pi}_{ij}\right)^{-1}\right)$$
where $\tilde{\Pi}_{ij}$ is an n×n matrix with four non-zero entries augmented from $\Pi_{ij}$ (placed at rows and columns i and j).
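Once the site matrices are fitted, the approximate posterior covariance requires only one matrix inversion. This sketch assumes the Π_ij are already available. The usage has a single hypothetical site. Explicit inverses are used for brevity; for larger n, a Cholesky-based solve would be preferable.

```python
import numpy as np

def rgp_posterior_covariance(Sigma, sites):
    """Covariance (Sigma^{-1} + sum_ij Pi~_ij)^{-1} of the Gaussian approximation to P(f | eps).

    `sites` maps an undirected linkage (i, j) to its 2x2 site matrix Pi_ij (assumed already fitted by EP).
    """
    n = Sigma.shape[0]
    Pi = np.zeros((n, n))
    for (i, j), Pi_ij in sites.items():
        Pi[np.ix_([i, j], [i, j])] += np.asarray(Pi_ij, dtype=float)  # augment to n x n and accumulate
    return np.linalg.inv(np.linalg.inv(Sigma) + Pi)

# Usage: independent prior (Sigma = I) and one hypothetical positive linkage between instances 0 and 1.
Sigma = np.eye(3)
post = rgp_posterior_covariance(Sigma, {(0, 1): [[1.0, -1.0], [-1.0, 1.0]]})
```

With an identity prior, this site gives a posterior covariance of 1/3 between f_x0 and f_x1, and their variances shrink to 2/3. The negative off-diagonal precision contributed by the linkage becomes positive correlation. The unlinked instance 2 keeps variance 1.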
Relational Gaussian Processes
• For any finite collection of data points X, the set of random variables {f_x} conditioned on ε has a multivariate Gaussian distribution:
$$P(\mathbf{f} \mid \boldsymbol{\varepsilon}) \approx \mathcal{N}(\mathbf{0}, \tilde{\Sigma})$$
where the elements of the covariance matrix $\tilde{\Sigma}$ are given by evaluating the following (covariance) kernel function:
$$\tilde{K}(x, z) = K(x, z) - \mathbf{k}_x^{T}(I + \Pi\Sigma)^{-1}\Pi\,\mathbf{k}_z$$
with $\Pi = \sum_{ij}\tilde{\Pi}_{ij}$ and $\mathbf{k}_x = [K(x_1, x), \ldots, K(x_n, x)]^{T}$. On the training inputs, this reproduces $\tilde{\Sigma} = (\Sigma^{-1} + \Pi)^{-1}$ by the matrix inversion lemma.
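The kernel form is what allows Σ̃ to be evaluated at points outside the training set. This sketch relies on three assumptions: a Gaussian base kernel, hypothetical helper names, and a hand-written Π rather than one fitted by EP. The function `make_rgp_kernel` precomputes (I + ΠΣ)⁻¹Π once and returns K̃ as a closure.

```python
import numpy as np

def rbf(x, z, kappa=1.0):
    """Assumed base covariance K(x, z) = exp(-kappa/2 * ||x - z||^2)."""
    d = np.atleast_1d(np.asarray(x, dtype=float) - np.asarray(z, dtype=float))
    return float(np.exp(-0.5 * kappa * np.sum(d * d)))

def make_rgp_kernel(X, Pi, base_kernel=rbf):
    """Return K~(x, z) = K(x, z) - k_x^T (I + Pi Sigma)^{-1} Pi k_z for training inputs X."""
    n = len(X)
    Sigma = np.array([[base_kernel(X[a], X[b]) for b in range(n)] for a in range(n)])
    M = np.linalg.solve(np.eye(n) + Pi @ Sigma, Pi)  # (I + Pi Sigma)^{-1} Pi, computed once

    def k_vec(x):
        # k_x = [K(x_1, x), ..., K(x_n, x)]^T
        return np.array([base_kernel(X[a], x) for a in range(n)])

    def k_tilde(x, z):
        return base_kernel(x, z) - k_vec(x) @ M @ k_vec(z)

    return k_tilde, Sigma

# Usage: three 1-D training inputs and one hypothetical positive linkage between x_0 and x_2.
X = [0.0, 1.0, 3.0]
Pi = np.zeros((3, 3))
Pi[np.ix_([0, 2], [0, 2])] = [[1.0, -1.0], [-1.0, 1.0]]
k_tilde, Sigma = make_rgp_kernel(X, Pi)
K_tilde = np.array([[k_tilde(a, b) for b in X] for a in X])
```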
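Linkage Prediction
• Joint probability of the function values at x_r and x_s:
$$P(f_{x_r}, f_{x_s} \mid \boldsymbol{\varepsilon}) = \mathcal{N}(\mathbf{f}_{rs}; \mathbf{0}, \tilde{\Sigma}_{rs})$$
where $\mathbf{f}_{rs} = [f_{x_r}, f_{x_s}]^{T}$ and $\tilde{\Sigma}_{rs}$ is its 2×2 covariance matrix under $\tilde{K}$.
• Probability for an edge between x_r and x_s:
$$P(\varepsilon_{rs} = +1 \mid \boldsymbol{\varepsilon}) = \iint P_{\text{ideal}}(\varepsilon_{rs} = +1 \mid f_{x_r}, f_{x_s})\,\mathcal{N}(\mathbf{f}_{rs}; \mathbf{0}, \tilde{\Sigma}_{rs})\,df_{x_r}\,df_{x_s} = \frac{1}{2} + \frac{1}{\pi}\arcsin(\rho_{rs})$$
where $P_{\text{ideal}}(\varepsilon_{rs} = +1 \mid f_{x_r}, f_{x_s})$ is 1 when $f_{x_r}$ and $f_{x_s}$ have the same sign and 0 otherwise, and
$$\rho_{rs} = \frac{\tilde{K}(x_r, x_s)}{\sqrt{\tilde{K}(x_r, x_r)\,\tilde{K}(x_s, x_s)}}$$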
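The closed form follows from a classical result: two zero-mean, jointly Gaussian variables with correlation ρ have the same sign with probability 1/2 + arcsin(ρ)/π. The sketch below computes this probability and compares it with a Monte Carlo estimate drawn with the standard library's `random.gauss`. The function names are hypothetical. The inputs are the three entries of Σ̃_rs, that is, K̃ values.

```python
import math
import random

def _correlation(k_rr, k_ss, k_rs):
    rho = k_rs / math.sqrt(k_rr * k_ss)
    return max(-1.0, min(1.0, rho))  # guard against round-off just outside [-1, 1]

def edge_probability(k_rr, k_ss, k_rs):
    """P(eps_rs = +1 | eps) = 1/2 + arcsin(rho_rs) / pi."""
    return 0.5 + math.asin(_correlation(k_rr, k_ss, k_rs)) / math.pi

def edge_probability_mc(k_rr, k_ss, k_rs, n_samples=100_000, seed=0):
    """Monte Carlo estimate: fraction of draws from N(0, Sigma~_rs) where f_xr and f_xs share a sign."""
    rng = random.Random(seed)
    rho = _correlation(k_rr, k_ss, k_rs)
    same = 0
    for _ in range(n_samples):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        f_r = math.sqrt(k_rr) * z1
        f_s = math.sqrt(k_ss) * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)  # Cov(f_r, f_s) = k_rs
        if f_r * f_s > 0:
            same += 1
    return same / n_samples
```

Uncorrelated function values give probability 1/2. Perfectly correlated values give probability 1.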
Semi-supervised learning
[Figure: a few labeled points (+1, −1) among many unlabeled points marked "?"]
Semi-supervised learning
[Figure: the same points connected by a nearest-neighbor graph, K = 1]
Semi-supervised learning
[Figure: the same points connected by a nearest-neighbor graph, K = 2]
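The K-nearest-neighbor graph in the figures can be built from the input attributes as follows. This is a sketch, not the authors' code. It assumes Euclidean distance, treats each neighbor relation as an undirected positive linkage (ε_ij = +1), and uses a brute-force O(n²) search. The function name is hypothetical.

```python
def knn_linkages(X, k):
    """Undirected positive linkages (eps_ij = +1) from a k-nearest-neighbor graph with Euclidean distance."""
    edges = set()
    for i, xi in enumerate(X):
        # squared distances from x_i to every other point; ties broken by index
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(xi, xj)), j)
            for j, xj in enumerate(X) if j != i
        )
        for _, j in dists[:k]:
            edges.add((min(i, j), max(i, j)))  # store each undirected edge once
    return edges

points = [(0.0,), (0.1,), (5.0,), (5.2,)]
print(sorted(knn_linkages(points, 1)))  # [(0, 1), (2, 3)]
```

With K = 2, the same four points also gain edges (0, 2), (1, 2) and (1, 3), because each point must now choose a second neighbor across the gap.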
Semi-supervised learning
• Apply the RGP to obtain P(f_l | ε), the distribution over the function values f_l of the labeled instances
• The variables f_l, z_l and y_l are related through probit noise. With label-noise variance σ_n², the likelihood of the labels is
$$P(\mathbf{y}_l \mid \mathbf{f}_l) = \prod_{l}\Phi\!\left(\frac{y_l f_l}{\sigma_n}\right)$$
• Applying Bayes' rule:
$$P(\mathbf{f}_l \mid \mathbf{y}_l, \boldsymbol{\varepsilon}) = \frac{P(\mathbf{y}_l \mid \mathbf{f}_l)\,P(\mathbf{f}_l \mid \boldsymbol{\varepsilon})}{P(\mathbf{y}_l \mid \boldsymbol{\varepsilon})}$$
which is approximated by a Gaussian.
Semi-supervised learning
• Predictive distribution: for a test instance, the function value f_t has an approximately Gaussian posterior, P(f_t | y_l, ε) ≈ N(μ_t, σ_t²), whose mean and variance are computed with the data-dependent kernel K̃
• Obtaining a Bernoulli distribution for classification:
$$P(y_t = +1 \mid \mathbf{y}_l, \boldsymbol{\varepsilon}) = \int \Phi\!\left(\frac{f_t}{\sigma_n}\right)\mathcal{N}(f_t; \mu_t, \sigma_t^2)\,df_t = \Phi\!\left(\frac{\mu_t}{\sqrt{\sigma_t^2 + \sigma_n^2}}\right)$$
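The final identity can be checked by sampling. Draw f_t from its Gaussian posterior, add label noise N(0, σ_n²), and count how often the result is positive. The function names are hypothetical and the parameter values in the checks are arbitrary.

```python
import math
import random

def Phi(z):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def predictive_positive_prob(mu_t, var_t, sigma_n):
    """P(y_t = +1) = Phi(mu_t / sqrt(var_t + sigma_n^2)) for f_t ~ N(mu_t, var_t) and probit noise sigma_n."""
    return Phi(mu_t / math.sqrt(var_t + sigma_n ** 2))

def predictive_positive_prob_mc(mu_t, var_t, sigma_n, n_samples=100_000, seed=1):
    """Monte Carlo check: sample f_t, add label noise, count positive outcomes."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        z_t = rng.gauss(mu_t, math.sqrt(var_t)) + rng.gauss(0.0, sigma_n)
        if z_t > 0:
            hits += 1
    return hits / n_samples
```

The predictive uncertainty σ_t² and the label noise σ_n² simply add inside the square root. A confident mean with a large posterior variance is therefore pulled toward 1/2.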
Experiments
• Experimental setup
  – Kernel function
    • Gaussian kernel: $K(x, z) = \exp\!\left(-\frac{\kappa}{2}\lVert x - z\rVert^{2}\right)$
    • Centralized kernel: the linear or Gaussian kernel shifted to the empirical mean,
$$K_c(x, z) = K(x, z) - \frac{1}{n}\sum_{i}K(x, x_i) - \frac{1}{n}\sum_{i}K(x_i, z) + \frac{1}{n^{2}}\sum_{i}\sum_{j}K(x_i, x_j)$$
  – Noise level
    • Label noise = 10⁻⁴ (for RGP and GPC)
    • Edge noise = [5 : 0.05]
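Here is a NumPy sketch of the two kernels and the centralization step on a training set. The helper names are hypothetical and κ is the assumed Gaussian width parameter. On the training matrix, centralizing means subtracting the row means and column means and adding back the grand mean. This equals HKH with H = I − 11ᵀ/n, so every row and column of the centralized matrix sums to zero. For the linear kernel, the result equals the Gram matrix of the mean-centered inputs.

```python
import numpy as np

def linear_kernel_matrix(X):
    """K(x, z) = x^T z on all pairs of rows of X."""
    X = np.asarray(X, dtype=float)
    return X @ X.T

def gaussian_kernel_matrix(X, kappa=1.0):
    """K(x, z) = exp(-kappa/2 * ||x - z||^2) on all pairs of rows of X."""
    X = np.asarray(X, dtype=float)
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)  # clip tiny negative round-off
    return np.exp(-0.5 * kappa * d2)

def centralize(K):
    """K_c[a, b] = K[a, b] - mean_i K[a, i] - mean_i K[i, b] + mean_{i,j} K[i, j]."""
    row_mean = K.mean(axis=1, keepdims=True)
    col_mean = K.mean(axis=0, keepdims=True)
    return K - row_mean - col_mean + K.mean()

# Usage on four 2-D training inputs.
X = np.array([[0.0, 1.0], [2.0, -1.0], [1.0, 3.0], [4.0, 0.5]])
Kc = centralize(gaussian_kernel_matrix(X, kappa=0.5))
```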
Results
30 samples were collected from a Gaussian mixture with two components on the x-axis. The two labeled samples are indicated by a diamond and a circle. K = 3.
Best value = 0.4, selected by the approximate model evidence.
Results
The posterior covariance matrix of the RGP learnt from the data captures the density information of the unlabeled data.
Using this posterior covariance matrix as the new prior, supervised learning is carried out. The curves represent the predictive distribution for each class.
Results
• Real-world experiment
  – Subset of the WebKB dataset
    • Collected from the CS departments of 4 universities
    • Contains pages with hyperlinks interconnecting them
    • Pages classified into 7 categories (e.g., student, course, other)
  – Documents are preprocessed as vectors of input attributes
  – Hyperlinks are translated into undirected positive linkages
    • 2 pages are likely to be positively correlated if they are hyperlinked by the same hub page
    • No negative linkages
  – Compared with GPC & LapSVM (Sindhwani et al. 2005)
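The hub rule above can be sketched as a co-citation construction. The input format (a dict from each page to the pages it links to) and the function name are hypothetical. The sketch implements only the shared-hub rule stated on the slide. The slide does not say whether direct hyperlinks were also turned into linkages.

```python
from itertools import combinations

def hub_linkages(out_links):
    """Positive undirected linkages between pages that are hyperlinked by the same hub page."""
    edges = set()
    for hub, targets in out_links.items():
        pages = sorted(set(targets) - {hub})  # ignore self-links and duplicate links
        for a, b in combinations(pages, 2):
            edges.add((a, b))  # pages co-cited by `hub` get eps_ab = +1
    return edges

links = {"h1": ["a", "b", "c"], "h2": ["c", "d"], "a": []}
edges = hub_linkages(links)
```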
Results
• Two classification tasks
  – Student vs. non-student
  – Other vs. non-other
• 10% of the samples randomly selected as labeled data
• Selection repeated 100 times
• Linear kernel
• The table shows the average AUC for predicting the labels of the unlabeled cases

| Univ. | Student or Not: GPC | Student or Not: LapSVM | Student or Not: RGP | Other or Not: GPC | Other or Not: LapSVM | Other or Not: RGP |
|---|---|---|---|---|---|---|
| Cornell | 0.825±0.016 | 0.987±0.008 | 0.989±0.009 | 0.708±0.021 | 0.865±0.038 | 0.884±0.025 |
| Texas | 0.899±0.016 | 0.994±0.007 | 0.999±0.001 | 0.799±0.021 | 0.932±0.026 | 0.906±0.026 |
| Washington | 0.839±0.018 | 0.957±0.014 | 0.961±0.009 | 0.782±0.023 | 0.828±0.025 | 0.877±0.024 |
| Wisconsin | 0.883±0.013 | 0.976±0.029 | 0.992±0.008 | 0.839±0.014 | 0.812±0.030 | 0.899±0.015 |
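AUC, the metric in the table, is the probability that a randomly chosen positive case is scored above a randomly chosen negative case. The sketch below uses that pairwise (Mann-Whitney) definition and counts ties as one half. The function name is hypothetical. The O(n_pos · n_neg) loop favors clarity over speed.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparisons; labels are +1 / -1, ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == +1]
    neg = [s for s, y in zip(scores, labels) if y == -1]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```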
Conclusion
• A novel Bayesian framework, based on GPs, for learning from relational data
• The RGP provides a data-dependent covariance function for supervised learning tasks (classification)
• Applied to semi-supervised learning tasks
• RGP requires very few labels to generalize on unseen test points
  – Unlabeled data are incorporated in the model selection
Discussion
• The proposed framework can be extended to model:
  – Directed (asymmetric) relations as well as undirected relations
  – Multiple classes of relations
  – Graphs with weighted edges
• The model should be compared with other relational learning models
• The results can be sensitive to the choice of K in the KNN graph
Thanks
Questions?