Collective classification in large scale networks
TRANSCRIPT
Page 1
Collective classification in large scale networks
Jennifer Neville, Departments of Computer Science and Statistics, Purdue University
Page 2
The data mining process: Data → Selection → Target data → Preprocessing → Processed data → Mining → Patterns → Interpretation/evaluation → Knowledge

Network data
Network data is:
• heterogeneous and interdependent,
• partially observed/labeled,
• dynamic and/or non-stationary, and
• often drawn from a single network
...thus many traditional methods developed for i.i.d. data do not apply
Page 3
Email networks, the world wide web, gene/protein networks, scientific networks, social networks, organizational networks
Example predictive models for network domains
• Predict protein function from interaction patterns
• Predict group effectiveness from communication patterns
• Predict content changes from properties of hyperlinked pages
• Predict organizational roles from communication patterns
• Predict paper topics from properties of cited papers
• Predict personal preferences from characteristics of friends
Page 4
Attribute prediction in networks
⟨X, Y_i = 1⟩, ⟨X, Y_j = 0⟩: observed network autocorrelation
Page 5
Network autocorrelation is ubiquitous
• Marketing: product/service adoption among communicating customers (Domingos & Richardson ’01, Hill et al. ’06)
• Advertising: on-line brand advertising (Provost et al. ’09)
• Fraud detection: fraud status of cellular customers who call common numbers (Fawcett & Provost ’97, Cortes et al. ’01); fraud status of brokers who work at the same branch (Neville & Jensen ’05)
• Biology: functions of proteins located together in cells (Neville & Jensen ’02); tuberculosis infection among people in close contact (Getoor et al. ’01)
• Movies: box-office receipts of movies made by the same studio (Jensen & Neville ’02)
• Web: topics of hyperlinked web pages (Chakrabarti et al. ’98, Taskar et al. ’02)
• Business: industry categorization of corporations that share common board members (Neville & Jensen ’00); industry categorization of corporations that co-occur in news stories (Bernstein et al. ’03)
• Citation analysis: topics of coreferent scientific papers (Taskar et al. ’01, Neville & Jensen ’03)
Page 6
Exploiting network autocorrelation with collective classification
Use label propagation/approximate inference to collectively classify unobserved nodes in a partially labeled network
Page 7
How do we learn models for collective classification?
Page 8
In practice we have a single partially-labeled network
How do we learn a model and make predictions?
Within-network learning
Page 9
Main approaches to collective classification

Graph regularization:
• No learning: GRF (Zhu et al. ’03); wvRN (Macskassy et al. ’07)
• Add links: wvRN+ (Macskassy ’07); GhostEdges (Gallagher et al. ’08); SCRN (Wang et al. ’13)
• Learn weights: GNetMine (Ji et al. ’10); LNP (Shi et al. ’11); LGCW (Dhurandhar et al. ’13)

Probabilistic modeling:
• Learn from labeled only: RMN (Taskar et al. ’01); MLN (Richardson et al. ’06); RDN (Neville et al. ’06)
• Add graph features: SocialDim (Tang et al. ’09); RelationStr (Xiang et al. ’10)
• Semi-supervised learning: LBC (Lu et al. ’03); PL-EM (Xiang et al. ’08); CC-HLR (McDowell et al. ’12); RDA (Pfeiffer et al. ’14)
Page 10
Graph regularization perspective
Page 11
Gaussian Random Fields: Zhu, Ghahramani, Lafferty ICML’03
Page 12
Same idea applied to relational data to incorporate guilt-by-association into collective classification (wvRN: Macskassy and Provost MRDM’03)
Page 13
Weighted-vote Relational Neighbor (wvRN)
• To find the label of a node vi, perform a weighted vote among its neighbors N(vi):
• Collective classification iteratively recomputes probabilities based on current beliefs of neighbors until convergence
P(y_i = 1 | N(v_i)) = (1 / |N(v_i)|) · Σ_{v_j ∈ N(v_i)} P(y_j = 1 | N(v_j))
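The update above can be sketched directly in a few lines of pure Python. This is a minimal illustration, assuming a toy chain graph, unit edge weights, and a 0.5 prior for unlabeled nodes (all illustrative choices, not from the slides):

```python
def wvrn(adj, labels, iters=100, tol=1e-6):
    """adj: {node: [neighbours]}; labels: {node: 0 or 1} for the labeled set."""
    # Labeled nodes are clamped; unlabeled nodes start at an uninformative 0.5.
    p = {v: float(labels[v]) if v in labels else 0.5 for v in adj}
    for _ in range(iters):
        delta = 0.0
        for v in adj:
            if v in labels:
                continue
            # Weighted vote among neighbours (unit edge weights here).
            new = sum(p[u] for u in adj[v]) / len(adj[v])
            delta = max(delta, abs(new - p[v]))
            p[v] = new
        if delta < tol:      # stop once beliefs have converged
            break
    return p

# Toy chain 1-2-3-4 with the two endpoints labeled.
probs = wvrn({1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}, {1: 1, 4: 0})
```

On this chain the fixed point is the harmonic solution: the node next to the positive seed ends up near 2/3, the node next to the negative seed near 1/3.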
Page 14
Apply model to make predictions collectively
Small world graph Labeled nodes: 30% Autocorrelation: 50%
Page 15
Apply model to make predictions collectively
Random graph Labeled nodes: 30% Autocorrelation: 50%
Page 16
Extensions
• Graph regularization is used in mainstream ML for i.i.d. data, where links are added to account for instance similarity
• In the network community, the links are given a priori and represent relations among the instances, so work has focused on:
• Adding extra links/weights to improve performance:
• Macskassy (AAAI’07) incorporates several types of links, weighting each by assortativity
• Gallagher et al. (KDD’08) add “ghost edges” based on random walks
• Wang et al. (KDD’13) weight edges based on social context features
Page 17
Extensions (cont)
• Learning how to weight existing and added links:
• Ji et al. (PKDD’10) learn how to weight meta-paths in heterogeneous networks
• Shi et al. (CIKM’11) learn how to weight different types of latent links
• Dhurandhar et al. (JAIR’13) learn how to weight individual links
Page 18
Probabilistic modeling perspective
Page 19
Machine learning 101
1. Data representation
2. Knowledge representation
3. Objective function
4. Search algorithm
Relational learning
Email networks, the world wide web, gene/protein networks, scientific networks, social networks, organizational networks: relational data call for relational models
[Schema diagram: Branch (Bn) and Broker (Bk) entities, linked to Firm and Disclosure records, with attributes such as Region, Area, Size, On Watchlist, Year, Type, Is Problem, Problem In Past, Has Business, and Layoffs]
Page 20
There has been a great deal of work on templated graphical model representations for relational data
RBNs, RMNs, PRMs, IHRMs, MLNs, GMNs, DAPER, RDNs
Since the model representation is also graphical, we need to distinguish data networks from model networks
Page 21
Data network
Page 22
Gender? Married? Politics? Religion?
Data network
Page 23
[Attributed network: each node carries attribute values (gender, married, politics, religion) and a class label]
Attributed network
Data representation (1)
Note: we often have only a single network for learning
Estimate the joint distribution P(Y | {X}^n, G) or the conditional distribution P(Y_i | X_i, X_R, Y_R)
Page 24
Define structure of graphical model
Relational template: pairwise dependencies (Politics_i, Politics_j), (Politics_i, Religion_i), (Politics_i, Married_i), (Politics_i, Gender_i)
Page 25
Relational template, abstracted as a model template: (Y_i, Y_j), (Y_i, X_i^1), (Y_i, X_i^2), (Y_i, X_i^3)
Page 26
Model template ((Y_i, Y_j), (Y_i, X_i^1), (Y_i, X_i^2), (Y_i, X_i^3)) + data network
Page 27
[Rolled-out graphical model: class labels Y_1 through Y_8, each connected to its attributes X_i^1, X_i^2, X_i^3 and to neighbouring labels]
Model network (rolled-out graphical model)
Knowledge representation (2)
Page 28
[Rolled-out model over the fully labeled training network]
Learn model parameters from the fully labeled network:
P(y_G | x_G) = (1 / Z(Θ, x_G)) · Π_{T ∈ T} Π_{C ∈ C(T(G))} φ_T(x_C, y_C; Θ_T)
Objective function (3); search algorithm (4): e.g., convex optimization
Page 29
Model template + test network
Apply model to make predictions in another network drawn from the same distribution
[Model rolled out on the test network: labels Y_1 through Y_8 with their attributes]
(assumption: network is drawn from the same distribution as training)
Page 30
Collective classification uses the full joint, rolled-out model for inference… but labeled nodes impact the final model structure
[Rolled-out model shown before and after conditioning on a labeled node Y_4]
Page 31
[Rolled-out models with different sets of labeled nodes clamped]
The structure of “rolled-out” relational graphical models is determined by the structure of the underlying data network, including the location and availability of labels
…this can impact the performance of learning and inference methods via the representation, the objective function, and the search algorithm
Page 32
Networks are much, much larger in practice… and often there’s only one partially labeled network
Page 33
Approach #1: Ignore unlabeled during learning
• Drop out unlabeled nodes; training data = labeled part of the network
• Model is defined via local conditionals; optimize parameters using the pseudolikelihood
Labels Y, edges E, attributes X: P(Y | X, E, Θ_Y)
Θ̂_Y = argmax_{Θ_Y} Σ_{v_i ∈ V_L} log P_Y(y_i | Y_MB_L(v_i), x_i, Θ_Y)
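As a concrete toy instance of this approach, the sketch below fits a relational logistic regression by gradient ascent on the pseudolikelihood over labeled nodes only, with one assumed relational feature: the positive fraction among labeled neighbours. The graph, attributes, learning rate, and feature choice are all illustrative assumptions, not the slides' exact model:

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def fit_pseudolikelihood(adj, x, labels, steps=2000, lr=0.5):
    # w = (bias, attribute weight, relational weight)
    w = [0.0, 0.0, 0.0]
    nodes = list(labels)                 # V_L: train on labeled nodes only
    for _ in range(steps):
        g = [0.0, 0.0, 0.0]
        for v in nodes:
            lab = [u for u in adj[v] if u in labels]   # labeled Markov blanket
            rel = sum(labels[u] for u in lab) / len(lab) if lab else 0.5
            feats = [1.0, x[v], rel]
            err = labels[v] - sigmoid(sum(wi * f for wi, f in zip(w, feats)))
            for k in range(3):
                g[k] += err * feats[k]   # gradient of the log conditional
        w = [wi + lr * gi / len(nodes) for wi, gi in zip(w, g)]
    return w

# Toy chain 1-2-3-4 where the attribute agrees with the label.
adj = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
x = {1: 1.0, 2: 1.0, 3: -1.0, 4: -1.0}
w = fit_pseudolikelihood(adj, x, {1: 1, 2: 1, 3: 0, 4: 0})
```

Because each term in the objective is a local conditional, the fit decomposes over labeled nodes and never touches the unlabeled part of the network.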
Page 34
Approach #1: Apply learned model to remainder
• Test data = full network (but only evaluate on unlabeled)
• Labeled nodes seed the inference
• Use approximate inference to collectively classify (e.g., variational, Gibbs)
Labels Y, edges E, attributes X: P(Y | X, E, Θ_Y)
For unlabeled instances, iteratively estimate: P_Y(y_i | Y_MB(v_i), x_i, Θ_Y)
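A minimal sketch of this prediction step, assuming a hand-fixed local conditional (the weights below stand in for a learned Θ_Y; they are not from the slides) and a mean-field style update that clamps labeled nodes and iterates over the rest:

```python
import math

# Assumed (not learned) parameters: bias, attribute weight, relational weight.
W = (0.0, 1.0, 2.0)

def conditional(xv, rel):
    # P(y_i = 1 | neighbours' beliefs, x_i); rel in [0, 1] is recentred.
    a = W[0] + W[1] * xv + W[2] * (2.0 * rel - 1.0)
    return 1.0 / (1.0 + math.exp(-a))

def collective_predict(adj, x, labels, sweeps=50):
    # Seed with labeled nodes clamped and a 0.5 prior everywhere else.
    p = {v: float(labels.get(v, 0.5)) for v in adj}
    for _ in range(sweeps):
        for v in adj:
            if v in labels:
                continue
            rel = sum(p[u] for u in adj[v]) / len(adj[v])
            p[v] = conditional(x[v], rel)
    return p

# Toy chain 1-2-3-4: one labeled positive node, uninformative attributes.
adj = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
x = {v: 0.0 for v in adj}
p = collective_predict(adj, x, {1: 1})
```

With a single positive seed, the belief pulled from the labeled node propagates outward, so nodes closer to the seed end up with higher estimates than distant ones.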
Page 35
Approach #2: Add graph features during learning
• Training data = labeled nodes in the network; features are added to incorporate the unlabeled graph structure
• Tang and Liu (KDD’09) find a low-rank approximation of the graph structure, then use it as features
• Xiang et al. (WWW’10) use unsupervised learning to model relationship strength and weight edges
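In the spirit of the low-rank idea (not Tang and Liu's exact construction), one can take the leading eigenvectors of the adjacency matrix as per-node features for an ordinary IID classifier. The two-triangle toy graph below is an illustrative assumption:

```python
import numpy as np

def spectral_features(A, k=2):
    # Eigendecomposition of the symmetric adjacency matrix; eigh returns
    # eigenvalues in ascending order, so the last k columns are the
    # leading eigenvectors: a low-rank view of the graph structure.
    vals, vecs = np.linalg.eigh(A)
    return vecs[:, -k:]

# Two triangles (nodes 0-2 and 3-5) joined by one bridge edge.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

F = spectral_features(A, k=2)   # per-node feature rows for an IID classifier
```

Here the second-leading eigenvector changes sign across the bridge, so the features already separate the two communities before any labels are used.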
Page 36
Approach #3: Semi-supervised learning
• Use entire network to jointly learn parameters and make inferences about class labels of unlabeled nodes
• Lu and Getoor (ICML’03) use relational features and ICA
• McDowell and Aha (ICML’12) combine two classifiers with label regularization
Page 37
Semi-supervised relational learning
• Relational Expectation Maximization (EM) (Xiang & Neville ’08): predict labels with collective classification, and use the predicted probabilities during optimization (in the local conditionals)
Expectation (E) step, for unlabeled instances: P_Y(y_i | Y_MB(v_i), x_i, Θ_Y)
Maximization (M) step: Θ̂_Y = argmax_{Θ_Y} Σ_{Y_U ∈ Y_U} P_Y(Y_U) · {summation over local log conditionals}
[Illustration: a partially labeled network over nodes A through M]
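The EM alternation can be sketched as follows, with an assumed logistic local conditional over one attribute and one relational feature: the E-step propagates soft labels collectively, and the M-step refits the conditional against those soft labels (an expected-pseudolikelihood stand-in). Graph, attributes, and step sizes are toy assumptions:

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def relational_em(adj, x, labels, rounds=10):
    # w = (bias, attribute weight, relational weight) of the local conditional.
    w = [0.0, 0.0, 0.0]
    p = {v: float(labels.get(v, 0.5)) for v in adj}
    for _ in range(rounds):
        # E-step: collective inference with the current parameters.
        for _ in range(20):
            for v in adj:
                if v not in labels:
                    rel = sum(p[u] for u in adj[v]) / len(adj[v])
                    p[v] = sigmoid(w[0] + w[1] * x[v] + w[2] * rel)
        # M-step: gradient ascent using the soft labels p as targets
        # (a simple expectation over Y_U).
        for _ in range(200):
            g = [0.0, 0.0, 0.0]
            for v in adj:
                rel = sum(p[u] for u in adj[v]) / len(adj[v])
                feats = [1.0, x[v], rel]
                err = p[v] - sigmoid(sum(wi * f for wi, f in zip(w, feats)))
                for k in range(3):
                    g[k] += err * feats[k]
            w = [wi + 0.1 * gi / len(adj) for wi, gi in zip(w, g)]
    return w, p

# Toy chain 1-2-3-4 with a predictive attribute and two labeled endpoints.
adj = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
x = {1: 1.0, 2: 1.0, 3: -1.0, 4: -1.0}
w, p = relational_em(adj, x, {1: 1, 4: 0})
```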
Page 38
How does relational EM perform?
• Works well when the network has a moderate amount of labels
• If the network is sparsely labeled, it is often better to use a model that is not learned
• Why? In sparsely labeled networks, errors from collective classification compound during propagation
[Plot: relational EM vs. label propagation]
Both learning and inference require approximation, and network structure impacts errors
Page 39
Impact of approximations in semi-supervised RL
• Does over-propagation during prediction happen in real-world, sparsely labeled networks? YES.
[Scatter plots over five trials: total negatives vs. total positives predicted, with the class prior marked, for two local conditionals: relational naive Bayes and relational logistic regression]
Page 40
Finding #1: Network structure can bias inference in partially-labeled networks; maximum entropy constraints correct for bias
Page 41
Effect of relational biases on R-EM
• We compared CL-EM and PL-EM and examined the distribution of predicted probabilities on a real-world dataset: Amazon co-occurrence (SNAP), varied class priors, 10% labeled
• Over-propagation error during inference causes PL-EM to collapse to a single prediction
• Worse on sparsely labeled datasets
[Bar chart: P(+) and error for Actual, RML, CL-EM, and PL-EM]
Need a method to correct the bias for any method based on a local (relational) conditional
Page 42
Maximum entropy inference for PL-EM (Pfeiffer et al. WWW’15)
• Correction to inference (E-step): enables estimation with the pseudolikelihood (M-step)
• Idea: the proportion of negatively predicted items should equal the proportion of negatively labeled items
• Fix: shift the probabilities up/down; repeat for each inference iteration
Transform probabilities to logit space: h_i = σ^{−1}(P(y_i = 1))
Compute offset location: δ = P(0) · |V_U|
Adjust logits: h_i ← h_i − h_(δ)
Transform back to probabilities: P(y_i) = σ(h_i)
Corrected probabilities are used to retrain during PL-EM (M-step)
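The four steps can be sketched directly. The toy probabilities and the 40% negative prior below are illustrative assumptions; the order-statistic pivot plays the role of the h_(δ) offset:

```python
import math

def maxent_correct(probs, prior_neg):
    # 1. Transform probabilities to logit space.
    h = [math.log(p / (1.0 - p)) for p in probs]
    # 2. Offset location: how many items the prior says should be negative.
    delta = int(prior_neg * len(h))
    # 3. Pivot logit (order statistic h_(delta)); subtracting it shifts
    #    exactly `delta` items below zero.
    pivot = sorted(h)[delta]
    # 4. Transform the shifted logits back to probabilities.
    return [1.0 / (1.0 + math.exp(pivot - hi)) for hi in h]

# Collapsed predictions: everything looks positive, but the prior says
# 40% of the unlabeled nodes should be negative.
probs = [0.95, 0.9, 0.8, 0.7, 0.6]
corrected = maxent_correct(probs, prior_neg=0.4)
```

The shift is monotone in logit space, so the ranking of the nodes is preserved while the fraction predicted negative is forced to match the prior.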
Page 43
Experimental results - Correction effects
[Bar charts of P(+) on Amazon with a small prior and a large prior, comparing Actual, RML, CL-EM, PL-EM (Naive), and PL-EM (MaxEntInf)]
Max entropy correction removes bias due to over-propagation in collective inference
Page 44
Experimental results - Large patent dataset
[Plots of BAE vs. proportion labeled (10^-3 to 10^-1) on the Computers and Organic patent categories, comparing LR, LR (EM), RLR, LP, CL-EM, PL-EM (Naive), and PL-EM (MaxEntInf)]
Correction allows relational EM to improve over competing methods in sparsely labeled domains
Note: McDowell & Aha (ICML’12) may correct the same effect, but during estimation rather than inference
Page 45
Finding #2: Network structure can bias learning in partially-labeled networks; modeling uncertainty corrects for bias
Page 46
Impact of approximations in semi-supervised RL
• Over-correction also happens during parameter estimation in semi-supervised RL on real-world, sparsely labeled networks
[Scatter plots of estimated parameters across trials: conditionals P(·|−) vs. P(·|+) for relational naive Bayes, and weights w[0] vs. w[1] for relational logistic regression]
Page 47
Impact of approximations in semi-supervised RL
• Relational EM:
E-step (collective inference approximation), for all unlabeled instances: P_Y(y_i | Y_MB(v_i), x_i, Θ_Y)
M-step (pseudolikelihood approximation): Θ̂_Y = argmax_{Θ_Y} Σ_{Y_U ∈ Y_U} P_Y(Y_U) Σ_{v_i ∈ V_L} log P_Y(y_i | Ỹ_MB(v_i), x_i, Θ_Y)
• Over-propagation error: both classes neighbor yellow… thus yellow neighbors are not predictive
• Produces over-correction in estimation
Accounting for uncertainty in parameter estimates will correct for this bias
Page 48
Relational data augmentation (Pfeiffer et al. ICDM’14)
• Data augmentation is a Bayesian version of EM: parameters are RVs; compute the posterior predictive distribution (Tanner & Wong ’87)
• We developed a relational version of data augmentation
• Final inference is over a distribution of parameter values
• Requires prior distributions over parameters and sampling methods
Fixed point vs. stochastic: Relational EM (fixed-point parameters and predictions), Relational Stochastic EM (sampled labels with fixed-point parameter updates), Relational Data Augmentation (both sampled)
Alternate between:
• Gibbs sample of labels: Y_U^t ~ P_Y^t(Y_U | Y_L, X, E, Θ_Y^{t−1})
• Sample parameters: Θ_Y^t ~ P^t(Θ_Y | Y_U^t, Y_L, X, E)
Final parameters: Θ̄_Y = (1/T) Σ_t Θ_Y^t
Final inference: Ȳ_U = (1/T) Σ_t Y_U^t
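A rough sketch of the alternation, with heavy simplifications: the local conditional is a one-parameter homophily model, and the "parameter sample" is a crude edge-agreement statistic plus noise, a stand-in assumption rather than the paper's actual posterior sampler:

```python
import math
import random

def relational_da(adj, labels, T=1000, seed=0):
    rng = random.Random(seed)
    theta, theta_sum = 0.0, 0.0
    y = {v: labels.get(v, rng.choice([0, 1])) for v in adj}
    p_sum = {v: 0.0 for v in adj}
    for _ in range(T):
        # Gibbs sample labels: Y_U^t ~ P(Y_U | Y_L, X, E, theta^{t-1}).
        for v in adj:
            if v in labels:
                continue
            rel = sum(y[u] for u in adj[v]) / len(adj[v])
            p1 = 1.0 / (1.0 + math.exp(-theta * (2.0 * rel - 1.0)))
            y[v] = 1 if rng.random() < p1 else 0
        # "Sample" parameters given the completed labels: edge agreement
        # plus Gaussian noise (assumed; not a real posterior draw).
        agree = sum((2 * y[v] - 1) * (2 * y[u] - 1)
                    for v in adj for u in adj[v]) / 2.0
        theta = max(0.0, agree + rng.gauss(0.0, 0.1))
        theta_sum += theta
        for v in adj:
            p_sum[v] += y[v]
    # Average both parameters and label samples over the T iterations.
    return theta_sum / T, {v: p_sum[v] / T for v in p_sum}

# Toy chain 1-2-3-4 with labeled endpoints.
adj = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
theta_bar, p_bar = relational_da(adj, {1: 1, 4: 0})
```

Averaging over sampled parameter values, rather than committing to one fixed point, is what gives the method its robustness in sparsely labeled settings.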
Page 49
Experimental results - Amazon DVD
[Plots of 0-1 loss vs. percentage of graph labeled (0.1 to 0.9), for naive Bayes and logistic regression conditionals, comparing Relational EM and Relational DA]
Relational EM’s instability in sparsely labeled domains causes poor performance
Page 50
Experimental results - Facebook
[Plots of MAE vs. percentage of graph labeled, for naive Bayes and logistic regression conditionals, comparing Relational EM, Relational SEM, and Relational DA]
Relational data augmentation can outperform relational stochastic EM
Page 51
Finding #3: The implicit assumption is that nodes of the same type should be identically distributed, but many relational representations cannot ensure this holds for varying graph structures
Page 52
I.I.D. assumption revisited
• Current relational models do not impose the same marginal invariance condition that is assumed in IID models, which can impair generalization
[Example network over nodes A through F, with marginals p(y_A | x_A) and p(y_E | x_E) highlighted]
p(y_A | x_A) ≠ p(y_E | x_E) due to varying graph structure
• The Markov relational network representation does not allow us to explicitly specify the form of the marginal probability distributions, so it is difficult to impose any equality constraints on the marginals
Page 53
Is there an alternative approach?

• Goal: Combine the marginal invariance advantages of IID models with the ability to model relational dependence
• Incorporate node attributes in a general way (similar to IID classifiers)
• Idea: Apply copulas to combine marginal models with dependence structure

(t_1, t_2, ..., t_n) jointly ∼ Φ
z_i = F_i^(−1)(Φ_i(t_i)),  so z_i marginally ∼ F_i

Copula theory: can construct an n-dimensional vector with arbitrary marginals while preserving the desired dependence structure
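A minimal sketch of this construction (illustrative, not the paper's implementation): a bivariate Gaussian copula couples two variables while giving each an Exponential(1) marginal. The choice of exponential marginals and the correlation value rho are arbitrary assumptions for the demo.

```python
import math
import random

def phi(t):
    # Φ: standard normal CDF, via the error function
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def exp_inv_cdf(u, lam=1.0):
    # F^{-1} for an Exponential(λ) marginal (an arbitrary choice for this demo)
    return -math.log(1.0 - u) / lam

rho = 0.8                      # copula dependence strength (assumed value)
rng = random.Random(42)
n = 100_000
z1s, z2s = [], []
for _ in range(n):
    # (t1, t2) jointly ~ bivariate normal with correlation rho
    g1, g2 = rng.gauss(0, 1), rng.gauss(0, 1)
    t1 = g1
    t2 = rho * g1 + math.sqrt(1.0 - rho * rho) * g2
    # z_i = F_i^{-1}(Φ(t_i)): marginal becomes Exponential(1), dependence is kept
    z1s.append(exp_inv_cdf(phi(t1)))
    z2s.append(exp_inv_cdf(phi(t2)))

m1, m2 = sum(z1s) / n, sum(z2s) / n   # each should be ~1, the Exponential(1) mean
cov = sum((a - m1) * (b - m2) for a, b in zip(z1s, z2s)) / n
v1 = sum((a - m1) ** 2 for a in z1s) / n
v2 = sum((b - m2) ** 2 for b in z2s) / n
corr = cov / math.sqrt(v1 * v2)       # positive dependence survives the transform
```

The transform is monotone per coordinate, so the dependence encoded by Φ is preserved while each marginal is exactly the one we specified.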
![Page 54: Collective classification in large scale networks...Collective classification in large scale networks Jennifer Neville Departments of Computer Science and Statistics Purdue University](https://reader033.vdocuments.site/reader033/viewer/2022060513/5f2ba7d03cedb741c218184d/html5/thumbnails/54.jpg)
Let's start with a reformulation of IID classifiers...

• General form of probabilistic binary classification: p(y_i = 1) = F(η(x_i))
• e.g., logistic regression
• Now view F as the CDF of a distribution symmetric around 0 to obtain a latent variable formulation:

z_i ∼ p(z_i = z | x_i = x) = f(z − η(x_i)),  y_i = sign(z_i)

• z is a continuous variable, capturing random effects that are not present in x
• f is the corresponding PDF of F
• In IID models, the random effect for each instance is independent, thus can be integrated out
• When links among instances are observed, the correlations among their class labels can be modeled through dependence among the z's
• Key question: How to model the dependence among the z's while preserving the marginals?
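The latent variable view can be checked numerically. The sketch below (with hypothetical weights w, b and input x) samples z_i from a logistic distribution centered at η(x_i) and confirms that the fraction of positive draws matches F(η(x_i)).

```python
import math
import random

def eta(x, w, b):
    # η(x): linear predictor (weights w, b are hypothetical values)
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def F(z):
    # CDF of the standard logistic distribution, symmetric around 0
    return 1.0 / (1.0 + math.exp(-z))

def sample_z(x, w, b, rng):
    # z_i ~ f(z - η(x_i)): logistic noise around η, via inverse-CDF sampling
    u = rng.random()
    return eta(x, w, b) + math.log(u / (1.0 - u))

rng = random.Random(0)
w, b, x = [0.8, -0.5], 0.1, [1.0, 2.0]
p_direct = F(eta(x, w, b))       # p(y=1) from the classification form
n = 200_000
p_latent = sum(sample_z(x, w, b, rng) > 0 for _ in range(n)) / n   # via y = sign(z)
```

Both routes give the same probability, which is what lets the z's carry the dependence without changing each node's marginal prediction.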
![Page 55: Collective classification in large scale networks...Collective classification in large scale networks Jennifer Neville Departments of Computer Science and Statistics Purdue University](https://reader033.vdocuments.site/reader033/viewer/2022060513/5f2ba7d03cedb741c218184d/html5/thumbnails/55.jpg)
Copula Latent Markov Network (CLMN)

The CLMN model
• Sample t from the desired joint dependency: (t_1, t_2, ..., t_n) ∼ Φ
• Apply the marginal transformation to obtain the latent variable z: z_i = F_i^(−1)(Φ_i(t_i))
• Classification: y_i = sign(z_i)

The joint dependence Φ is the CLMN component; the marginals F_i come from IID classifiers.
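The three steps can be sketched generatively for a single edge (a toy stand-in for the dependence model, with an assumed correlation rho and η = 0 at both nodes): with strong dependence, linked nodes agree far more often than independent ones, while each node's marginal stays logistic.

```python
import math
import random

def phi(t):
    # Φ: standard normal CDF
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def logistic_inv(u, eta_i):
    # F_i^{-1}: inverse CDF of a logistic distribution centered at η(x_i)
    return eta_i + math.log(u / (1.0 - u))

def clmn_sample(eta1, eta2, rho, rng):
    # Step 1: (t1, t2) ~ Φ, a bivariate Gaussian dependence over one edge
    g1, g2 = rng.gauss(0, 1), rng.gauss(0, 1)
    t1 = g1
    t2 = rho * g1 + math.sqrt(1.0 - rho * rho) * g2
    # Step 2: marginal transform z_i = F_i^{-1}(Φ_i(t_i))
    z1 = logistic_inv(phi(t1), eta1)
    z2 = logistic_inv(phi(t2), eta2)
    # Step 3: classify y_i = sign(z_i)
    return (1 if z1 > 0 else -1), (1 if z2 > 0 else -1)

rng = random.Random(7)
n = 50_000
agree_dep = sum(a == b for a, b in
                (clmn_sample(0.0, 0.0, 0.9, rng) for _ in range(n))) / n
agree_ind = sum(a == b for a, b in
                (clmn_sample(0.0, 0.0, 0.0, rng) for _ in range(n))) / n
```

With rho = 0, the model collapses to two independent IID classifiers; raising rho adds label correlation across the edge without touching either marginal.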
![Page 56: Collective classification in large scale networks...Collective classification in large scale networks Jennifer Neville Departments of Computer Science and Statistics Purdue University](https://reader033.vdocuments.site/reader033/viewer/2022060513/5f2ba7d03cedb741c218184d/html5/thumbnails/56.jpg)
Copula Latent Markov Network (Xiang and N. WSDM'13)

CLMN implementation
• Dependence model: Gaussian Markov network
• Marginal model: logistic regression

Inference:
• Conditional inference in copulas had not previously been considered for large-scale networks
• For efficient inference, we developed a message-passing algorithm based on expectation propagation (EP)

Estimation:
• First, learn the marginal model as if instances were IID
• Next, learn the dependence model conditioned on the marginal model... but the GMN has no parameters to learn
![Page 57: Collective classification in large scale networks...Collective classification in large scale networks Jennifer Neville Departments of Computer Science and Statistics Purdue University](https://reader033.vdocuments.site/reader033/viewer/2022060513/5f2ba7d03cedb741c218184d/html5/thumbnails/57.jpg)
Experimental Results
[Figure: classification results on the Gene and IMDB datasets, comparing CLMN, SocDim, RMN, LR, and GMN]

Key idea: Ensuring that nodes with varying graph structure have identical marginals improves learning
![Page 58: Collective classification in large scale networks...Collective classification in large scale networks Jennifer Neville Departments of Computer Science and Statistics Purdue University](https://reader033.vdocuments.site/reader033/viewer/2022060513/5f2ba7d03cedb741c218184d/html5/thumbnails/58.jpg)
Conclusion
• Relational models have been shown to significantly improve network predictions through the use of joint modeling and collective inference
• But we need to understand the impact of real-world graph characteristics on models and algorithms in order to better exploit network information for learning and prediction
• A careful consideration of interactions between:
data representation, knowledge representation, objective functions, and search algorithms
will improve our understanding of the choices/mechanisms that affect performance and drive the development of new algorithms
![Page 59: Collective classification in large scale networks...Collective classification in large scale networks Jennifer Neville Departments of Computer Science and Statistics Purdue University](https://reader033.vdocuments.site/reader033/viewer/2022060513/5f2ba7d03cedb741c218184d/html5/thumbnails/59.jpg)
What can network science offer to RML?
• Consider effects of graph structure on learning/inference
• ML methods learn a model conditioned on graph
• When is a new graph “drawn from the same distribution”? How to model distributions of networks? (see Moreno et al. KDD’13)
• If two networks are drawn from different distributions, how can we transfer a model learned on one network to apply it in another? (see Niu et al. ICDMW’15)
P(Y | {X}ⁿ, G)

[Figure: two attributed networks, each with labeled nodes Y_1...Y_8 and per-node attributes X_i1, X_i2, X_i3]
![Page 60: Collective classification in large scale networks...Collective classification in large scale networks Jennifer Neville Departments of Computer Science and Statistics Purdue University](https://reader033.vdocuments.site/reader033/viewer/2022060513/5f2ba7d03cedb741c218184d/html5/thumbnails/60.jpg)
What can network science offer to ML?
• How does network structure impact performance of ML methods?
• Need to study the properties of the rolled-out model network, comparing the learning setting with the inference setting
• ML researchers study the effects of generic graph properties on graphical model inference, but should also consider network characteristics like connected components, density, clustering, ...
• We are starting to study this with synthetically generated attributed networks (see Pfeiffer et al. WWW’14)
[Figure: rolled-out model network with labeled nodes Y_1...Y_8 and per-node attributes X_i1, X_i2, X_i3]