Exploit of Online Social Networks with Community-Based Graph Semi-Supervised Learning
Mingzhen Mo and Irwin King
Department of Computer Science and Engineering
The Chinese University of Hong Kong
ICONIP 2010, Sydney, Australia
Motivation
• Online social networks are an important way to interact with friends
• They attract a large number of users
  – 500 million active users (Facebook)
  – 700 billion minutes (Facebook)
• The security of users’ information attracts much attention from researchers and developers
Problem
[Figure: users U1 to U5 linked by friendships, with a Group and a Network overlaid on some of them]
Example
• On Facebook
• Given:
  – Users’ profiles, e.g., age, location, and phone number
  – Friendship relationships
  – Member lists of groups and networks
• Output:
  – Predicted university information of users
Objective
• Build a model with a suitable algorithm to predict hidden information
• Make better use of community information
• Related work
  – Graph theory [G. Flake et al., SIGKDD 2000]
  – Supervised learning [E. Zheleva et al., WWW 2009]
  – Semi-supervised learning [M. Mo et al., IJCNN 2010]
Contributions
• Propose a novel community-based model
  – Predicts hidden information more accurately
• Provide two algorithms
  – Able to handle different conditions
• Help in understanding the level of information security in social networks
Preparation for Modeling
• Definitions
  – Online social network: $G(V, E)$
    • Profile $P_i$ of user $i$
    • Friendship $W_{ij}$ between users $i$ and $j$
  – Two sets
    • Labeled data $V_l$
    • Unlabeled data $V_u$
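[Figure: the five-user example graph. Vertices carry profiles $P_1$ to $P_5$; edges carry friendship weights $W_{1,2}$, $W_{1,3}$, $W_{2,4}$, $W_{3,4}$, $W_{3,5}$, $W_{4,5}$. Users 1 and 5 have known labels $Y_1$ and $Y_5$; the labels $\hat{Y}_2$, $\hat{Y}_3$, $\hat{Y}_4$ are to be predicted.]
$V_l = \{v_1, v_5\}$, $V_u = \{v_2, v_3, v_4\}$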
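The example graph can be written down directly as data. Below is a minimal NumPy sketch, assuming unit friendship weights (the slides give no values for $W_{ij}$), 0-based indices, and two hypothetical classes for the labeled users; unlabeled users get all-zero label rows, a common convention in graph-based SSL.

```python
import numpy as np

N_USERS = 5
# Friendship edges read off the example figure, as 1-based user ids.
# Unit weights are an assumption: the slides do not give values for W_ij.
EDGES = [(1, 2), (1, 3), (2, 4), (3, 4), (3, 5), (4, 5)]


def build_friendship_matrix(n, edges, weight=1.0):
    """Symmetric friendship matrix with W[i, j] = W[j, i] = weight per edge."""
    W = np.zeros((n, n))
    for a, b in edges:
        W[a - 1, b - 1] = W[b - 1, a - 1] = weight
    return W


W = build_friendship_matrix(N_USERS, EDGES)
labeled = [0, 4]       # V_l = {v1, v5}, stored as 0-based indices
unlabeled = [1, 2, 3]  # V_u = {v2, v3, v4}

# Hypothetical two-class label matrix (e.g. two universities): one-hot rows
# for labeled users, all-zero rows for users whose label is hidden.
Y = np.zeros((N_USERS, 2))
Y[0, 0] = 1.0  # user 1 -> class 0 (hypothetical)
Y[4, 1] = 1.0  # user 5 -> class 1 (hypothetical)

print(W.astype(int))
```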
Consistency on Graph
Three models, each adding one kind of consistency:
• Model 1: Basic graph-based SSL with harmonic function (local consistency)
• Model 2: Local and Global Consistency (LGC) graph SSL (local and global consistency)
• Model 3: Community-Based Graph (CG) SSL (local, global, and community consistency)
[Figure: users U1 to U5 linked by friendships; users U2 and U4 belong to the same Network]
• Local consistency: label $Y_1$ should be similar to label $Y_2$
• Global consistency: the predicted label $\hat{Y}_2$ should be close to the true label $Y_2$
• Community consistency: the predicted label $\hat{Y}_4$ should be close to the true label $Y_2$ if users 2 and 4 are in the same network
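To make the three kinds of consistency concrete, the sketch below scores two candidate labelings of the five users with simple penalty functions. These penalties are illustrative stand-ins, not the model's exact terms: local and community consistency are measured as an unnormalized weighted sum of squared differences over friendship and community links, and global consistency as squared error on labeled users. The friendship edges, the network linking users 2 and 4, and the score values are all assumptions made for the example.

```python
import numpy as np


def smoothness(f, W):
    """0.5 * sum_{i,j} W[i, j] * (f[i] - f[j])**2, i.e. each link counted once."""
    diff = f[:, None] - f[None, :]
    return 0.5 * float(np.sum(W * diff ** 2))


def label_fit(f, y, labeled):
    """Squared distance between scores and true labels on labeled users."""
    idx = np.asarray(labeled)
    return float(np.sum((f[idx] - y[idx]) ** 2))


def adjacency(n, edges):
    """Unit-weight symmetric adjacency matrix from 1-based edge pairs."""
    A = np.zeros((n, n))
    for a, b in edges:
        A[a - 1, b - 1] = A[b - 1, a - 1] = 1.0
    return A


# Assumed friendships, and a community link for the shared network of users 2, 4.
W = adjacency(5, [(1, 2), (1, 3), (2, 4), (3, 4), (3, 5), (4, 5)])
W_c = adjacency(5, [(2, 4)])
y = np.array([0.0, 1.0, 0.0, 0.0, 0.0])  # true label known only for user 2
labeled = [1]

# Two candidate score vectors that differ only for user 4.
f_agree = np.array([1.0, 1.0, 0.5, 1.0, 0.0])     # user 4 follows user 2
f_disagree = np.array([1.0, 1.0, 0.5, 0.0, 0.0])  # user 4 does not

for name, f in [("agree", f_agree), ("disagree", f_disagree)]:
    print(name, smoothness(f, W), label_fit(f, y, labeled), smoothness(f, W_c))
```

Both candidates get the same local penalty (1.75) and a zero label-fit penalty, so local and global consistency alone cannot separate them; only the community penalty (0 versus 1) prefers the labeling in which user 4 agrees with user 2.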
Community-based Graph (CG) Model
• Input: basic graph $W_g$, community graph $W_c$, true labels $Y$
• Output: predicted labels
• Objective: Local & Global Consistency (LGC) learning term + community term, weighted by two parameters (Parameter 1 and Parameter 2)
  – $L_c$ is the Laplacian matrix of the community information:
$$L_c = D_c - W_c, \qquad D_c^{ii} = \sum_{j=1}^{l+u} W_c(i,j)$$
  where $l$ and $u$ are the numbers of labeled and unlabeled vertices
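The community term is built on the Laplacian $L_c = D_c - W_c$. The NumPy sketch below, using a hypothetical community graph and arbitrary scores, checks the standard identity $f^\top L_c f = \tfrac{1}{2}\sum_{i,j} W_c(i,j)(f_i - f_j)^2$, which is why a Laplacian quadratic form penalizes users in the same community who receive different scores. With several classes, the same penalty summed over the columns of a score matrix $F$ is $\operatorname{tr}(F^\top L_c F)$; that the CG objective uses exactly this form is an assumption here.

```python
import numpy as np


def community_laplacian(W_c):
    """L_c = D_c - W_c with D_c[i, i] = sum_j W_c[i, j] over all l + u vertices."""
    D_c = np.diag(W_c.sum(axis=1))
    return D_c - W_c


# Hypothetical community graph on five users: users 1, 2 and 3 share a group
# (weight 0.5 per pair) and users 2 and 4 share a network (weight 1.0).
W_c = np.zeros((5, 5))
for i, j, w in [(0, 1, 0.5), (0, 2, 0.5), (1, 2, 0.5), (1, 3, 1.0)]:
    W_c[i, j] = W_c[j, i] = w

L_c = community_laplacian(W_c)

f = np.array([0.9, 0.8, 0.4, 0.7, 0.1])  # arbitrary scores for one class
quadratic_form = f @ L_c @ f
pairwise_penalty = 0.5 * np.sum(W_c * (f[:, None] - f[None, :]) ** 2)
print(np.isclose(quadratic_form, pairwise_penalty))  # True
```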
Community-based Graph (CG) Model
• Generating $W_c$
  – Cluster the vertices
    • The “distance” is measured by group and network information
  – Record each cluster in a matrix
    • E.g., for a cluster containing vertices 1, 2, and 3:
$$W_c^{i} = \begin{pmatrix} 0 & d_{1,2} & d_{1,3} & 0 \\ d_{2,1} & 0 & d_{2,3} & 0 \\ d_{3,1} & d_{3,2} & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$
  – $W_c = \sum_{i=1}^{n_c} W_c^{i}$, where $n_c$ is the total number of clusters
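The construction of $W_c$ can be sketched as follows. The slides do not specify the clustering method or the exact “distance” $d_{i,j}$, so this sketch takes the clusters as given and uses a hypothetical `jaccard` similarity of users' group and network memberships as $d_{i,j}$; the membership data are invented for illustration. Each cluster contributes one matrix $W_c^i$ with a zero diagonal, and the matrices are summed.

```python
import numpy as np


def jaccard(a, b):
    """Hypothetical d_{i,j}: overlap of two users' group/network memberships."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_matrix(cluster, memberships, n):
    """W_c^i: d_{i,j} for every pair of distinct vertices in the cluster, 0 elsewhere."""
    W = np.zeros((n, n))
    for i in cluster:
        for j in cluster:
            if i != j:
                W[i, j] = jaccard(memberships[i], memberships[j])
    return W


def community_graph(clusters, memberships, n):
    """W_c = W_c^1 + ... + W_c^{n_c}, one term per cluster."""
    W_c = np.zeros((n, n))
    for cluster in clusters:
        W_c += cluster_matrix(cluster, memberships, n)
    return W_c


# Hypothetical memberships (groups "g*", networks "net*") and clusters.
memberships = [{"g1", "net1"}, {"g1", "net1"}, {"g1"}, {"net2"}, {"net2", "g2"}]
clusters = [[0, 1, 2], [3, 4]]  # n_c = 2, 0-based vertex ids
W_c = community_graph(clusters, memberships, len(memberships))
print(W_c)
```

If clusters overlap, a pair of users sharing several clusters accumulates weight from each of them, which is what the sum over clusters implies.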
Algorithms
• Algorithm one
  – Closed-form algorithm
  – Simple and time-saving
[Flowchart: Input ($W_g$, $W_c$, $Y$) → Process (closed-form computation) → Output]
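The slides do not spell out the objective, so this closed-form sketch assumes one natural form: the LGC objective plus a community term, $Q(F) = \operatorname{tr}(F^\top (I - S) F) + \mu \lVert F - Y \rVert_F^2 + \gamma \operatorname{tr}(F^\top L_c F)$ with $S = D_g^{-1/2} W_g D_g^{-1/2}$. Here $\mu$ and $\gamma$ are hypothetical names for the two parameters, and function names such as `cg_closed_form` are illustrative. Setting the gradient to zero gives $\big((1+\mu) I - S + \gamma L_c\big) F = \mu Y$, which the sketch solves as a linear system instead of forming an inverse.

```python
import numpy as np


def normalized_affinity(W_g):
    """S = D_g^{-1/2} W_g D_g^{-1/2}; isolated vertices get zero rows."""
    d = W_g.sum(axis=1)
    inv_sqrt = np.zeros_like(d)
    inv_sqrt[d > 0] = 1.0 / np.sqrt(d[d > 0])
    return inv_sqrt[:, None] * W_g * inv_sqrt[None, :]


def cg_system_matrix(W_g, W_c, mu, gamma):
    """A = (1 + mu) I - S + gamma L_c; positive definite for mu > 0, gamma >= 0."""
    n = W_g.shape[0]
    L_c = np.diag(W_c.sum(axis=1)) - W_c
    return (1.0 + mu) * np.eye(n) - normalized_affinity(W_g) + gamma * L_c


def cg_closed_form(W_g, W_c, Y, mu=1.0, gamma=1.0):
    """Minimizer of the assumed objective: solve A F = mu Y."""
    return np.linalg.solve(cg_system_matrix(W_g, W_c, mu, gamma), mu * Y)


# Toy data: the five-user example with assumed unit weights, users 2 and 4
# in one network, users 1 and 5 labeled with two hypothetical classes.
W_g = np.zeros((5, 5))
for a, b in [(1, 2), (1, 3), (2, 4), (3, 4), (3, 5), (4, 5)]:
    W_g[a - 1, b - 1] = W_g[b - 1, a - 1] = 1.0
W_c = np.zeros((5, 5))
W_c[1, 3] = W_c[3, 1] = 1.0
Y = np.zeros((5, 2))
Y[0, 0] = Y[4, 1] = 1.0

F = cg_closed_form(W_g, W_c, Y)
print(F.argmax(axis=1))  # predicted class per user
```

A dense solve is simple for small graphs, but its cost grows roughly cubically with the number of users.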
Algorithms
• Algorithm two
  – Iterative algorithm
  – Able to handle large-scale data
[Flowchart: Input ($W_g$, $W_c$, $Y$) → initialize $F(0)$ → compute $F(t+1)$ from $F(t)$ → convergence test: False returns to the update step, True produces the Output]
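The iterative flowchart (initialize, update, test whether $F(t+1) \approx F(t)$) can be sketched as a plain gradient step for an assumed objective $Q(F) = \operatorname{tr}(F^\top (I - S) F) + \mu \lVert F - Y \rVert_F^2 + \gamma \operatorname{tr}(F^\top L_c F)$, i.e. $F(t+1) = F(t) - \eta\big(((1+\mu)I - S + \gamma L_c)F(t) - \mu Y\big)$. This update rule, the parameter names $\mu$, $\gamma$, and the function `cg_iterative` are assumptions, not necessarily the authors' iteration. The step size $\eta = 1/(2 + \mu + 2\gamma \max_i D_c^{ii})$ is at most the inverse of the largest eigenvalue of the system matrix, so every step is a contraction and the loop converges for $\mu > 0$, $\gamma \ge 0$.

```python
import numpy as np


def cg_iterative(W_g, W_c, Y, mu=1.0, gamma=1.0, tol=1e-10, max_iter=10_000):
    """Gradient (fixed-point) iteration for the assumed CG objective.

    Returns the final score matrix F and the number of iterations performed.
    """
    d = W_g.sum(axis=1)
    inv_sqrt = np.zeros_like(d)
    inv_sqrt[d > 0] = 1.0 / np.sqrt(d[d > 0])
    S = inv_sqrt[:, None] * W_g * inv_sqrt[None, :]  # D_g^{-1/2} W_g D_g^{-1/2}
    d_c = W_c.sum(axis=1)
    L_c = np.diag(d_c) - W_c
    # 1 / (upper bound on the largest eigenvalue of (1+mu)I - S + gamma L_c).
    eta = 1.0 / (2.0 + mu + 2.0 * gamma * d_c.max())
    F = Y.astype(float)  # F(0) = Y
    for t in range(1, max_iter + 1):
        grad = (1.0 + mu) * F - S @ F + gamma * (L_c @ F) - mu * Y
        F_next = F - eta * grad
        if np.max(np.abs(F_next - F)) < tol:  # F(t+1) close enough to F(t)?
            return F_next, t
        F = F_next
    return F, max_iter


# Toy data: five users with assumed unit friendship weights, users 2 and 4
# in one network, users 1 and 5 labeled with two hypothetical classes.
W_g = np.zeros((5, 5))
for a, b in [(1, 2), (1, 3), (2, 4), (3, 4), (3, 5), (4, 5)]:
    W_g[a - 1, b - 1] = W_g[b - 1, a - 1] = 1.0
W_c = np.zeros((5, 5))
W_c[1, 3] = W_c[3, 1] = 1.0
Y = np.zeros((5, 2))
Y[0, 0] = Y[4, 1] = 1.0

F, n_iter = cg_iterative(W_g, W_c, Y)
print(n_iter, F.argmax(axis=1))
```

The loop needs only matrix products, so a version built on sparse matrices avoids solving a dense linear system; this NumPy sketch stores everything densely for brevity.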
Experiments
• Datasets
  – One synthetic dataset: TwoMoons
  – Two real-world datasets: StudiVZ & Facebook
• Objectives
  – Classification on TwoMoons
  – Predicting university names in StudiVZ & Facebook
• Comparison
  – Supervised learning
  – Basic and LGC graph-based learning
• Evaluation
  – Accuracy and confidence
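Accuracy can be read as the fraction of hidden (unlabeled) users whose predicted university matches the true one. A minimal sketch, assuming predictions are the arg-max over a score matrix `F` and using made-up scores; confidence, the other reported measure, is not sketched because its definition is not given.

```python
import numpy as np


def accuracy_on_unlabeled(F, true_classes, unlabeled):
    """Share of unlabeled users whose arg-max prediction equals the true class."""
    idx = np.asarray(unlabeled)
    predicted = F.argmax(axis=1)
    return float(np.mean(predicted[idx] == np.asarray(true_classes)[idx]))


# Made-up scores for five users and two classes; users 2, 3, 4 were hidden.
F = np.array([[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.3, 0.7], [0.1, 0.9]])
true_classes = [0, 0, 1, 0, 1]
acc = accuracy_on_unlabeled(F, true_classes, unlabeled=[1, 2, 3])
print(acc)  # 2 of the 3 hidden users are predicted correctly
```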
Datasets
• Statistics
• Visualization
  – TwoMoons
Experimental Results
• TwoMoons (200 vertices)
Observations
• Community information does help prediction in terms of accuracy
• CG SSL is consistently better than the other methods
Experimental Results
• StudiVZ (1,423 users)
Observations
• All graph-based SSL methods outperform supervised learning
• CG SSL remains consistently better than the others
Experimental Results
• Facebook (10,410 users)
Observations
• In most cases, CG SSL outperforms the other learning methods
• The CG SSL model shows slight instability
Conclusion
• The community-based graph SSL model describes the real world more accurately
• CG SSL predicts hidden information in online social networks with higher accuracy and confidence
• Users’ information is therefore at a lower security level
THANK YOU
Q & A