Exploit of Online Social Networks with Community-Based Graph Semi-Supervised Learning Mingzhen Mo and Irwin King Department of Computer Science and Engineering The Chinese University of Hong Kong ICONIP 2010, Sydney, Australia

Page 1

Exploit of Online Social Networks with Community-Based Graph Semi-Supervised Learning

Mingzhen Mo and Irwin King

Department of Computer Science and Engineering
The Chinese University of Hong Kong

ICONIP 2010, Sydney, Australia

Page 2

Motivation

• Online social networks are an important way to interact with friends

• They attract a large number of users
  – 500 million active users (Facebook)
  – 700 billion minutes (Facebook)

• The security of users’ information attracts much attention from researchers and developers

Page 3

Problem

[Figure: social network of five users, U1 to U5, with Group and Network labels]

Page 4

Example

• On Facebook

• Given:
  – Users’ profiles, e.g., age, location and phone
  – Friendship relationships
  – Member lists of groups and networks

• Output
  – Predict the university information

Page 5

Objective

• Build a model with a suitable algorithm to predict the hidden information

• Better utilize community information

• Related work
  – Graph Theory [G. Flake et al., SIGKDD 2000]
  – Supervised Learning [E. Zheleva et al., WWW 2009]
  – Semi-Supervised Learning [M. Mo et al., IJCNN 2010]

Page 6

Contributions

• Propose a novel community-based model
  – Predicts hidden information more accurately

• Provide two algorithms
  – Able to deal with different conditions

• Help to understand the security level in social networks

Page 7

Preparation for Modeling

• Definition
  – Online social network: $G(V, E)$
    • Profile $P_i$
    • Friendship $W_{ij}$
  – Two sets
    • Labeled data $V_l$
    • Unlabeled data $V_u$

[Figure: example graph of five vertices with profiles $P_1, \dots, P_5$; edge weights $W_{1,2}$, $W_{1,3}$, $W_{2,4}$, $W_{3,4}$, $W_{3,5}$, $W_{4,5}$; known labels $Y_1$ and $Y_5$; labels to predict $\hat{Y}_2$, $\hat{Y}_3$, $\hat{Y}_4$]

$V_l = \{v_1, v_5\}$, $V_u = \{v_2, v_3, v_4\}$
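The notation on this slide can be made concrete with a small sketch. The edge list and the labeled/unlabeled split follow the example figure; the unit edge weights, the two-class one-hot encoding, the class assignment and the helper names `build_friendship_matrix` and `build_label_matrix` are assumptions made only for illustration.

```python
import numpy as np

# Edges of the five-user example graph (1-based user ids, as in the figure).
EDGES = [(1, 2), (1, 3), (2, 4), (3, 4), (3, 5), (4, 5)]
LABELED = [1, 5]        # V_l = {v1, v5}
UNLABELED = [2, 3, 4]   # V_u = {v2, v3, v4}


def build_friendship_matrix(n_users, edges, weight=1.0):
    """Return the symmetric friendship matrix W (assumed unit weight per edge)."""
    W = np.zeros((n_users, n_users))
    for i, j in edges:
        W[i - 1, j - 1] = weight
        W[j - 1, i - 1] = weight
    return W


def build_label_matrix(n_users, labels, n_classes):
    """One-hot rows for labeled users, all-zero rows for unlabeled users."""
    Y = np.zeros((n_users, n_classes))
    for user, cls in labels.items():
        Y[user - 1, cls] = 1.0
    return Y


W = build_friendship_matrix(5, EDGES)
# Hypothetical class assignment: user 1 -> class 0, user 5 -> class 1.
Y = build_label_matrix(5, {1: 0, 5: 1}, n_classes=2)
```

Rows of `Y` that belong to unlabeled users stay at zero, which is a common way to encode "label unknown" in graph-based semi-supervised learning.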

Page 8

Consistency on Graph

• Model 1: Basic Graph-Based SSL with Harmonic Function
  – Local consistency
• Model 2: Local and Global Consistency (LGC) Graph SSL
  – Local consistency and global consistency
• Model 3: Community-Based Graph (CG) SSL
  – Local consistency, global consistency and community consistency

• Local Consistency: label $Y_1$ should be similar to label $Y_2$
• Global Consistency: the predicted label $\hat{Y}_2$ should be close to the true label $Y_2$
• Community Consistency: the predicted label of user 2 should be close to the true label of user 4 if user 2 and user 4 are in the same network

[Figure: users U1 to U5 with labels $Y_1$ and $Y_2$, the labels of users 2 and 4, and a shared Network]

Page 9

Community-based Graph (CG) Model

• Input: basic graph $W_g$, community graph $W_c$
• Output: predicted labels
• Objective

[Equation: objective function made up of a Local & Global Consistency (LGC) learning term and a community term; its annotations mark the true labels $Y$, Parameter 1 and Parameter 2]

$L_c = D_c - W_c$ is the Laplacian matrix of the community info $W_c$, and

$$D_c(i,i) = \sum_{j=1}^{l+u} W_c(i,j)$$

where the sum runs over all $l+u$ labeled and unlabeled vertices.
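The definition of the community Laplacian translates directly into a few lines of NumPy. This is a minimal sketch: the function name `community_laplacian` and the 4-user community graph are hypothetical, chosen only to show the shapes involved.

```python
import numpy as np


def community_laplacian(W_c):
    """Return (L_c, D_c) with D_c[i, i] = sum_j W_c[i, j] and L_c = D_c - W_c."""
    W_c = np.asarray(W_c, dtype=float)
    D_c = np.diag(W_c.sum(axis=1))
    return D_c - W_c, D_c


# Hypothetical community graph: users 1-3 share a community, user 4 is alone.
W_c = np.array([
    [0.0, 1.0, 1.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
])
L_c, D_c = community_laplacian(W_c)
```

For a symmetric $W_c$, the quadratic form $f^\top L_c f$ equals $\tfrac{1}{2}\sum_{i,j} W_c(i,j)(f_i - f_j)^2$, so a term built on $L_c$ penalizes predictions that differ between users of the same community.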

Page 10

Community-based Graph (CG) Model

• Generating $W_c$
  – Cluster the vertices
    • “Distance” is measured by Group and Network info.
  – Mark down each cluster in a matrix $W_{c_i}$
    • E.g., a cluster contains the vertices 1, 2 and 3

$$W_{c_i} = \begin{pmatrix} 0 & d_{1,2} & d_{1,3} & 0 \\ d_{1,2} & 0 & d_{2,3} & 0 \\ d_{1,3} & d_{2,3} & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$

  – [Equation], where $n_c$ is the total number of clusters
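The construction of the per-cluster matrices can be sketched as follows. Two points are assumptions rather than something the slide states: the per-cluster matrices are combined into $W_c$ by summing them, and the pairwise value $d_{j,k}$ is taken to be the number of groups or networks two users share. All function names and the membership data are hypothetical.

```python
import numpy as np


def cluster_matrix(n_vertices, cluster, d):
    """Matrix W_ci for one cluster: entry (j, k) is d(j, k) for distinct members."""
    W_ci = np.zeros((n_vertices, n_vertices))
    for j in cluster:
        for k in cluster:
            if j != k:
                W_ci[j, k] = d(j, k)
    return W_ci


def community_graph(n_vertices, clusters, d):
    """Assumption: combine the per-cluster matrices by summing them."""
    W_c = np.zeros((n_vertices, n_vertices))
    for cluster in clusters:
        W_c += cluster_matrix(n_vertices, cluster, d)
    return W_c


def shared_membership(memberships):
    """Hypothetical weight: number of groups/networks two users share."""
    def d(j, k):
        return float(len(memberships[j] & memberships[k]))
    return d


# Four users (0-based ids); users 0, 1 and 2 form one cluster, as in the example.
memberships = [{"cuhk", "chess"}, {"cuhk"}, {"cuhk", "chess"}, {"mit"}]
W_c = community_graph(4, [[0, 1, 2]], shared_membership(memberships))
```

The resulting matrix has the same sparsity pattern as the example matrix on the slide: nonzero entries only between the three clustered users and an all-zero row and column for the fourth.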

Page 11

Algorithms

• Algorithm one
  – Closed-form algorithm
  – Simple and time-saving

[Diagram: Input ($W_g$, $W_c$, $Y$), Process (labelled CS), Output]
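A closed-form solver of this kind might look like the sketch below. The transcript does not preserve the objective, so the code assumes one plausible form: minimize $\mathrm{tr}(F^\top L_g F) + \nu\,\mathrm{tr}(F^\top L_c F) + \mu\,\|F - Y\|_F^2$ with unnormalized Laplacians. The parameters `mu` and `nu`, the function names and the toy data are all hypothetical, and the authors' actual formula may differ, for example in its normalization.

```python
import numpy as np


def laplacian(W):
    """Unnormalized graph Laplacian L = D - W."""
    return np.diag(W.sum(axis=1)) - W


def cg_ssl_closed_form(W_g, W_c, Y, mu=1.0, nu=1.0):
    """Solve the assumed objective in closed form.

    Setting the gradient to zero gives (L_g + nu*L_c + mu*I) F = mu*Y.
    """
    n = W_g.shape[0]
    A = laplacian(W_g) + nu * laplacian(W_c) + mu * np.eye(n)
    return np.linalg.solve(A, mu * Y)


# Toy data: a path graph 0-1-2-3, with users 1 and 2 in one community.
W_g = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
W_c = np.zeros((4, 4))
W_c[1, 2] = W_c[2, 1] = 1.0
Y = np.array([[1, 0], [0, 0], [0, 0], [0, 1]], dtype=float)  # users 0 and 3 labeled

F = cg_ssl_closed_form(W_g, W_c, Y)
predicted = F.argmax(axis=1)  # class with the highest score per user
```

Under this assumption the system matrix is symmetric positive definite whenever `mu > 0`, so a single linear solve suffices. That matches the "simple and time-saving" description for small graphs, while the dense solve grows expensive as the number of users increases.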

Page 12

Algorithms

• Algorithm two
  – Iterative algorithm
  – Able to deal with large-scale data

[Diagram: Input ($W_g$, $W_c$, $Y$); Process (labelled CS) starting from $F(0)$ and computing $F(i)$, with a test comparing $F(t+1)$ and $F(t)$ that loops back when False; Output when the test is True]
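The loop in the diagram can be illustrated with a Jacobi-style fixed-point iteration. This is a sketch under the same assumed objective $\mathrm{tr}(F^\top L_g F) + \nu\,\mathrm{tr}(F^\top L_c F) + \mu\,\|F - Y\|_F^2$, not the authors' exact update rule; `mu`, `nu`, `tol`, `max_iter` and the function name are hypothetical.

```python
import numpy as np


def cg_ssl_iterative(W_g, W_c, Y, mu=1.0, nu=1.0, tol=1e-8, max_iter=10_000):
    """Jacobi-style iteration for (L_g + nu*L_c + mu*I) F = mu*Y.

    Assumes W_g and W_c have zero diagonals.
    Returns the final F and the number of iterations performed.
    """
    W = W_g + nu * W_c                      # combined edge weights
    diag = W.sum(axis=1) + mu               # diagonal of the system matrix
    F = Y.astype(float).copy()              # F(0): start from the known labels
    for t in range(1, max_iter + 1):
        F_next = (W @ F + mu * Y) / diag[:, None]
        if np.abs(F_next - F).max() < tol:  # F(t+1) close to F(t): stop
            return F_next, t
        F = F_next
    return F, max_iter


# Toy data: a path graph 0-1-2-3, with users 1 and 2 in one community.
W_g = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
W_c = np.zeros((4, 4))
W_c[1, 2] = W_c[2, 1] = 1.0
Y = np.array([[1, 0], [0, 0], [0, 0], [0, 1]], dtype=float)  # users 0 and 3 labeled

F, n_iter = cg_ssl_iterative(W_g, W_c, Y)
predicted = F.argmax(axis=1)
```

Each step needs only a matrix product with the graph weights, which stays cheap when the matrices are stored in a sparse format; this is the usual reason an iterative scheme scales to larger networks than a direct solve. With `mu > 0` the system matrix is strictly diagonally dominant, so the iteration converges.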

Page 13

Experiments

• Datasets
  – One synthetic dataset: TwoMoons
  – Two real-world datasets: StudiVZ & Facebook

• Objectives
  – Classification in TwoMoons
  – Predict university names in StudiVZ & Facebook

• Comparison
  – Supervised learning
  – Basic and LGC graph learning

• Evaluation
  – Accuracy and confidence

Page 14

Datasets

• Statistics

• Visualization
  – TwoMoons

Page 15

Experimental Results

• TwoMoons (200 vertices)

Observations

• Community information does help prediction in terms of accuracy
• CG SSL is consistently better than the other methods

Page 16

Experimental Results

• StudiVZ (1,423 users)

Observations

• All graph-based SSL methods outperform supervised learning
• CG SSL remains consistently better than the other methods

Page 17

Experimental Results

• Facebook (10,410 users)

Observations

• In most cases, CG SSL outperforms the other learning methods
• There is little instability in the CG SSL model

Page 18

Conclusion

• The Community-based Graph SSL model describes the real world more exactly

• CG SSL predicts the hidden information of online social networks with higher accuracy and confidence

• The security of users’ information is therefore at a lower level

Page 19

Thank You

Q & A