

Learning to Discover Social Circles in Ego Networks

Julian McAuley, Jure Leskovec

Stanford University

Abstract

Our personal social networks are big and cluttered, and currently there is no good way to organize them. Social networking sites allow users to manually categorize their friends into social circles (e.g. ‘circles’ on Google+, and ‘lists’ on Facebook and Twitter); however, these are laborious to construct and must be updated whenever a user’s network grows. We define a novel machine learning task of identifying users’ social circles. We pose the problem as a node clustering problem on a user’s ego-network, a network of connections between her friends.

“Knows your circles better than you do!” – Wired

Properties of circles

Our goal is to automatically detect circles using profile and network information. We develop a model of circles with the following properties:

- Circles form around nodes with common properties.
- Different circles are formed by different properties, e.g. one circle might be formed by family members, and another by students who attended the same university.
- Circles can overlap, and ‘stronger’ circles form within ‘weaker’ ones, e.g. a circle of friends from the same degree program may form within a circle from the same university.
- We leverage both profile information and network structure in order to identify circles.

Model

Our model predicts hard memberships to multiple, overlapping circles, using both profile and network information.

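\[
p\big((x, y) \in E\big) \propto \exp\Bigg\{
\underbrace{\sum_{C_k \supseteq \{x, y\}} \langle \phi(x, y), \theta_k \rangle}_{\text{circles containing both nodes}}
\; - \;
\underbrace{\sum_{C_k \nsupseteq \{x, y\}} \alpha_k \langle \phi(x, y), \theta_k \rangle}_{\text{all other circles}}
\Bigg\}
\]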

Training is done by maximum likelihood, using QPBO and L-BFGS.
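
The edge model above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the toy feature function `phi`, and the choice to normalize the edge probability with a logistic function (p = e^Φ / (1 + e^Φ)) are assumptions made here so that the likelihood is well defined. The QPBO and L-BFGS optimization steps are not shown.

```python
import math
from typing import Callable, FrozenSet, List, Sequence, Set

# Hypothetical type for the pairwise feature function phi(x, y).
FeatureFn = Callable[[int, int], Sequence[float]]


def edge_potential(x: int, y: int,
                   circles: List[Set[int]],
                   thetas: List[Sequence[float]],
                   alphas: List[float],
                   phi: FeatureFn) -> float:
    """Exponent of the edge model: circles containing both nodes add
    <phi(x, y), theta_k>; all other circles subtract alpha_k times it."""
    feats = phi(x, y)
    total = 0.0
    for members, theta, alpha in zip(circles, thetas, alphas):
        score = sum(f * t for f, t in zip(feats, theta))  # <phi(x, y), theta_k>
        if x in members and y in members:
            total += score
        else:
            total -= alpha * score
    return total


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def log_likelihood(nodes: List[int],
                   edges: Set[FrozenSet[int]],
                   circles: List[Set[int]],
                   thetas: List[Sequence[float]],
                   alphas: List[float],
                   phi: FeatureFn) -> float:
    """Log-likelihood of the observed graph over all unordered node pairs.
    ASSUMPTION: p(edge) = exp(pot) / (1 + exp(pot)), i.e. logistic normalization."""
    total = 0.0
    for i, x in enumerate(nodes):
        for y in nodes[i + 1:]:
            pot = edge_potential(x, y, circles, thetas, alphas, phi)
            if frozenset((x, y)) in edges:
                total += pot - softplus(pot)   # log p(edge)
            else:
                total -= softplus(pot)         # log (1 - p(edge))
    return total
```

In a maximum-likelihood setting, a function like `log_likelihood` would be the objective: the parameters `thetas` and `alphas` are continuous, while the circle memberships are discrete, which is why the two are typically optimized by different procedures.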

[Figure: An ego-network]

[Figure: Some detected circles, with one panel for Facebook and one for Google+]

Blue = true positive; gray = true negative; red = false positive; yellow = false negative; green = detected circles for which we have no groundtruth.

Examples of model parameters for four circles

[Figure: four panels, one per circle k = 1, ..., 4. Each plots the weight θ_{k,i} (y-axis) against the feature index i for φ^1 (x-axis). Panel annotations:]

- θ_1: people with PhDs living in S.F. or Stanford
- θ_2: Germans who went to school in 1997
- θ_3: Americans
- θ_4: college educated people working at a particular institute

Data

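We collect data from Facebook, Google+, and Twitter:

|          | ego-networks | circles | nodes   | edges      |
|----------|--------------|---------|---------|------------|
| Facebook | 10           | 193     | 4,039   | 88,234     |
| Google+  | 133          | 479     | 107,614 | 13,673,453 |
| Twitter  | 1,000        | 4,869   | 81,306  | 1,768,149  |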

All data are available on snap.stanford.edu/data/
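
As a rough sketch of how such data might be read, the snippet below parses circle and edge lists from text. The file layout is an assumption, not something stated on this poster: each circle line is taken to be a circle name followed by whitespace-separated member ids, and each edge line a pair of node ids. Check the dataset's own documentation before relying on it. The function names are hypothetical.

```python
from typing import Dict, FrozenSet, Set


def parse_circles(text: str) -> Dict[str, Set[int]]:
    """Parse lines of the ASSUMED form '<circle name> <id> <id> ...'."""
    circles: Dict[str, Set[int]] = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        name, members = fields[0], fields[1:]
        circles[name] = {int(m) for m in members}
    return circles


def parse_edges(text: str) -> Set[FrozenSet[int]]:
    """Parse lines of the ASSUMED form '<id> <id>' into undirected edges."""
    edges: Set[FrozenSet[int]] = set()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        a, b = int(fields[0]), int(fields[1])
        if a != b:
            edges.add(frozenset((a, b)))  # (a, b) and (b, a) collapse to one edge
    return edges


def overlapping_nodes(circles: Dict[str, Set[int]]) -> Set[int]:
    """Nodes that belong to more than one circle."""
    seen: Set[int] = set()
    repeated: Set[int] = set()
    for members in circles.values():
        repeated |= members & seen
        seen |= members
    return repeated
```

Storing each circle as a set makes overlap and nesting checks direct: `a & b` gives shared members, and `a <= b` tests whether one circle sits inside another.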

“Results are decent” – MIT Technology Review

Some results

[Figure: Accuracy on detected communities (1 - Balanced Error Rate, higher is better). Bar chart over Facebook, Google+, and Twitter; y-axis: Accuracy (1 - BER), from 0.5 to 1.0. Value labels shown on the plot: .77, .72, .70 and .84, .72, .70.]

[Figure: Accuracy on detected communities (F1 score, higher is better). Bar chart over Facebook, Google+, and Twitter; y-axis: Accuracy (F1 score), from 0.0 to 1.0. Value labels shown on the plot: .40, .38, .34 and .59, .38, .34.]

Legend (both charts):

- multi-assignment clustering (Streich, Frank, et al.)
- low-rank embedding (Yoshida)
- block-LDA (Balasubramanyan and Cohen)
- our model (friend-to-friend features φ^1, eq. 12)
- our model (friend-to-user features φ^2, eq. 13)
- our model (compressed features ψ^1, eq. 14)
- our model (compressed features ψ^2, eq. 14)
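
The two accuracy measures named in the charts can be computed for a single predicted circle against a single ground-truth circle as sketched below. The poster does not spell out its formulas, so this uses the textbook definitions as an assumption: balanced error rate as the mean of the false negative rate and the false positive rate, and F1 as the harmonic mean of precision and recall. How predicted circles are matched to ground-truth circles is not covered here, and the function names are hypothetical.

```python
from typing import Set


def one_minus_ber(predicted: Set[int], truth: Set[int], universe: Set[int]) -> float:
    """1 - Balanced Error Rate (higher is better).
    ASSUMPTION: BER = 0.5 * (false negative rate + false positive rate)."""
    tp = len(predicted & truth)
    fp = len(predicted - truth)
    fn = len(truth - predicted)
    tn = len(universe - predicted - truth)
    fnr = fn / (tp + fn) if (tp + fn) else 0.0  # misses among true members
    fpr = fp / (fp + tn) if (fp + tn) else 0.0  # false alarms among non-members
    return 1.0 - 0.5 * (fnr + fpr)


def f1_score(predicted: Set[int], truth: Set[int]) -> float:
    """Harmonic mean of precision and recall (higher is better)."""
    tp = len(predicted & truth)
    if tp == 0:
        return 0.0  # also covers empty predicted or empty truth
    precision = tp / len(predicted)
    recall = tp / len(truth)
    return 2 * precision * recall / (precision + recall)
```

The balanced error rate weights members and non-members equally, so it does not reward a method for predicting tiny circles in a large ego-network. F1 ignores true negatives entirely, which is consistent with the F1 values in the charts being lower than the 1 - BER values.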

Bibliography

MCAULEY, J., AND LESKOVEC, J. 2012. Discovering Social Circles in Ego Networks. In arXiv:1210.8182.

STREICH, A., FRANK, M., BASIN, D., AND BUHMANN, J. 2009. Multi-assignment clustering for boolean data. In International Conference on Machine Learning.

YOSHIDA, T. 2010. Toward finding hidden communities based on user profiles. In ICDM Workshops.

BALASUBRAMANYAN, R., AND COHEN, W. 2011. Block-LDA: Jointly modeling entity-annotated text and entity-entity links. In SIAM International Conference on Data Mining.

ROTHER, C., KOLMOGOROV, V., LEMPITSKY, V., AND SZUMMER, M. 2007. Optimizing binary MRFs via extended roof duality. In Computer Vision and Pattern Recognition.
