University of California at Santa Barbara
Christo Wilson, Bryce Boe, Alessandra Sala, Krishna P. N. Puttaswamy, and Ben Zhao
User Interactions in Social Networks and their Implications
Social Networks
4/2/2009
Social Applications
Enables new ways to solve problems for distributed systems
• Social web search
• Social bookmarking
• Social marketplaces
• Collaborative spam filtering (RE: Reliable Email)
How popular are social applications?
• Facebook Platform: 50,000 applications
• Popular ones have >10 million users each
Social Graphs and User Interactions
Social applications rely on
1. Social graph topology
2. User interactions
Currently, social applications are evaluated using just the social graph
• Assume all social links are equally important/interactive
Is this true in reality?
• Milgram’s familiar stranger
• Connections for ‘status’ rather than ‘friendship’
Incorrect assumptions lead to faulty application design and evaluation
Goals
Question: Are social links valid indicators of real user interaction?
First large scale study of Facebook
• 10 million users (15% of total users) / 24 million interactions
Use data to show highly skewed distribution of interactions
• <1% of people on Facebook talk to >50% of their friends
Propose new model for social graphs that includes interaction information
• Interaction Graph
Reevaluate existing social applications using the new model
• In some cases, they break entirely
Outline
• Characterizing Facebook
• Analyzing User Interactions
• Interaction Graphs
• Effects on Social Applications
Crawling Facebook for Data
Facebook is the most popular social network
Crawling social networks is difficult
• Too large to crawl completely, must be sampled
• Privacy settings may prevent crawling
Thankfully, Facebook is divided into ‘networks’
• Represent geographic regions, schools, companies
• Regional networks are not authenticated
Crawling for Data, cont.
Crawled Facebook regional networks
• 22 largest networks: London, Australia, New York, etc.
• Timeframe: March – May 2008
• Start with 50 random ‘seed’ users, perform a breadth-first search (BFS)
Data recorded for each user:
• Friends list
• History of wall posts and photo comments
  • Collectively referred to as interactions
  • Most popular publicly accessible Facebook applications
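The crawl strategy on this slide (random seed users, then breadth-first search over friend lists) can be sketched roughly as below. This is only an illustration: `get_friends` is a hypothetical callable standing in for whatever page fetching the real crawler did, and the toy dictionary is made-up data, not the Facebook data set.

```python
from collections import deque

def bfs_crawl(seeds, get_friends, max_users=None):
    """Breadth-first crawl starting from a list of seed users.

    get_friends is a hypothetical callable that returns the visible
    friend list of a user (an assumption made for this sketch).
    """
    visited = set(seeds)
    queue = deque(seeds)
    edges = set()
    while queue:
        user = queue.popleft()
        for friend in get_friends(user):
            edges.add(frozenset((user, friend)))  # undirected social link
            within_budget = max_users is None or len(visited) < max_users
            if friend not in visited and within_budget:
                visited.add(friend)
                queue.append(friend)
    return visited, edges

# Toy data standing in for one regional network; "e" is unreachable from "a".
toy = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"], "d": ["b"], "e": []}
users, links = bfs_crawl(["a"], lambda u: toy.get(u, []))
```

Users who cannot be reached from any seed (like "e" here) are never recorded, which is one reason a crawl of this kind yields a sample rather than the full network.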
High Level Graph Statistics
Metric                            Facebook         Orkut [1]
Number of Users Crawled           10,697,000       1,846,000
Percentage of Total Users         15%              26.9%
Number of Social Links Crawled    408,265,000      22,613,000
Radius                            9.8              6
Diameter                          13.4             9
Average Path Length               4.8              4.25
Clustering Coefficient            0.164            0.171
Power-law Coefficient             α=1.5, D=0.55    α=1.5, D=0.6
1. A. Mislove, M. Marcon, K. P. Gummadi, P. Druschel, and B. Bhattacharjee. Measurement and analysis of online social networks. In Proc. of IMC, October 2007.
• Based on Facebook’s total size of 66 million users in early 2008
• Represents ~50% of all users in the crawled regions
• ~49% of links were crawlable
• This provides a lower bound on the average number of in-network friends
• Avg. social degree = ~77
• Low average path length and high clustering coefficient indicate Facebook is small-world
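As a rough sanity check on the average social degree, the figure can be recomputed from the two table entries. The sketch assumes each crawled link is undirected and counted once, so every link contributes to the degree of two users; that counting convention is an assumption, not something the slide states.

```python
users_crawled = 10_697_000   # Number of Users Crawled (Facebook)
links_crawled = 408_265_000  # Number of Social Links Crawled (Facebook)

# Each undirected link adds one to the degree of both endpoints.
avg_degree = 2 * links_crawled / users_crawled
print(round(avg_degree, 1))
```

The result lands a little above 76, in line with the ~77 quoted on the slide given that the table values are rounded.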
Analyzing User Interactions
Having established that Facebook has the expected social graph properties…
Question: Are social links valid indicators of real user interaction?
Examine distribution of interactions among friends
Distribution Among Friends
[Figure: CDF of users (y-axis: % of Users) against % of Friends Involved (x-axis, 0 to 50). Curves: 70% Interaction Cumulative Fraction; 90% Interaction Cumulative Fraction.]
For 50% of users, 70% of interaction comes from 7% of friends.
Almost nobody interacts with more than 50% of their friends!
For 50% of users, 100% of interaction comes from 20% of friends.
• Social degree does not accurately predict human behavior
• Initial Question: Are social links valid indicators of real user interaction?
Answer: NO
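The per-user quantity behind statements like "70% of interaction comes from 7% of friends" can be sketched as follows. The function name and the input layout (one interaction count per friend, zeros included) are assumptions for illustration, and the sample counts are invented.

```python
def friends_fraction_for_share(interactions_per_friend, share):
    """Smallest fraction of a user's friends that accounts for `share`
    (e.g. 0.7) of that user's interactions.

    interactions_per_friend holds one count per friend, with 0 for
    friends the user never interacted with.
    """
    total = sum(interactions_per_friend)
    if total == 0:
        return 0.0
    covered = 0
    ranked = sorted(interactions_per_friend, reverse=True)
    for k, count in enumerate(ranked, start=1):
        covered += count
        if covered >= share * total:
            return k / len(ranked)
    return 1.0

# Invented user: 20 friends, but only 4 of them ever interact.
counts = [15, 3, 1, 1] + [0] * 16
frac_70 = friends_fraction_for_share(counts, 0.7)
frac_100 = friends_fraction_for_share(counts, 1.0)
```

Computing this fraction for every user and plotting its CDF gives a curve of the kind shown on the slide.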
A Better Model of Social Graphs
Answer to our initial question: Not all social links are created equal
• Implication: the social graph alone cannot be used to evaluate social applications
What is the right way to model social networks?
• More accurately approximate reality by taking user interactivity into account
• Interaction Graphs
  • Chun et al., IMC 2008
Interaction Graphs
Definition: a social graph parameterized by…
• n: minimum number of interactions per edge
• t: some window of time for interactions
n = 1 and t = {2004 to the present}
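The definition above could be turned into code along these lines. This is a minimal sketch: the `(user_a, user_b, day)` tuple layout, the undirected treatment of edges, and the default window are assumptions made for illustration, not the format of the original data.

```python
from collections import Counter
from datetime import date

def interaction_graph(social_links, interactions, n=1,
                      start=date(2004, 1, 1), end=date.max):
    """Keep a social link only if its endpoints interacted at least n
    times within the window [start, end].

    social_links: set of frozenset({a, b}) edges.
    interactions: iterable of (user_a, user_b, day) tuples
                  (hypothetical layout chosen for this sketch).
    """
    counts = Counter()
    for a, b, day in interactions:
        if start <= day <= end:
            counts[frozenset((a, b))] += 1  # direction is ignored here
    return {edge for edge in social_links if counts[edge] >= n}

links = {frozenset(("a", "b")), frozenset(("a", "c")), frozenset(("b", "c"))}
events = [
    ("a", "b", date(2008, 3, 1)),
    ("b", "a", date(2008, 4, 1)),
    ("a", "c", date(2007, 1, 1)),
]
full = interaction_graph(links, events, n=1)
recent = interaction_graph(links, events, n=1, start=date(2008, 1, 1))
```

Raising n or narrowing the window can only remove edges, so every interaction graph built this way is a subgraph of the social graph.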
Social vs. Interaction Degree
[Figure: Interaction Degree (y-axis, 0 to 500) against Social Degree (x-axis, 0 to 1400). Annotations: 1:1 Degree Ratio; Dunbar’s Number (150); 99% of Facebook Users.]
• Interaction graph prunes useless edges
• Results agree with theoretical limits on human social cognition
Interaction Graph Analysis
Do Interaction Graphs maintain expected social network graph properties?
Metric                    Social Graph     Interaction Graph
Number of Vertices        10,697,000       8,403,000
Number of Edges           408,265,000      94,665,000
Radius                    9.8              12.4
Diameter                  13.4             19.8
Average Path Length       4.8              7.3
Clustering Coefficient    0.164            0.078
Power-law Coefficient     α=1.5, D=0.55    α=1.5, D=0.24
• Interaction Graphs still have
  • Power-law scaling
  • Scale-free behavior
  • Small-world clustering
• … but they exhibit these characteristics less strongly than the full social network
Social Applications, Revisited
Recap:
• Need a better model to evaluate social applications
• Interaction Graphs augment social graphs with interaction information
How do these changes affect social applications?
• Sybilguard
• Analysis of Reliable Email in the paper
Sybilguard
Sybilguard is a system for detecting Sybil nodes in social graphs
Why do we care about detecting Sybils?
• Social network based games:
• Social marketplaces:
How Sybilguard works
• Key insight: few edges between Sybils and legitimate users (attack edges)
• Use persistent routing tables and random walks to detect attack edges
Sybilguard Algorithm
Step 1:
• Bootstrap the network.
• All users exchange signed keys.
• Key exchange implies that both parties are human and trustworthy.
Step 2:
• Choose a verifier (A) and a suspect (B).
• A and B send out random walks of a certain length (2).
• Look for intersections.
• A knows B is not a Sybil because multiple paths intersect and they do so at different nodes.
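Step 2 can be illustrated with a heavily simplified sketch. It assumes each node keeps a persistent routing table that is a random one-to-one map from incoming neighbor to outgoing neighbor, and that a walk is the route obtained by following those tables. Signed keys, the acceptance threshold, and all networking are left out; the function names and the 4-node graph are hypothetical.

```python
import random

def build_routing_tables(graph, seed=0):
    """For every node, fix a random one-to-one map from the neighbor a
    walk arrives from to the neighbor it leaves through."""
    rng = random.Random(seed)
    tables = {}
    for node, neighbors in graph.items():
        shuffled = list(neighbors)
        rng.shuffle(shuffled)
        tables[node] = dict(zip(neighbors, shuffled))
    return tables

def route(tables, start, first_hop, length):
    """Follow the persistent tables for `length` hops, leaving `start`
    through `first_hop`. Returns the list of visited nodes."""
    path = [start, first_hop]
    prev, cur = start, first_hop
    for _ in range(length - 1):
        nxt = tables[cur][prev]
        path.append(nxt)
        prev, cur = cur, nxt
    return path

def intersection_nodes(graph, tables, verifier, suspect, length):
    """Nodes that lie on some route of the verifier and some route of
    the suspect, with one route started along every edge of each."""
    def reach(node):
        return {n for hop in graph[node]
                for n in route(tables, node, hop, length)}
    return reach(verifier) & reach(suspect)

# Hypothetical 4-node graph in which everyone is friends with everyone.
k4 = {n: [m for m in "abcd" if m != n] for n in "abcd"}
tables = build_routing_tables(k4, seed=1)
shared = intersection_nodes(k4, tables, "a", "b", length=2)
```

With few edges between a Sybil region and the legitimate region, routes started inside one region rarely cross into the other, so a suspect whose routes meet the verifier's routes at several distinct nodes is more plausibly legitimate.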
Sybilguard Caveats
Bootstrapping requires human interaction
• Evaluating Sybilguard on the social graph is overly optimistic because most friends never interact!
• Better to evaluate using Interaction Graphs
Expected Impact
Fewer edges and lower clustering lead to reduced performance
• Why? Self-loops
Sybilguard on Interaction Graphs
[Figure: CDF of intersections (y-axis: % of Intersections) against Random Walk Path Length (x-axis, 0 to 1400). Curves: Social Graph; Full Interaction Graph; Interaction Graph (1 Year); Interaction Graph (6 Months).]
• When evaluated under real world conditions, performance of social applications changes dramatically
Conclusion
• First large scale analysis of Facebook
• Answer the question: Are social links valid indicators of real user interaction?
• Formulate new model of social networks: Interaction Graphs
• Demonstrate the effect of Interaction Graphs on social applications
• Final takeaway: when building social applications, use interaction graphs!
Anonymized Facebook data (social graphs and interaction graphs) will be available for download soon at the Current Lab website!
http://current.cs.ucsb.edu/facebook
Questions?
Social Networks
Social Networks are popular platforms for interaction, communication and collaboration
• > 110 million users; 9th most trafficked site on the Internet
• > 170 million users; #1 photo sharing site; 4th most trafficked site on the Internet; 114% user growth in 2008
• > 800 thousand users; 1,689% user growth in 2008
High Level Graph Statistics
• Clustering Coefficient measures strength of local cliques
• Measured between zero (random graphs) and one (complete connectivity)
• Social networks display power law degree distribution
• α (alpha) is the exponent of the power law
• D is the fitting error
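One common way to compute a clustering coefficient is the mean of the local coefficients, sketched below. Whether the slides use exactly this variant is not stated, so treat the choice (and the convention that nodes with fewer than two neighbors count as zero) as an assumption. The example graph is invented.

```python
from itertools import combinations

def average_clustering(graph):
    """Mean local clustering coefficient of an undirected graph.

    graph maps each node to the set of its neighbors (symmetric).
    Nodes with fewer than two neighbors contribute 0.
    """
    if not graph:
        return 0.0
    total = 0.0
    for node, neighbors in graph.items():
        k = len(neighbors)
        if k < 2:
            continue
        # Count pairs of this node's neighbors that are themselves linked.
        linked = sum(1 for u, v in combinations(neighbors, 2) if v in graph[u])
        total += linked / (k * (k - 1) / 2)
    return total / len(graph)

# A triangle (a, b, c) with one extra user d attached to a.
g = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
cc = average_clustering(g)
```

A fully connected group scores 1, while a star, where no two friends of the center know each other, scores 0.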
Social Degree CDF
[Figure: CDF of users (y-axis: % of Users) against Social Degree (x-axis, log scale from 1 to 1000). Legend: YouTube; LiveJournal; Orkut.]
Nodes vs. Total Interactions
[Figure: CDF of total interactions (y-axis: % of Total Interaction) against % of Nodes (x-axis, 0 to 60). Curves: Sorted by Degree; Sorted by Total Interactions.]
Top 10% of most well connected users are responsible for 60% of total interactions
Top 10% of most interactive users are responsible for 85% of total interactions
• Social degree does not accurately predict human behavior
• Interactions are highly skewed towards a small percent of the Facebook population
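The "top 10% of most interactive users" statistic can be computed with a few lines like the following. The function name and the per-user counts are hypothetical; the numbers are invented so that the arithmetic is easy to follow and are not taken from the data set.

```python
def top_share(interactions_per_user, top_percent=10):
    """Share of all interactions produced by the most interactive
    top_percent of users (at least one user is always included)."""
    counts = sorted(interactions_per_user, reverse=True)
    total = sum(counts)
    if total == 0:
        return 0.0
    k = max(1, len(counts) * top_percent // 100)
    return sum(counts[:k]) / total

# Invented population of 10 users with 100 interactions in total.
per_user = [85, 3, 3, 2, 2, 1, 1, 1, 1, 1]
share = top_share(per_user)
```

The "sorted by degree" curve on the slide would use the same sum but rank users by social degree instead of by interaction count.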