Estimating influence of online activity feeds on people's actions
TRANSCRIPT
Distinguishing between Personal Preferences and Social Influence in Online Activity Feeds
Amit Sharma* (Microsoft Research), Dan Cosley (Cornell University)
ACM CSCW 2016
The power of social influence
Kandel (1978), Fisher and Bauman (1988), Cialdini (2001)
Adopting new behavior
Changing people’s behavior
Goel et al. (2012), Iyengar et al. (2011)
Spreading ideas, products
Socio-technical systems: systems as social agents
Specific problem: How much do people copy their friends’ actions from the feed?
Copy-influence: Copying a friend’s action after being exposed to their activity on a social networking website
Tricky to estimate influence
• Did A’s friend like Item1 because of influence from A?
• Impossible to disentangle influence from homophily without making further assumptions. (Shalizi and Thomas 2012)
[Figure: timelines (t) of likes. Person A: Item1, Item2, Item3. Person A’s friend: Item1.]
Key idea: Imagine a counterfactual world
What would have happened if a Last.fm user was not exposed to the activity feed of her friends?
A general framework: estimating the counterfactual
Problem: Estimate influence from feed
Data: Observed common actions between friends (X)
Estimate counterfactual: Common actions without exposure (Xc?)
Copy-Influence = X – Xc
Assumptions?
Matching: A typical way to estimate counterfactuals
Matching Assumption: People similar on observable characteristics are expected to have the same behavior.
We can match people by their attributes, such as gender, race, and age. [Aral et al. 2009]
Person A liked Coldplay after her friend did so.
A similar person A’ liked Coldplay without her friend doing so.
Person A would have liked Coldplay without her friend doing so.
Can we do better for estimating the effect of activity feeds?
I. Preference Similarity assumption: Past actions of a person are a better proxy for modeling behavior.
Use similarity metrics based on past activity.
II. Feed Exposure assumption: People see friends’ updates in an unfiltered reverse-chronological feed.
Each person is exposed to the last M actions of their friends.
I. Matching non-friends using preference similarity
Use Jaccard similarity in observed preferences to create a proxy for homophily.
[Figure: two ego networks for user u, one with friends f1 to f5 and one with non-friends n1 to n5; each edge is labeled with a preference similarity between 0.3 and 0.7.]
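As a minimal sketch of the similarity measure named on this slide, the snippet below computes Jaccard similarity between two users' sets of past items. The function name `jaccard`, the handling of two empty histories, and the example histories are assumptions made for illustration; they are not taken from the talk or its dataset.

```python
def jaccard(items_a, items_b):
    """Jaccard similarity between two collections of item ids."""
    a, b = set(items_a), set(items_b)
    if not a and not b:
        # Assumption: two empty histories are treated as having no similarity.
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical histories (illustrative only).
u = {"Beatles", "Coldplay", "Adele", "Eminem"}
f1 = {"Beatles", "Coldplay", "Radiohead"}

# 2 shared artists out of 5 distinct artists.
print(jaccard(u, f1))
```

Because the measure needs only logged actions, it can be computed for friends and non-friends alike.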
II. Comparing to a counterfactual feed for non-friends
Construct feed using last M actions of non-friends.
Friends’ feed
f1 Likes Beatles.
f2 Likes Coldplay.
f3 Likes Adele.
Matched Non-friends’ feed
n1 Likes Eminem.
n2 Likes Beatles.
n3 Likes Lily Allen.
The full procedure: Estimating the copy-influence from a feed
For each action by a user u, construct feeds from friends and non-friends containing their last M actions respectively.
FriendsOverlap = Fraction of actions done by u that are also in the friends’ feed (naïve measure of influence).
NonFriendsOverlap = Fraction of actions done by u that are also in the non-friends’ feed.
Copy-Influence_u = FriendsOverlap – NonFriendsOverlap
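The procedure on this slide can be sketched roughly as follows. This is an illustrative reading, not the authors' code: it assumes actions are `(timestamp, item)` pairs with integer timestamps, that a feed contains the last M peer actions strictly before the user's action, and that all friends (or all matched non-friends) share one merged feed. The function names and the toy data are hypothetical.

```python
from bisect import bisect_left


def feed_before(peer_actions, t, m):
    """Items in the last m peer actions strictly before time t.

    peer_actions must be a list of (timestamp, item) sorted by timestamp.
    """
    idx = bisect_left(peer_actions, (t,))  # first action with timestamp >= t
    return {item for _, item in peer_actions[max(0, idx - m):idx]}


def overlap(user_actions, peer_actions, m):
    """Fraction of the user's actions whose item was in the peers' feed."""
    if not user_actions:
        return 0.0
    peers = sorted(peer_actions)
    hits = sum(item in feed_before(peers, t, m) for t, item in user_actions)
    return hits / len(user_actions)


def copy_influence(user_actions, friend_actions, nonfriend_actions, m=10):
    """FriendsOverlap minus NonFriendsOverlap for a single user."""
    return (overlap(user_actions, friend_actions, m)
            - overlap(user_actions, nonfriend_actions, m))


# Toy data loosely based on the feeds shown on the previous slide.
friends = [(1, "Beatles"), (2, "Coldplay"), (3, "Adele")]
nonfriends = [(1, "Eminem"), (2, "Beatles"), (3, "Lily Allen")]
user = [(4, "Coldplay"), (5, "Beatles")]

# FriendsOverlap = 2/2, NonFriendsOverlap = 1/2.
print(copy_influence(user, friends, nonfriends, m=10))
```

In this toy case the naive FriendsOverlap would attribute both actions to friends, while subtracting the non-friends' overlap leaves only half of that.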
Preference-based Matched Estimation (PME)
MATCHING STEP (before time T)
For each user: Construct a set of non-friends that are as similar to the user as her friends.
ESTIMATION STEP (after time T)
For each user: Influence_u = FriendsOverlap – NonFriendsOverlap
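One way to picture the matching step is a greedy pairing: each friend is paired with the unused non-friend whose similarity to the user is closest to that friend's similarity. The slide does not specify the matching rule, so the greedy strategy, the function name `match_nonfriends`, and the similarity values below are assumptions for illustration only.

```python
def match_nonfriends(friend_sims, candidate_sims):
    """Pair each friend with the closest-similarity unused non-friend.

    friend_sims:    dict mapping friend id -> similarity to the user
    candidate_sims: dict mapping non-friend id -> similarity to the user
    Returns a dict mapping friend id -> matched non-friend id.
    """
    available = dict(candidate_sims)
    matches = {}
    # Assumption: handle the most similar friends first.
    for friend, sim in sorted(friend_sims.items(), key=lambda kv: -kv[1]):
        if not available:
            break  # fewer candidates than friends
        best = min(available, key=lambda n: abs(available[n] - sim))
        matches[friend] = best
        del available[best]
    return matches


# Hypothetical similarity values (e.g. Jaccard computed before time T).
friend_sims = {"f1": 0.7, "f2": 0.4, "f3": 0.5}
candidate_sims = {"n1": 0.1, "n2": 0.45, "n3": 0.68, "n4": 0.52, "n5": 0.9}

matches = match_nonfriends(friend_sims, candidate_sims)
print(matches)
```

The matched non-friends then supply the counterfactual feed used in the estimation step.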
The Last.fm dataset
                   LISTEN SONG   LOVE SONG
# Ego Networks     96K           141K
# Total Users      312K          437K
# Total Songs      23M           13M
# Total Actions    656M          140M
Size of feed (M) = 10
Time T is chosen such that 90% of actions are before T.
Data collection: random seeds, weighted breadth-first crawl for 3 months
*Dataset available at: http://www.amitsharma.in/#resources
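The choice of T described on this slide amounts to taking a percentile of the action timestamps. A minimal sketch, assuming timestamps are comparable numbers and that ties at the cut point are acceptable (the function name `split_time` and the `percent` parameter are hypothetical):

```python
def split_time(timestamps, percent=90):
    """Return a time T such that about `percent`% of actions fall before T."""
    ordered = sorted(timestamps)
    if not ordered:
        raise ValueError("no actions to split")
    # Integer arithmetic avoids floating-point rounding at the cut index.
    cut = min(len(ordered) * percent // 100, len(ordered) - 1)
    return ordered[cut]


# Illustrative timestamps 0..99: exactly 90 of them fall before T.
T = split_time(range(100))
print(T)
```

Actions before T would feed the matching step and actions after T the estimation step.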
Validation using semi-synthetic Loves data
Personal preference: Choose a song randomly from the last M loves by the k-most similar users (k=10).
Influence process: Choose a song randomly from the last M loves by her friends.
Process                    FriendsOverlap   Influence   Std. Error
Personal Preference (PP)   0.042            0.001       0.0001
Influence (I)              1.00             0.99        0.0004
I-PP (10%-90%)             0.15             0.102       0.0001
Generate synthetic loves on songs after time T from any of the processes, keeping the timestamps and the social network same as before.
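A rough sketch of how such semi-synthetic loves might be drawn: with probability `p_influence` the song comes from the friends' feed (influence process), otherwise from the feed of the k most similar users (personal preference process). The function name, the feed contents, and the use of a seeded `random.Random` are assumptions for illustration; the talk does not show its generator.

```python
import random


def synthetic_love(friend_feed, similar_feed, p_influence, rng):
    """Draw one synthetic love and report which process produced it.

    friend_feed:  list of songs from the friends' last M loves
    similar_feed: list of songs from the k most similar users' last M loves
    """
    if rng.random() < p_influence:
        return rng.choice(friend_feed), "influence"
    return rng.choice(similar_feed), "preference"


# Hypothetical feeds; the 10%-90% mix corresponds to the I-PP row.
rng = random.Random(0)
friend_feed = ["Beatles", "Coldplay", "Adele"]
similar_feed = ["Eminem", "Beatles", "Lily Allen"]

draws = [synthetic_love(friend_feed, similar_feed, 0.1, rng)
         for _ in range(10000)]
share = sum(process == "influence" for _, process in draws) / len(draws)
print(share)  # should be close to 0.1
```

Since the true share of influenced actions is known by construction, an estimator can be checked against it.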
FriendsOverlap overestimates influence by at least 300% across listen and love actions.
Is this specific to Last.fm?
Assumptions of influence estimation:
• Ordinal time, reverse-chronological feed
• Preferences as a proxy for homophily
Can be applied to any sharing platform that shows friends’ activities in a (loosely) reverse chronological order.
                   RATE BOOKS           RATE MOVIES              FAVORITE PHOTOS
Source             [Huang et al. ’12]   [Jamali and Ester ’10]   [Cha et al. ’09]
# Ego Networks     252K                 49K                      175K
# Total Users      252K                 50K                      183K
# Total Items      1.3M                 48K                      11M
# Total Actions    28M                  7.9M                     33M
FriendsOverlap overestimates influence in all three domains
It overestimates influence by 14% on Flickr and by more than 500% on Flixster.
Influence is overrated(?)
Not more than 1% of user actions on online sharing networks can be attributed to influence.
Final takeaways
• Focusing on a specific mechanism helps make progress on a tricky estimation problem.
• PME: A broadly applicable method for estimating influence that requires only logged activity data.
• Going forward, modeling counterfactuals can be a viable way to understand activity on socio-technical systems.
Thank you!
@amt_shrma