estimating influence of online activity feeds on people's actions

Distinguishing between Personal Preferences and Social Influence in Online Activity Feeds

Amit Sharma*, Dan Cosley

Microsoft Research, Cornell University

ACM CSCW 2016

The power of social influence

Kandel (1978), Fisher and Bauman (1988), Cialdini (2001)

Adopting new behavior


Kandel (1978), Fisher and Bauman (1988), Cialdini (2001)

Changing people’s behavior


Goel et al. (2012), Iyengar et al. (2011)

Spreading ideas, products

Socio-technical systems: systems as social agents

Specific problem: How much do people copy their friends’ actions from the feed?

Copy-influence: Copying a friend’s action after being exposed to their activity on a social networking website

Tricky to estimate influence

• Did A’s friend like Item1 because of influence from A?

• Impossible to disentangle homophily without making further assumptions. (Shalizi and Thomas 2012)

[Figure: timelines (t) of actions. Person A: Item1, Item2, Item3. Person A’s friend: Item1.]

Key idea: Imagine a counterfactual world

What would have happened if a Last.fm user was not exposed to the activity feed of her friends?

A general framework: estimating the counterfactual

Problem: Estimate influence from feed

Data: Observed common actions between friends (X)

Estimate counterfactual: Common actions without exposure (Xc?)

Copy-Influence = X – Xc

Assumptions?

Matching: A typical way to estimate counterfactuals

Matching Assumption: People similar on observable characteristics are expected to have the same behavior.

We can match people by their attributes such as gender, race, age. [Aral et al. 2009]

Person A liked Coldplay after her friend did so.

A similar person A’ liked Coldplay without her friend doing so.

Person A would have liked Coldplay without her friend doing so.

Can we do better for estimating the effect of activity feeds?

I. Preference Similarity assumption: Past actions of a person are a better proxy for modeling behavior than attributes such as gender, race, or age.

Use similarity metrics based on past activity.

II. Feed Exposure assumption: People see friends’ updates in an unfiltered reverse-chronological feed.

Each person is exposed to the last M actions of their friends.

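I. Matching non-friends using preference similarity

Use Jaccard similarity in observed preferences to create a proxy for homophily.

[Figure: two ego networks for user u. Friends f1 to f5 and matched non-friends n1 to n5, with edges labeled by preference similarity; both networks carry the same set of similarity values (0.3, 0.4, 0.5, 0.6, 0.7).]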
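A minimal sketch of the Jaccard similarity used for matching, assuming each person's observed preferences are available as a set of item names. The function name `jaccard`, the example sets, and the choice to return 0.0 for two empty histories are illustrative assumptions, not taken from the talk.

```python
def jaccard(items_a, items_b):
    """Jaccard similarity: |A intersect B| / |A union B| over two item collections."""
    a, b = set(items_a), set(items_b)
    union = a | b
    if not union:
        # Assumption: two empty histories are treated as having no similarity.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical preference sets for a user u and a non-friend n.
u_likes = {"Beatles", "Coldplay", "Adele"}
n_likes = {"Beatles", "Adele", "Eminem", "Lily Allen"}

# 2 shared items out of 5 distinct items overall.
similarity = jaccard(u_likes, n_likes)
```

Here `similarity` is 2/5: the two people share two artists out of five distinct artists between them.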

Amit Sharma

Without knowing time T that doesn't make sense. You can probably explain creating this parallel network without reference to T yet. Further, you kind of need to explain this for the locality bit anyways, so here you'll be able to say instead of matching with the most similar friends in the network (which you did in ICWSM for recommendation), you're matching with _comparably similar_ friends here to do the best you can to control for preference similarity.

II. Comparing to a counterfactual feed for non-friends

Construct feed using last M actions of non-friends.

Friends’ feed
• f1 Likes Beatles.
• f2 Likes Coldplay.
• f3 Likes Adele.

Matched Non-friends’ feed
• n1 Likes Eminem.
• n2 Likes Beatles.
• n3 Likes Lily Allen.


The full procedure: Estimating the copy-influence from a feed

For each action by a user u, construct two feeds: one containing the last M actions of her friends, and one containing the last M actions of her matched non-friends.

FriendsOverlap = Fraction of actions done by u that are also in the friends’ feed (naïve measure of influence).

NonFriendsOverlap = Fraction of actions done by u that are also in the non-friends’ feed.

Copy-Influence_u = FriendsOverlap – NonFriendsOverlap
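The procedure above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: actions are `(timestamp, item)` pairs, a feed is the set of items in the group's last M actions strictly before the user's action, and the names `feed_overlap` and `copy_influence` are hypothetical.

```python
from bisect import bisect_left


def feed_overlap(user_actions, group_actions, m=10):
    """Fraction of the user's actions whose item is in the feed made of
    the group's last m actions before that action."""
    group_actions = sorted(group_actions)      # order by timestamp
    times = [t for t, _ in group_actions]
    hits = 0
    for t, item in user_actions:
        end = bisect_left(times, t)            # group actions strictly before t
        feed = {it for _, it in group_actions[max(0, end - m):end]}
        hits += item in feed                   # True counts as 1
    return hits / len(user_actions) if user_actions else 0.0


def copy_influence(user_actions, friend_actions, nonfriend_actions, m=10):
    """FriendsOverlap minus NonFriendsOverlap for one user."""
    return (feed_overlap(user_actions, friend_actions, m)
            - feed_overlap(user_actions, nonfriend_actions, m))


# Hypothetical toy data: (timestamp, item).
u = [(5, "Coldplay"), (6, "Eminem"), (7, "Muse"), (8, "Adele")]
friends = [(1, "Beatles"), (2, "Coldplay"), (3, "Adele")]
non_friends = [(1, "Eminem"), (2, "Beatles"), (3, "Lily Allen")]

friends_overlap = feed_overlap(u, friends, m=3)          # 2 of 4 actions
non_friends_overlap = feed_overlap(u, non_friends, m=3)  # 1 of 4 actions
influence_u = copy_influence(u, friends, non_friends, m=3)
```

In this toy example the naïve FriendsOverlap is 0.5, but half of that is also matched by the non-friends' feed, leaving a copy-influence estimate of 0.25.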


Amit Sharma

Probably need to justify that estimation of influence a bit with the story about external influence and underlying preference similarity being captured already, and maybe talk about this being an upper bound.

Preference-based Matched Estimation (PME)

MATCHING STEP (before time T)

For each user: Construct a set of non-friends that are as similar to the user as her friends.

ESTIMATION STEP (after time T)

For each user: Influence_u = FriendsOverlap – NonFriendsOverlap
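One possible way to realize the matching step is sketched below. The slides do not specify the matching algorithm, so the greedy nearest-similarity rule, the function name `match_non_friends`, and the example values are all assumptions made for illustration.

```python
def match_non_friends(friend_sims, candidate_sims):
    """Pair each friend with an unused non-friend of comparable similarity.

    friend_sims:    dict friend -> similarity to the user (before time T)
    candidate_sims: dict non-friend -> similarity to the user (before time T)
    Returns a dict friend -> matched non-friend.
    """
    available = dict(candidate_sims)
    matches = {}
    # Assumption: handle the most similar friends first (greedy order).
    for friend, sim in sorted(friend_sims.items(), key=lambda kv: -kv[1]):
        if not available:
            break
        # Pick the non-friend whose similarity is closest to this friend's.
        best = min(available, key=lambda n: abs(available[n] - sim))
        matches[friend] = best
        del available[best]                    # match without replacement
    return matches


# Hypothetical similarities for one user.
friend_sims = {"f1": 0.7, "f2": 0.3, "f3": 0.5}
candidate_sims = {"n1": 0.72, "n2": 0.1, "n3": 0.31, "n4": 0.5, "n5": 0.9}
matches = match_non_friends(friend_sims, candidate_sims)
```

The point of matching on comparable (rather than maximal) similarity is that the non-friend set then mirrors the friends' preference similarity to the user, so any remaining difference in overlap is not explained by similarity alone.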


The Last.fm dataset

                  LISTEN SONG   LOVE SONG
# Ego Networks    96K           141K
# Total Users     312K          437K
# Total Songs     23M           13M
# Total Actions   656M          140M

Size of Feed (M) = 10

Time T is chosen such that 90% of actions are before T.
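A small sketch of how such a split time T could be chosen from logged timestamps, assuming timestamps are comparable numbers. The function name `split_time` is hypothetical, and with many tied timestamps the fraction before T is only approximate.

```python
def split_time(timestamps, fraction=0.9):
    """Return a time T such that about `fraction` of actions occur before T."""
    ts = sorted(timestamps)
    if not ts:
        raise ValueError("no timestamps given")
    # Index of the first action that falls at or after the split.
    idx = min(int(len(ts) * fraction), len(ts) - 1)
    return ts[idx]


# Hypothetical log with ten actions at times 1..10.
timestamps = list(range(1, 11))
T = split_time(timestamps)
before_T = [t for t in timestamps if t < T]   # used for the matching step
after_T = [t for t in timestamps if t >= T]   # used for the estimation step
```

With ten evenly spaced actions, nine fall before T and one at or after it.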

Random seeds, Weighted breadth-first crawl for 3 months

*Dataset available at: http://www.amitsharma.in/#resources


Validation using semi-synthetic Loves data

Personal preference: Choose a song randomly from the last M loves by the k-most similar users (k=10).

Influence process: Choose a song randomly from the last M loves by her friends.

Process                     FriendsOverlap   Influence   Std. Error
Personal Preference (PP)    0.042            0.001       0.0001
Influence (I)               1.00             0.99        0.0004
I-PP (10%-90%)              0.15             0.102       0.0001

Generate synthetic loves on songs after time T from any of the processes, keeping the timestamps and the social network the same as before.
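The two generating processes and their mixture can be sketched as below. This is an illustration only: it assumes the two feeds (last M loves by friends, last M loves by the k most similar users) are already built as lists, and it reads "I-PP (10%-90%)" as a 10% influence, 90% personal-preference mixture. The name `synthetic_love` and the example feeds are hypothetical.

```python
import random


def synthetic_love(friends_feed, similar_users_feed, p_influence, rng):
    """Draw one synthetic loved song.

    With probability p_influence, copy a random song from the friends' feed
    (influence process); otherwise pick a random song from the feed of the
    most similar users (personal-preference process).
    """
    if rng.random() < p_influence:
        return rng.choice(friends_feed)
    return rng.choice(similar_users_feed)


rng = random.Random(0)                        # fixed seed for repeatability
friends_feed = ["Beatles", "Coldplay", "Adele"]
similar_users_feed = ["Eminem", "Lily Allen"]

pure_influence = [synthetic_love(friends_feed, similar_users_feed, 1.0, rng)
                  for _ in range(100)]
pure_preference = [synthetic_love(friends_feed, similar_users_feed, 0.0, rng)
                   for _ in range(100)]
mixed = [synthetic_love(friends_feed, similar_users_feed, 0.1, rng)
         for _ in range(100)]
```

Because the true share of influenced actions is known by construction, the estimate produced on such data can be compared directly against it.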

FriendsOverlap overestimates influence by at least 300% across listen and love actions.


Is this specific to Last.fm?

Assumptions of Influence Estimation:
• Ordinal time, reverse chronological feed
• Preferences as a proxy for homophily

Can be applied to any sharing platform that shows friends’ activities in a (loosely) reverse chronological order.

                  RATE BOOKS           RATE MOVIES              FAVORITE PHOTOS
                  [Huang et al. ‘12]   [Jamali and Ester ‘10]   [Cha et al. ‘09]
# Ego Networks    252K                 49K                      175K
# Total Users     252K                 50K                      183K
# Total Items     1.3M                 48K                      11M
# Total Actions   28M                  7.9M                     33M

Amit Sharma

Showing other networks feels good. Don't forget to say that they all have the friend feeds as well. It might be useful to talk about what the needs/assumptions of the process are somewhere around here; you want to make the point that this is a broadly useful and general test/procedure.

Amit Sharma

Think about upper bound/lower bound, etc., and prepare yourself for questions. Also, what happens in domains with less variety of items and in domains with more variety of items?

FriendsOverlap overestimates influence in all three domains


Overestimate by 14% in Flickr, more than 500% in Flixster.

Amit Sharma

Think about how you lose power as preferences become more concentrated, as in Flixster. Also, have a backup slide comparing the different sharing networks in a table.

Amit Sharma

Also, think about the question that this number(1%) is the mean of all users. It might vary between users. Have a backup slide for that question.

Influence is overrated(?)


Not more than 1% of user actions on online sharing networks can be attributed to influence.


Amit Sharma

also have a backup slide for the table showing the difference in domains

Final takeaways

• Focusing on a specific mechanism helps make progress on a tricky estimation problem.

• PME: A broadly applicable method for estimating influence that requires only logged activity data.

• Going forward, modeling counterfactuals can be a viable way to understand activity on socio-technical systems.

thank you!

@amt_shrma
