Estimating influence of online activity feeds on people's actions

Distinguishing between Personal Preferences and Social Influence in Online Activity Feeds. Amit Sharma* (Microsoft Research), Dan Cosley (Cornell University). ACM CSCW 2016

Upload: amit-sharma

Post on 18-Feb-2017


TRANSCRIPT

Page 1: Estimating influence of online activity feeds on people's actions

Distinguishing between Personal Preferences and Social Influence in Online Activity Feeds

Amit Sharma* (Microsoft Research), Dan Cosley (Cornell University)

ACM CSCW 2016

Page 2: Estimating influence of online activity feeds on people's actions

The power of social influence

Kandel (1978), Fisher and Bauman (1988), Cialdini (2001)

Adopting new behavior

Page 3: Estimating influence of online activity feeds on people's actions

The power of social influence

Kandel (1978), Fisher and Bauman (1988), Cialdini (2001)

Changing people’s behavior

Page 4: Estimating influence of online activity feeds on people's actions

The power of social influence

Goel et al. (2012), Iyengar et al. (2011)

Spreading ideas, products

Page 5: Estimating influence of online activity feeds on people's actions

Socio-technical systems: systems as social agents

Page 6: Estimating influence of online activity feeds on people's actions

Specific problem: How much do people copy their friends’ actions from the feed?

Copy-influence: Copying a friend’s action after being exposed to their activity on a social networking website

Page 7: Estimating influence of online activity feeds on people's actions

Tricky to estimate influence

• Did A’s friend like Item1 because of influence from A?

• Impossible to disentangle influence from homophily without making further assumptions. (Shalizi and Thomas 2012)

[Figure: timelines along a time axis t. Person A acts on Item1, Item2, Item 3; Person A’s friend acts on Item1.]

Page 8: Estimating influence of online activity feeds on people's actions

Key idea: Imagine a counterfactual world

What would have happened if a Last.fm user was not exposed to the activity feed of her friends?

Page 9: Estimating influence of online activity feeds on people's actions

A general framework: estimating the counterfactual

Problem: Estimate influence from feed

Data: Observed common actions between friends (X)

Estimate counterfactual: Common actions without exposure (Xc?)

Copy-Influence = X – Xc

Assumptions?

Page 10: Estimating influence of online activity feeds on people's actions

Matching: A typical way to estimate counterfactuals

Matching Assumption: People similar on observable characteristics are expected to have the same behavior.

Page 11: Estimating influence of online activity feeds on people's actions

Matching: A typical way to estimate counterfactuals

We can match people by their attributes such as gender, race, age. [Aral et al. 2009]

Person A liked Coldplay after her friend did so.

A similar person A’ liked Coldplay without her friend doing so.

Person A would have liked Coldplay without her friend doing so.

Page 12: Estimating influence of online activity feeds on people's actions

Can we do better for estimating the effect of activity feeds?

I. Preference Similarity assumption: Past actions of a person are a better proxy for modeling behavior.

Use similarity metrics based on past activity.

II. Feed Exposure assumption: People see friends’ updates in an unfiltered reverse-chronological feed.

Each person is exposed to the last M actions of their friends.
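The feed exposure assumption can be illustrated with a minimal sketch. It assumes the feed pools all friends' actions and shows the last M of them, newest first. The `Action` record, the `build_feed` helper, and the toy log are hypothetical names for illustration, not taken from the talk.

```python
from typing import List, NamedTuple, Set


class Action(NamedTuple):
    user: str
    item: str
    time: int  # ordinal timestamp


def build_feed(actions: List[Action], members: Set[str], before: int, m: int) -> List[str]:
    """Items of the last m actions by `members` strictly before time `before`,
    newest first (an unfiltered reverse-chronological feed)."""
    visible = [a for a in actions if a.user in members and a.time < before]
    visible.sort(key=lambda a: a.time, reverse=True)
    return [a.item for a in visible[:m]]


# Toy activity log (illustrative data only).
log = [
    Action("f1", "Beatles", 1),
    Action("f2", "Coldplay", 2),
    Action("n1", "Eminem", 3),
    Action("f3", "Adele", 4),
    Action("f1", "Muse", 6),
]
feed = build_feed(log, members={"f1", "f2", "f3"}, before=5, m=2)
print(feed)  # ['Adele', 'Coldplay']
```

Passing a set of non-friends as `members` gives the corresponding feed for any other group of users.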

Page 13: Estimating influence of online activity feeds on people's actions

I. Matching non-friends using preference similarity

Use Jaccard similarity in observed preferences to create a proxy for homophily.


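[Figure: two ego networks for user u, labeled Friends (f1 to f5) and Non-Friends (n1 to n5). Edges carry preference-similarity values; the same values (0.4, 0.7, 0.3, 0.6, 0.5) appear in both networks.]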

Amit Sharma
Without knowing time T that doesn't make sense. You can probably explain creating this parallel network without reference to T yet. Further, you kind of need to explain this for the locality bit anyways, so here you'll be able to say instead of matching with the most similar friends in the network (which you did in ICWSM for recommendation), you're matching with _comparably similar_ friends here to do the best you can to control for preference similarity.
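A minimal sketch of matching on preference similarity follows. It assumes each user's preferences are a set of item ids and uses a simple greedy pairing: each friend is paired with the unused non-friend whose Jaccard similarity to the user is closest to that friend's. The greedy rule, the `match_non_friends` helper, and the toy preference sets are illustrative assumptions; the talk does not spell out its matching procedure at this level.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|a ∩ b| / |a ∪ b|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_non_friends(user: str, friends: List[str],
                      prefs: Dict[str, Set[str]]) -> Dict[str, str]:
    """Greedily pair each friend with the unused non-friend whose similarity
    to `user` is closest to that friend's similarity to `user`."""
    sim = {p: jaccard(prefs[user], prefs[p]) for p in prefs if p != user}
    candidates = [p for p in sim if p not in friends]
    matched: Dict[str, str] = {}
    for f in friends:
        if not candidates:
            break  # fewer non-friends than friends: leave the rest unmatched
        best = min(candidates, key=lambda n: abs(sim[n] - sim[f]))
        matched[f] = best
        candidates.remove(best)
    return matched


# Toy preference sets (illustrative data only).
prefs = {
    "u": {"a", "b", "c", "d"},
    "f1": {"a", "b", "x"},
    "f2": {"a", "y", "z"},
    "n1": {"a", "b", "c"},
    "n2": {"c", "d", "w"},
    "n3": {"d", "q", "r"},
}
matched = match_non_friends("u", ["f1", "f2"], prefs)
print(matched)  # {'f1': 'n2', 'f2': 'n3'}
```

Note that n1 is the most similar non-friend but is not selected: the goal is comparably similar non-friends, not the most similar ones.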
Page 14: Estimating influence of online activity feeds on people's actions

II. Comparing to a counterfactual feed for non-friends

Construct feed using last M actions of non-friends.


Friends’ feed

f1 Likes Beatles.

f2 Likes Coldplay.

f3 Likes Adele.

Matched Non-friends’ feed

n1 Likes Eminem.

n2 Likes Beatles.

n3 Likes Lily Allen.

Page 15: Estimating influence of online activity feeds on people's actions

The full procedure: Estimating the copy-influence from a feed

For each action by a user, construct two feeds, one from friends and one from matched non-friends, each containing that group's last M actions.

FriendsOverlap = Fraction of actions done by u that are also in the friends’ feed (naïve measure of influence).

NonFriendsOverlap = Fraction of actions done by u that are also in the non-friends’ feed.

Copy-Influence_u = FriendsOverlap – NonFriendsOverlap

Amit Sharma
Probably need to justify that estimation of influence a bit with the story about external influence and underlying preference similarity being captured already, and maybe talk about this being an upper bound.
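The estimate for one user can be sketched as follows, assuming the feeds visible at the time of each of the user's actions have already been constructed. The `overlap` and `copy_influence` helpers and the toy feeds are hypothetical, for illustration only.

```python
from typing import List, Sequence, Tuple


def overlap(user_items: Sequence[str], feeds: Sequence[List[str]]) -> float:
    """Fraction of the user's actions whose item was in the feed at that moment.
    feeds[i] is the feed constructed just before the user's i-th action."""
    if not user_items:
        return 0.0
    hits = sum(item in feed for item, feed in zip(user_items, feeds))
    return hits / len(user_items)


def copy_influence(user_items: Sequence[str],
                   friend_feeds: Sequence[List[str]],
                   non_friend_feeds: Sequence[List[str]]) -> Tuple[float, float, float]:
    """Return (FriendsOverlap, NonFriendsOverlap, Copy-Influence) for one user."""
    friends_overlap = overlap(user_items, friend_feeds)
    non_friends_overlap = overlap(user_items, non_friend_feeds)
    return friends_overlap, non_friends_overlap, friends_overlap - non_friends_overlap


# Toy example: four actions by user u and the two feeds seen before each one.
user_items = ["Beatles", "Adele", "Muse", "Eminem"]
friend_feeds = [["Beatles", "Coldplay"], ["Adele"], ["Coldplay"], ["Beatles"]]
non_friend_feeds = [["Eminem", "Beatles"], ["Coldplay"], ["Adele"], ["Adele"]]

result = copy_influence(user_items, friend_feeds, non_friend_feeds)
print(result)  # (0.5, 0.25, 0.25)
```

In this toy case the naïve FriendsOverlap of 0.5 is halved once the overlap that also occurs with non-friends is subtracted.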
Page 16: Estimating influence of online activity feeds on people's actions

Preference-based Matched Estimation (PME)

MATCHING STEP (before time T)
For each user: Construct a set of non-friends that are as similar to the user as her friends.

ESTIMATION STEP (after time T)
For each user: Influence_u = FriendsOverlap – NonFriendsOverlap

Page 17: Estimating influence of online activity feeds on people's actions

The Last.fm dataset


                   LISTEN SONG    LOVE SONG
# Ego Networks     96K            141K
# Total Users      312K           437K
# Total Songs      23M            13M
# Total Actions    656M           140M

Size of Feed (M) = 10

Time T is chosen such that 90% of actions are before T.

Random seeds, Weighted breadth-first crawl for 3 months

*Dataset available at: http://www.amitsharma.in/#resources
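Choosing T so that 90% of actions fall before it amounts to taking a quantile of the action timestamps. A minimal sketch, assuming ordinal integer timestamps; the `split_time` helper is a hypothetical name, and with tied timestamps the share strictly before T can fall below the target.

```python
from typing import List


def split_time(timestamps: List[int], percent_before: int = 90) -> int:
    """Return T such that about percent_before% of actions happen strictly before T."""
    ordered = sorted(timestamps)
    if not ordered:
        raise ValueError("need at least one timestamp")
    # Integer arithmetic avoids floating-point rounding in the cut index.
    cut = min(len(ordered) * percent_before // 100, len(ordered) - 1)
    return ordered[cut]


times = list(range(100))  # 100 toy actions at ordinal times 0..99
T = split_time(times)
before = [t for t in times if t < T]   # used for the matching step
after = [t for t in times if t >= T]   # used for the estimation step
print(T, len(before), len(after))  # 90 90 10
```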

Page 18: Estimating influence of online activity feeds on people's actions


Validation using semi-synthetic Loves data

Personal preference: Choose a song randomly from the last M loves by the k-most similar users (k=10).

Influence process: Choose a song randomly from the last M loves by her friends.

Process                     FriendsOverlap   Influence   Std. Error
Personal Preference (PP)    0.042            0.001       0.0001
Influence (I)               1.00             0.99        0.0004
I-PP (10%-90%)              0.15             0.102       0.0001

Generate synthetic loves on songs after time T from any of the processes, keeping the timestamps and the social network same as before.
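The mixed I-PP process can be sketched as a coin flip between the two sources for each synthetic love. This is a toy illustration under stated assumptions: `synthetic_love` is a hypothetical helper, both feeds are fixed lists here rather than being rebuilt at each timestamp, and the two toy feeds are disjoint so that the share copied from friends is easy to read off.

```python
import random
from typing import List


def synthetic_love(friends_feed: List[str], similar_users_feed: List[str],
                   p_influence: float, rng: random.Random) -> str:
    """Draw one synthetic love: with probability p_influence copy a song from the
    friends' feed, otherwise pick one from the most similar users' feed."""
    source = friends_feed if rng.random() < p_influence else similar_users_feed
    return rng.choice(source)


rng = random.Random(0)  # seeded for reproducibility
friends_feed = ["Beatles", "Coldplay", "Adele"]
similar_feed = ["Eminem", "Muse", "Radiohead"]

# 10% influence, 90% personal preference, as in the I-PP row.
loves = [synthetic_love(friends_feed, similar_feed, 0.1, rng) for _ in range(10_000)]
share_from_friends = sum(song in friends_feed for song in loves) / len(loves)
print(round(share_from_friends, 2))
```

In real data the friends' feed and the similar users' feed overlap, so FriendsOverlap exceeds the true influence share, consistent with the I-PP row (0.15 against an estimate of 0.102).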

Page 19: Estimating influence of online activity feeds on people's actions

FriendsOverlap overestimates influence by at least 300% across listen and love actions.


Page 20: Estimating influence of online activity feeds on people's actions

Is this specific to Last.fm?


Assumptions of Influence Estimation:
Ordinal time, reverse chronological feed
Preferences as a proxy for homophily

Can be applied to any sharing platform that shows friends’ activities in a (loosely) reverse chronological order.

RATE BOOKS FAVORITE PHOTOS RATE MOVIES

# Ego Networks 252K, # Total Users 252K, # Total Items 1.3M, # Total Actions 28M

# Ego Networks 49K, # Total Users 50K, # Total Items 48K, # Total Actions 7.9M

# Ego Networks 175K, # Total Users 183K, # Total Items 11M, # Total Actions 33M

[Huang et al. ‘12] [Jamali and Ester ‘10] [Cha et al. ‘09]

Amit Sharma
Showing other networks feels good. Don't forget to say that they all have the friend feeds as well. It might be useful to talk about what the needs/assumptions of the process are somewhere around here -- you want to make the point that this is a broadly useful and general test/procedure.
Amit Sharma
Think about upper bound/lower bound etc. and prepare yourself for questions. Also, what happens in domains with less variety of items and more variety of items?
Page 21: Estimating influence of online activity feeds on people's actions

FriendsOverlap overestimates influence in all three domains


Overestimate by 14% in Flickr, more than 500% in Flixster.

Amit Sharma
Think about how you lose power as preferences become more concentrated, as in Flixster. Also have a backup slide comparing the different sharing networks in a table.
Amit Sharma
Also, think about the question that this number (1%) is the mean of all users. It might vary between users. Have a backup slide for that question.
Page 22: Estimating influence of online activity feeds on people's actions

Influence is overrated(?)


Not more than 1% of user actions on online sharing networks can be attributed to influence.

Amit Sharma
also have a backup slide for the table showing the difference in domains
Page 23: Estimating influence of online activity feeds on people's actions

Final takeaways

• Focusing on a specific mechanism helps make progress on a tricky estimation problem.

• PME: A broadly applicable method for estimating influence that requires only logged activity data.

• Going forward, modeling counterfactuals can be a viable way to understand activity on socio-technical systems.

Page 24: Estimating influence of online activity feeds on people's actions

thank you!

@amt_shrma
