columbia business school - social ties and user-generated … · 2013-02-01 · shriver, nair, and...

19
MANAGEMENT SCIENCE Articles in Advance, pp. 1–19 ISSN 0025-1909 (print) ISSN 1526-5501 (online) http://dx.doi.org/10.1287/mnsc.1110.1648 © 2013 INFORMS Social Ties and User-Generated Content: Evidence from an Online Social Network Scott K. Shriver Columbia Business School, Columbia University, New York, New York 10027, [email protected] Harikesh S. Nair Graduate School of Business, Stanford University, Stanford, California 94305, [email protected] Reto Hofstetter Center for Customer Insight, University of St. Gallen, CH-9000 St. Gallen, Switzerland, [email protected] W e exploit changes in wind speeds at surfing locations in Switzerland as a source of variation in users’ propensity to post content about their surfing activity on an online social network. We exploit this varia- tion to test whether users’ online content-generation activity is codetermined with their social ties. Economically significant effects of this type can produce positive feedback that generates local network effects in content gen- eration. When quantitatively significant, the increased content and tie density arising from the network effect induces more visitation and browsing on the site, which fuels growth by generating advertising revenue. We find evidence consistent with such network effects. Key words : marketing; user-generated content; social networks; monotone treatment response; monotone treatment selection; monotone instrumental variables; homophily; endogenous group formation; correlated unobservables; endogeneity History : Received June 4, 2010; accepted June 20, 2012, by J. Miguel Villas-Boas, marketing. Published online in Articles in Advance. 1. Introduction We measure empirically the codetermination of user- generated content and social ties in an online social network. Our main point is that online social net- works are potentially subject to network effects in content generation. Low-level network effects can arise in the following way: increasing a user’s social ties on the network may induce him to post more con- tent, which then causes him to obtain more ties, which in turn causes him to post more content and so on. This process generates local positive feedback between ties and content, which thereby produces a self-reinforcing “virtuous cycle.” Such feedback may be linked to several motivations. Altruistic user inten- tions (users wanting to share information with oth- ers) and a desire to increase social status within a peer group may cause users to post more content when they have more ties (an audience effect). Users who post more content may in turn receive more ties because of their popularity or the intrinsic value others derive from consuming their content. Positive feedback of this sort may be an important compo- nent of an online network’s revenue model. When quantitatively significant, the increased content and tie density arising from the network effect induces more visitation and browsing on the site, which fuels growth by generating advertising revenue. The aca- demic literature on social networking (reviewed later) has paid relatively less attention to such content- related network effects. The main goal of this paper is to investigate such network effects using data from a real-world social network. Empirically analyzing these network effects is com- plicated by identification problems endemic to the measurement of social effects from observational data (e.g., see Manski 2000). Formally, the network effect can be represented by a simultaneous equations model in which content drives social tie formation and tie formation in turn drives content generation. Positive feedback exists when content has a positive effect on the propensity to obtain more social ties and when having more ties has a positive effect on the propensity to post content. Unfortunately, identify- ing causality in these two effects is confounded by the simultaneity of their coproduction and by the fact that, in addition to the causal effects of interest, sev- eral other sources of spurious correlation can explain the covariation in content and ties observed in social network data. This paper presents data from an online social net- work that documents the codetermination of content 1 Published online ahead of print February 1, 2013

Upload: others

Post on 26-Jul-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Columbia Business School - Social Ties and User-Generated … · 2013-02-01 · Shriver, Nair, and Hofstetter: Social Ties and User-Generated Content 2 Management Science, Articles

MANAGEMENT SCIENCEArticles in Advance, pp. 1–19ISSN 0025-1909 (print) � ISSN 1526-5501 (online) http://dx.doi.org/10.1287/mnsc.1110.1648

© 2013 INFORMS

Social Ties and User-Generated Content:Evidence from an Online Social Network

Scott K. ShriverColumbia Business School, Columbia University, New York, New York 10027,

[email protected]

Harikesh S. NairGraduate School of Business, Stanford University, Stanford, California 94305,

[email protected]

Reto HofstetterCenter for Customer Insight, University of St. Gallen, CH-9000 St. Gallen, Switzerland,

[email protected]

We exploit changes in wind speeds at surfing locations in Switzerland as a source of variation in users’propensity to post content about their surfing activity on an online social network. We exploit this varia-

tion to test whether users’ online content-generation activity is codetermined with their social ties. Economicallysignificant effects of this type can produce positive feedback that generates local network effects in content gen-eration. When quantitatively significant, the increased content and tie density arising from the network effectinduces more visitation and browsing on the site, which fuels growth by generating advertising revenue. Wefind evidence consistent with such network effects.

Key words : marketing; user-generated content; social networks; monotone treatment response; monotonetreatment selection; monotone instrumental variables; homophily; endogenous group formation; correlatedunobservables; endogeneity

History : Received June 4, 2010; accepted June 20, 2012, by J. Miguel Villas-Boas, marketing. Published onlinein Articles in Advance.

1. IntroductionWe measure empirically the codetermination of user-generated content and social ties in an online socialnetwork. Our main point is that online social net-works are potentially subject to network effects incontent generation. Low-level network effects can arisein the following way: increasing a user’s social tieson the network may induce him to post more con-tent, which then causes him to obtain more ties,which in turn causes him to post more content andso on. This process generates local positive feedbackbetween ties and content, which thereby produces aself-reinforcing “virtuous cycle.” Such feedback maybe linked to several motivations. Altruistic user inten-tions (users wanting to share information with oth-ers) and a desire to increase social status within apeer group may cause users to post more contentwhen they have more ties (an audience effect). Userswho post more content may in turn receive moreties because of their popularity or the intrinsic valueothers derive from consuming their content. Positivefeedback of this sort may be an important compo-nent of an online network’s revenue model. Whenquantitatively significant, the increased content andtie density arising from the network effect induces

more visitation and browsing on the site, which fuelsgrowth by generating advertising revenue. The aca-demic literature on social networking (reviewed later)has paid relatively less attention to such content-related network effects. The main goal of this paperis to investigate such network effects using data froma real-world social network.

Empirically analyzing these network effects is com-plicated by identification problems endemic to themeasurement of social effects from observational data(e.g., see Manski 2000). Formally, the network effectcan be represented by a simultaneous equationsmodel in which content drives social tie formationand tie formation in turn drives content generation.Positive feedback exists when content has a positiveeffect on the propensity to obtain more social ties andwhen having more ties has a positive effect on thepropensity to post content. Unfortunately, identify-ing causality in these two effects is confounded bythe simultaneity of their coproduction and by the factthat, in addition to the causal effects of interest, sev-eral other sources of spurious correlation can explainthe covariation in content and ties observed in socialnetwork data.

This paper presents data from an online social net-work that documents the codetermination of content

1

Copyright:

INFORMS

holdsco

pyrig

htto

this

Articlesin

Adv

ance

version,

which

ismad

eav

ailableto

subs

cribers.

The

filemay

notbe

posted

onan

yothe

rweb

site,includ

ing

the

author’s

site.Pleas

ese

ndan

yqu

estio

nsrega

rding

this

policyto

perm

ission

s@inform

s.org.

Published online ahead of print February 1, 2013

Page 2: Columbia Business School - Social Ties and User-Generated … · 2013-02-01 · Shriver, Nair, and Hofstetter: Social Ties and User-Generated Content 2 Management Science, Articles

Shriver, Nair, and Hofstetter: Social Ties and User-Generated Content2 Management Science, Articles in Advance, pp. 1–19, © 2013 INFORMS

and ties and discusses some strategies to address thisidentification problem. The success of these strate-gies is dependent on primitive assumptions aboutbehavior that we discuss in the paper. Some of theidentifying assumptions may be violated under alter-native stories of user behavior, and therefore ourresults should be seen with this caveat in mind. Wefirst consider the problem of measuring the effect ofcontent generation on the propensity to obtain socialties. To address the simultaneity of coproduction,we look for a source of variation in users’ propen-sity to post content on the website that may notdirectly affect tie formation. To do this, we exploitinstitutional features of our empirical setting. Oursetting involves the content posting and tie forma-tion activities of users of an online social networknamed Soulrider.com, which hosts one of the largestsports-based online communities in Switzerland. Wefocus on the large community of windsurfers on thesite. Users post content about their surfing activities,including blogs and reports of wind speeds and con-ditions at specific surfing locations. Our data com-prise historical information on content posted andsocial ties formed by these users on Soulrider.com.We augment the website data with high-frequencyinformation on wind forecasts at all surf locationsin the country obtained from the Swiss Meteorologi-cal Office (http://www.meteoswiss.ch). We utilize thefact that surfing is possible for most users only if windspeeds are greater than or equal to 3 BFT.1 We showthat wind speeds significantly explain the observedvariation in content postings on the website. We usethe variation in wind speeds at various lake locationsfrequented by users as shifters of users’ propensity tovisit surf locations and to subsequently post contentabout their surfing activity. Additionally, we exploitthe panel nature of the data to include user and time-period fixed effects to control for spurious sourcesof correlation that drive content generation and tieformation.

We then discuss the measurement of the reverseeffect—the effect of social ties on the propensity topost content. The main challenge here is that thosethat tend to obtain more social ties for unobservablereasons (e.g., “gregariousness”) may also post morecontent for the same reasons. Hence, the number of tierequests received is econometrically endogenous. Thisis a manifestation of the “homophily” or “endoge-nous group formation” problem in measuring socialinteraction (e.g., Moffitt 2001). We discuss a strat-egy using the number of friend requests to anagent’s friends as an instrument for his friendrequests, including individual and time fixed effects,

1 BFT stands for “Beaufort,” the international wind scale used inweather reporting.

to control for agent-specific and common unobserv-ables that may drive friendship formation. Intuitively,we exploit variation in network position under theassumption that after controlling for any commonshocks, an agent’s content generation is likely drivenonly by the environment facing him, and not hisfriends. These instrumental variable (IV) assumptionscan be violated if wind affects friendship formationdirectly because users meet future online friends atmore windy surf locations, or if friends receive tie-formation requests in a way that is directly relatedto the focal agent’s blogging activity. If these exclu-sion restrictions are violated, our estimates shouldbe interpreted only as correlational and not causal.We provide some evidence suggesting that the exo-geneity assumption is reasonable but do not ruleout exogeneity concerns fully. We also present esti-mates from two alternative approaches that use muchweaker assumptions and deliver bounds on the treat-ment effects.

For the bounds, we use an incomplete model ofthe social network that leaves the formal model oftie formation and content generation unspecified butincorporates the fact that ties and content are notrandomly allocated across units of observations. Ourapproach is in the spirit of Haile and Tamer (2003)and Ciliberto and Tamer (2009), in the sense in thatwe recognize tie formation is nonrandom and jointlydetermined with content generation, but we leave theactual model-generating ties unspecified. Followingthe framework of Manski (1997), we impose weakmonotonicity assumptions on the response of con-tent to ties and of ties to content that imply infor-mative, nonparametric bounds on the causal effectsof interest. Following Manski and Pepper (2000), wealso report bounds under the assumption that windspeeds and friends’ friend requests are monotoneinstrumental variables (MIVs), a weaker assumptionthan the requirement that they are IVs for contentand friends, respectively. Both bounds are robust toselection and to misspecification concerns. We findthe point estimates from the IVs are contained in thebounds we obtain and show that monotonicity is notrejected by the data.

An alternative solution to the codependence of net-work formation and content generation is to specifya formal model for tie formation and content produc-tion and to incorporate into estimation the likelihoodof these actions for all network members. However,one difficulty of this approach is that the endogenousactions of tie formation and content posting are theequilibrium outcomes of a complicated, multiagentnetwork game, which needs to be solved for everyguess of the underlying parameter vector to computethe likelihood for inference. Apart from the compu-tational complexity, this estimation is also subject to

Copyright:

INFORMS

holdsco

pyrig

htto

this

Articlesin

Adv

ance

version,

which

ismad

eav

ailableto

subs

cribers.

The

filemay

notbe

posted

onan

yothe

rweb

site,includ

ing

the

author’s

site.Pleas

ese

ndan

yqu

estio

nsrega

rding

this

policyto

perm

ission

s@inform

s.org.

Page 3: Columbia Business School - Social Ties and User-Generated … · 2013-02-01 · Shriver, Nair, and Hofstetter: Social Ties and User-Generated Content 2 Management Science, Articles

Shriver, Nair, and Hofstetter: Social Ties and User-Generated ContentManagement Science, Articles in Advance, pp. 1–19, © 2013 INFORMS 3

misspecification bias if the wrong network model isassumed. More difficulties arise from the fact thatrealistic models of network formation games that canexplain features of observed online network struc-ture also tend to be prone to multiple equilibria (e.g.,see Jackson 2008). The bounds strategy avoids takinga stance on a specific network game but recognizesthe codependence in inference and thus makes someheadway on this difficult inference problem.

Our analysis provides evidence consistent with net-work effects. We find that a user’s social ties arepositively associated with his content generation andthat content generation in turn is positively associatedwith obtaining social ties. We find large effects of con-trolling for endogenous group formation and corre-lated unobservables. In the absence of these controls,we find that measures of social effects are overstatedabout 25%–30% on average across model specifica-tions. To interpret our estimates, we calibrate therevenue implications of content generation to the net-work. Translating content to advertising revenues,we calculate an incremental blog is worth about0.736 Swiss francs (CHF) on average to Soulrider.comand that local feedback with ties can augment therevenue by about an additional 5.5% on averageacross users.

Our paper relates to extant literature investigatingthe process of tie formation in online social networksand the influence of user-generated content on onlineactivity.2 Recent studies in marketing include Trusovet al. (2009) on online opinion leader identification,Stephen and Toubia (2010) on the economic effects ofsocial commerce, and Mayzlin and Yoganarasimhan(2012) and Katona and Sarvary (2009) on link forma-tion. A growing stream of research also explores theeffect of user-generated content, broadly defined, oneconomic outcomes. These include Dhar and Chang(2009) on the effect of blog posts on future musiconline sales, Dellarocas (2006) on Internet forums,Duan et al. (2008) and Chintagunta et al. (2010) on theeffect of online reviews on movie box office perfor-mance, Chevalier and Mayzlin (2006) and Oestreicher-Singer and Sundararajan (2012) on the effect of bookreviews on online book sales, Albuquerque et al.(2012) on the interdependence between creating andpurchasing online content, and Zhang and Sarvary(2011) on competition with user-generated content.These studies suggest that user-generated contentplays an important role in consumer decisions andcompetition but is not concerned with its interactionwith social ties per se. A parallel stream of litera-ture describes motives of users for generating content.

2 More generally, this paper relates to the large literature on socialinteractions (see Hartmann et al. 2008 for a recent survey).

Ahn et al. (2011) and Kumar (2011) develop struc-tural models of content creation and consumption,and Zhang and Zhu (2011) demonstrate that contentcreation responds to audience size. Nov (2007) sur-veyed contributors to Wikipedia.com and found thatfun and ideology are the primary drivers of contentgeneration. Hennig-Thurau et al. (2004) report thatconsumers’ desire for social interaction, desire for eco-nomic incentives, concern for other consumers, andthe potential to enhance their own self-worth are theprimary factors motivating consumers to articulatethemselves online. Their finding that desire for socialinteraction is a driver to post content has been cor-roborated by descriptive studies like Bughin (2007),which documents the desire for fame and to shareexperiences with friends as reasons for posting con-tent online. Likewise, Nardi et al. (2004) emphasizethe community aspect of content creation. A surveywe conducted finds motivations that are consistentwith these findings. Although this stream of literaturesuggests the existence of effects between content andsocial ties, it does not investigate the codependenceexplicitly.

To date, few studies have explicitly discussed orinvestigated the potential interplay between user-generated content and social ties. Godes et al. (2005)include in their definition of social interactions thepossibility of non-face-to-face interactions, such aspassive observations that could include effects ofendogenously generated content on social ties. Ghoseand Han (2011) state that little is known about howcontent creation is related to content usage or socialnetwork characteristics. Goldenberg et al. (2009),Katona et al. (2011), and Yoganarasimhan (2012) doc-ument that network structure matters in the diffu-sion of content. Narayan and Yang (2007) model thetendency to form ties, but do not consider the roleof content. Using data from Wallop, an online per-sonal publishing and social networking system, Lentoet al. (2006) test how the number and nature of socialties are related to people’s willingness to contributecontent to a blog. Their aim is to predict future con-tent contribution, not to measure causality; hence, thestudy does not control for the endogeneity concernswe outline here. Their results show that the partic-ular type and strength of the tie are correlated withthe effect of social ties on content. In their descrip-tive analysis, however, Lento et al. (2006) find that thenumber of in-degree ties is not a significant predic-tor of future content production. Hence, we believethis is one of the first studies to measure the code-termination of content and ties while consideringthe identification challenges implied by simultaneity,endogenous group formation, and spurious correla-tion caused by common unobservables.

Copyright:

INFORMS

holdsco

pyrig

htto

this

Articlesin

Adv

ance

version,

which

ismad

eav

ailableto

subs

cribers.

The

filemay

notbe

posted

onan

yothe

rweb

site,includ

ing

the

author’s

site.Pleas

ese

ndan

yqu

estio

nsrega

rding

this

policyto

perm

ission

s@inform

s.org.

Page 4: Columbia Business School - Social Ties and User-Generated … · 2013-02-01 · Shriver, Nair, and Hofstetter: Social Ties and User-Generated Content 2 Management Science, Articles

Shriver, Nair, and Hofstetter: Social Ties and User-Generated Content4 Management Science, Articles in Advance, pp. 1–19, © 2013 INFORMS

The remainder of this paper is organized as follows.In §2, we provide a short overview of the online socialnetwork Soulrider.com, from which the data are col-lected. In §3, we present a variety of estimators forthe effects. In §4, we present the results and the impli-cations of our model. In §5, we conclude the paper.

2. Empirical SettingWe now provide a brief overview of the empiricalsetting for our data: an online social network namedSoulrider.com. We discuss how we operationalize thekey variables of interest in the data. Subsequently, wediscuss the model and empirical strategy for identifi-cation of the causal effects of interest. A more detaileddescription of the data set is provided later with theresults.

2.1. Soulrider.comSoulrider.com is a privately held community websitebased in Europe that focuses on extreme sports suchas windsurfing, surfing, and snowboarding. The sitewas launched in 2002 and as of June 2011 had a totalof 10,677 registered users.

Users of Soulrider.com can both consume existingcontent and submit their own. Most content on thewebsite, such as blogs or forum messages, is gen-erated by users themselves. Other content, such asnational-level sport industry news, is provided bythird-party contributors and is not affected by users.Users who wish to post content or engage in socialnetworking activities are required to create a freeaccount on the website. The main value of content isinformational. Blogs and posts typically contain infor-mation about where surf conditions are best, whereother users plan to surf in a given week, and reportson wind speeds at those locations. These posts helpusers better organize and coordinate their sportingactivities.

In addition to consuming and generating content,users can create ties to other users and thereby takepart in an online social network. Tie formation amongusers is facilitated by the website through variousfunctions such as a people search engine, an emailinvitation tool, interest groups, and an “add as friend”function that handles the mechanics of creating tiesin the online interface. Communication among usersis eased by internal mail functionality, instant mes-saging, and public chat. The process of “friending”(or tie formation) on the site involves sending a tie-formation request through the website that has to beaccepted by the recipient. Friending enhances accessto content. By default, users on Soulrider.com can seeothers’ profiles and content. However, access to one’sprofile and albums and the ability to contact via emailmay be restricted by some users to only their friends.

Friending also provides benefits to users in obtain-ing status updates and enhanced online interactionwith others. Upon login, friends who are currentlyonline are displayed on the welcome screen for a user,and direct access is provided to all their online ses-sions and activities. Users can also add comments andtag photos of only those who have accepted themas friends. This is the norm in most online socialnetworks.

Soulrider.com is representative of networks target-ing young adults. As of June 2011, 75% of the userson the website were male and 25% were female. Themean user was 29 years old and possessed 1.4 friends.The social network grew by 1,164 (12.3%) users fromJune 2010 to June 2011. Within this period, 470 (4.4%)added at least one friend, which generated 1,193 newsocial ties; blogs were contributed by 477 (4.5%) users.The site averaged 37,211 unique visits per month,with visitors staying on the website for an average of3.8 minutes, generating a monthly average of 330,108page impressions. Google has indexed 31,800 pagesand 27 backlinks for Soulrider.com, leading to a page-rank value of 6.3

Although it is smaller than some well-known onlinesocial networks, content generation on the site is sim-ilar in pattern to those reported at larger online socialnetworks. Content contribution on Soulrider.com fol-lows the so-called participation inequality or “90-9-1”principle (e.g., Brothers et al. 1992, Ochoa and Duval2008), a commonly observed empirical pattern ofonline user-generated content distributions, whereby90% of users form the “lurkers,” or audience, whodo not actively contribute to the site, 9% of users are“editors,” sometimes modifying content or adding toan existing thread but rarely creating content fromscratch, and 1% of users are “creators,” driving largeamounts of the social group’s activity. The 90-9-1 prin-ciple has been used as a rule of thumb to characterizeuser-generated content production and roughly holdsfor Soulrider.com as well, strengthening to someextent the external validity of our findings.4 To sum-marize, Soulrider.com is a medium-sized social net-work that appeals to a core community with sharedinterests, grows primarily by word of mouth, andgenerates content in a fashion representative of manyonline social networks.

2.2. Data and Variable OperationalizationWe worked with Soulrider.com to add a logging func-tionality to capture details of network growth and

3 Based on impression statistics from Google Analytics for the30-day period ending June 16, 2011.4 For instance, of 22,279 unique visitors to the site in December2010, 2,161 (9.7%) produced content. Also, 1% of registered usersare heavy users, accounting for 47% of all produced content (blogs).

Copyright:

INFORMS

holdsco

pyrig

htto

this

Articlesin

Adv

ance

version,

which

ismad

eav

ailableto

subs

cribers.

The

filemay

notbe

posted

onan

yothe

rweb

site,includ

ing

the

author’s

site.Pleas

ese

ndan

yqu

estio

nsrega

rding

this

policyto

perm

ission

s@inform

s.org.

Page 5: Columbia Business School - Social Ties and User-Generated … · 2013-02-01 · Shriver, Nair, and Hofstetter: Social Ties and User-Generated Content 2 Management Science, Articles

Shriver, Nair, and Hofstetter: Social Ties and User-Generated ContentManagement Science, Articles in Advance, pp. 1–19, © 2013 INFORMS 5

content generation and consumption. Our data com-prise complete details of user tie formation and con-tent generation from March 2, 2009, to November 7,2010, a period spanning 89 weeks. For the purposes ofthis paper, we focus on a group of 703 self-identifiedwindsurfers on the website, giving us a panel with57,040 user-week observations for analysis. The largepanel size facilitates the inclusion of fixed effects tocontrol for unobservables, as well as relaxing severalparametric assumptions for inference.

We operationalize user content via a generic vari-able we call blogs, which counts the number of post-ings a user has made to the website each period.A posting is counted as adding 1 to the blogs vari-able if it contains any text or photos. Thus, if a userposted a photo and a text message on the website,two text messages, or two photos, the blogs variablewould equal 2. One limitation of this approach is thatwe focus only on the incidence of postings. We donot account for the type of content added because ofthe difficulties and ambiguity associated with sortingposts into predefined classes. We operationalize tiesby a variable, friends, which counts the new friend-ship ties requested by others to each user per period.Essentially, we ask if content creation facilitates newfriend requests and whether new friend requests inturn facilitate the creation of new content.5

2.2.1. Understanding Motivations and Mecha-nisms: A Survey. To learn about the mechanismsbehind blogging and link formation, we implementedan extensive survey on Soulrider.com. We used anopen-ended survey to allow users to freely expresstheir motivations for friending and posting content.We then used content analysis involving patternmatching to analyze the qualitative response data.We categorized the motives as stated by users intopredefined types and counted frequencies of occur-rences of these categories. This exploratory analysishelped us set up the model, understand user behaviorand mechanisms, and interpret the data.6

We invited 3,157 self-identified windsurfers ofSoulrider.com by email to participate in the survey.Two hundred sixty-six (8.43%) chose to participate.In the survey, we asked the following three open-ended questions related to blogging: Q1: “What are

5 We do not answer the separate question of how content gener-ation is affected by the number of friend requests a user initiates.This is a different economic problem in which each user choosesblogging frequency and outgoing friend requests jointly to maxi-mize his utility, which requires specification of a formal structuralmodel of the network game. This problem is outside the scope ofthe current analysis and is the subject of future research.6 The coding scheme, sample quotes, definitions, reliability mea-sures, and frequencies are available by request from the authorsand in the companion online appendix (http://faculty-gsb.stanford.edu/nair/documents/TechnicalAppendix.pdf).

and have been reasons and motives for you to adda windsurf-session on Soulrider.com?” (see Table A1in the online appendix). Q2: “If you are online onSoulrider.com, what specifically prompts you to inserta windsurf-session?” (see Table A2 in the onlineappendix). Q3: “What benefit does Soulrider.comprovide relating to your windsurf sessions?” (seeTable A3 in the online appendix). Similarly, we askedthree open-ended questions related to friending: Q4:“What are and have been reasons and motives for youto link to somebody as your friend on Soulrider.com?”(see Table A4 in the online appendix). Q5: “If you areonline on Soulrider.com, what specifically promptsyou to send a friendship request to somebody?” (seeTable A5 in the online appendix). Q6: “What benefitdo you obtain from your friends on Soulrider.com?”(see Table A6 in the online appendix). For each ques-tion, respondents were advised to answer with shorttext or a few key words. For further analysis, weonly used answers to questions Q1–Q3 from 121respondents who indicated having at least one friendon Soulrider.com and answers to questions Q4–Q6from 89 respondents who reported posting at leastone blog.

We then processed the survey data using a con-tent analysis protocol (e.g., Krippendorff 2004). Eachstatement was coded using a coding schema. Follow-ing suggestions in the literature on textual coding(Corbin and Strauss 2008), we started with an opencoding of the first 50 statements to derive descrip-tive codes that were close to the actual text and thengrouped codes into categories. In second-stage cod-ing, we applied this scheme to all the data to tabulatefrequencies of occurrences per code and category. Allcoding was performed using the software packageATLAS.ti (Friese 2011). The data analysis was done byseveral coders and checked for intercoder reliability.Reliability of these codes was then assessed using twoindependent raters, who recoded the data into thegiven code categories. These raters were not currentusers of Soulrider.com and were not familiar withthe details of the research project. Interrater reliabilityvalues (Cohen’s Kappa) of 0.65 on average indicatedgood reliability of the derived categories.

Determinants of Blogging. Fifty-five percent of usersin the survey stated “altruistic” reasons for blog-ging on Soulrider.com (see Table A1). Altruistic usersexpress a need to inform others about their experi-ences and to share photos and various kinds of infor-mation related to surfing or conditions. In contrast,18% of users indicated they blog only for their ownpurpose (egocentric motivation). This includes keep-ing a diary or archive of one’s surf sessions, one func-tionality offered by the website. A substantial numberof users indicated that they use their blogging toexpress or promote themselves (20%), which is not

Copyright:

INFORMS

holdsco

pyrig

htto

this

Articlesin

Adv

ance

version,

which

ismad

eav

ailableto

subs

cribers.

The

filemay

notbe

posted

onan

yothe

rweb

site,includ

ing

the

author’s

site.Pleas

ese

ndan

yqu

estio

nsrega

rding

this

policyto

perm

ission

s@inform

s.org.

Page 6: Columbia Business School - Social Ties and User-Generated … · 2013-02-01 · Shriver, Nair, and Hofstetter: Social Ties and User-Generated Content 2 Management Science, Articles

Shriver, Nair, and Hofstetter: Social Ties and User-Generated Content6 Management Science, Articles in Advance, pp. 1–19, © 2013 INFORMS

surprising, because many professional windsurfersuse the site to appeal to potential sponsors. To inves-tigate the determinants at a more detailed level, weasked users what prompts them to post a blog (seeTable A2). A majority (62%) indicated that they wouldadd a new entry shortly after they went surfing.Others (22%) stated they would not blog about everysession, but only about those that were special orwhere conditions were particularly unique (e.g., windat storm level). Interestingly, 16% also explicitly indi-cated that they would blog to facilitate interaction orsocializing with others. As meeting or communicat-ing with other surfers at the spot is often perceivedto be difficult, some use blogging as a means to findout which other surfers surfed at the same spot onthe same day. Socializing is also a major value driverof blogging for some users (22%) (see Table A3, Q3).Information motives, either at the community (altru-istic) or individual (egocentric) level, are the most fre-quently stated drivers of the value of blogging (thesecategories together account for 63% of the statements).Determinants of Friending. The stated reasons for

friending are more fragmented (see Table A4). Wegrouped descriptions into four main categories.Thirty-four percent of users indicated that they wouldlink to somebody only if he or she was also a closefriend in real life (i.e., a strong tie). Others stated theywould also link weaker contacts, as long as they bothshared common interests (21%). Twenty-seven percentof users state they intend to keep in touch with theirfriends via the website. Consuming and sharing con-tent, such as blogs, photos, or information about surfspots or surf destinations, also drove friending (18%).Answers to Q5 provide more detailed insights intothe specific determinants of sending friend requests.As before, users are likely to send a friend requestif they observe the online presence of one of theirclose friends on Soulrider.com (39%). Other determi-nants include “keeping in touch” (16%), consumingand sharing content (6%), and networking with like-minded people (38%). Some users indicated that theyfriended somebody they recently met offline (e.g., atsurf sessions, on holidays, or when trading wind-surf equipment), though this proportion is small (9%).Overall, the value of friending seems to be mostlyrelated to consuming and sharing friend- or sport-related content (61%). Twenty percent of users statethat the website facilitates communication betweenusers. Interestingly, a small proportion (11%) specif-ically indicated that they derive no value from theirfriends.

We find wind speed is strongly associated withblogging. In answering Q2, 62% of the respondentsindicated that they would post a blog after they wentsurfing or have new content related to a surf session.In addition, 22% indicated they would blog if they

experienced a unique session. Finally, we use thesedata to assess informally whether surfing at the samespot on the same day is a driver of friending. Count-ing the incidence of statements like “to link to some-body I know from surfing” or “to link to somebody Imet at a surf session” as reasons for friending, we findthese account for 11 quotations or 7.14% of all quota-tions in this coding scheme altogether (see Table A5).Although small, this response rate shows that we can-not rule this motivation for link formation out com-pletely, and thus our identification strategy should beseen with this caveat.

3. Empirical ApproachWe now discuss our empirical strategy. The goal ofthe empirical work is to learn the effects of blogs onfriends and of friends on blogs. In §3.1, we discuss howwe identify the effect of blogs on friends. In §3.2, wediscuss identification of the reverse effect.

The plan of this paper is to first establish evi-dence of a correlation between blogs and friendsin the data. We then show that this correlation isrobust to the inclusion of individual fixed effects(to control for homophily) and of time period fixedeffects (to control for correlated common unobserv-ables). Interpreting these correlations as causal effectsrequires additional assumptions. We present an IVapproach based on wind speeds, which delivers esti-mates of causal effects under the assumption of exo-geneity of these instruments. If these assumptions areviolated, the causal interpretation no longer holds.We present some evidence suggesting that the exo-geneity assumptions are reasonable, though we can-not conclusively establish their validity a priori. Wealso present bounds on causal effects that rely onweak monotonicity assumptions. We find that thepoint estimates from the IVs are contained in thebounds we obtain.

3.1. Effect of Content Generation onTie Formation

We start with a linear model linking blogs (bit) andfriends (fit). Here, i stands for individual and t standsfor week. Let Bit and Fit denote the cumulative num-ber of blogs written and friendship requests, respec-tively, to agent i at the beginning of (but excluding)period t. The linear model is

fit = �1i +�1t + �1bit +�1Bit + �1Fit + �1zit + �1it0 (1)

Equation (1) is in essence a dynamic panel data modelin which we allow past blogging (Bit =

∑t−1�=−�

bi� )and past friendship requests (Fit =

∑t−1�=−�

fi� ) aswell as current blogging to affect current friendshiprequests. Equation (1) also allows friends to dependon other exogenous variables (zit); in our empirical

Copyright:

INFORMS

holdsco

pyrig

htto

this

Articlesin

Adv

ance

version,

which

ismad

eav

ailableto

subs

cribers.

The

filemay

notbe

posted

onan

yothe

rweb

site,includ

ing

the

author’s

site.Pleas

ese

ndan

yqu

estio

nsrega

rding

this

policyto

perm

ission

s@inform

s.org.

Page 7: Columbia Business School - Social Ties and User-Generated … · 2013-02-01 · Shriver, Nair, and Hofstetter: Social Ties and User-Generated Content 2 Management Science, Articles

Shriver, Nair, and Hofstetter: Social Ties and User-Generated ContentManagement Science, Articles in Advance, pp. 1–19, © 2013 INFORMS 7

specifications, we include in zit the number of daysa user has been registered on the site as a controlfor common tieing trends over user tenure. The maineffect of interest for our study is �1, the marginal effectof blogs on friends.

A concern with estimating �1 is that individualsare sorted into groups on the basis of unobservedtastes and that these tastes may also drive bloggingbehavior. For instance, we may suspect that unob-served factors like “popularity” that cause a spe-cific agent to receive more friendship requests alsocause him to post more content. This is a versionof the endogenous group formation problem arisingfrom “homophily,” the tendency of agents with simi-lar tastes to form social groups. Homophily is a per-vasive feature of social networks (e.g., see McPhersonet al. 2001) and has been shown to be empiricallyimportant in online social network data (Ansari et al.2011, Braun and Bonfrer 2011). One solution to theendogeneity induced by homophily is facilitated bythe availability of panel data. Assuming agent tastesare time invariant, one can control for time-invariantaspects that drive endogenous group formation viaagent fixed effects, �1i (e.g., Narayanan and Nair 2013).

In addition, one may be concerned that comove-ment in friends and blogs across users and time may begenerated by common, time-varying unobservables.For instance, agents may be significantly motivated torequest ties and post content related to surfing dur-ing periods particularly conducive to surfing activity(e.g., great weather) or when surfing is more salientthan usual in sports enthusiasts’ minds (e.g., a largesurf tournament occurred). These common shocks,when unobserved by the analyst, are a source of spu-rious correlation. The time period fixed effects, �1t ,control for these kinds of concerns.

Figure 1 Evidence of Positive Covariation of Blogging and Wind Speed

Number of blogs on location Number of days with average wind > 3 BFT

0 10 20 30 40 50 60 70 80 90

0 200 400 600100 km

Note. The plot shows total blogs written about a location (horizontal lines) and number of days where the wind at that location was greater than 3 BFT (verticallines).

Even after employing these controls, concerns maycontinue to persist about the identification of �1. First,there may be time-varying individual-specific unob-servables driving friendship requests that may be cor-related with blogging propensity. For instance, anunobserved (to the econometrician) local event featur-ing agent i’s recent windsurfing activity may cause aspecific subset of users to request friendship ties withthat user, and that event may also cause agent i toblog more on Soulrider.com. In this story, the sourceof spurious correlation is individual and time spe-cific, which is not picked up by fixed effects thatare common across agents for a given week or com-mon across weeks for a given agent. Second, bit maydirectly be a function of fit in an equation analogousto (1) that determines blogging activity. This situationcould arise, for instance, if the true data-generatingprocess is one in which some individuals post con-tent primarily as a way to obtain more friends onSoulrider.com. To address these concerns, we look fora source of variation in blogs (bit) that can be excludedfrom the equation determining fit and thereby serveas an instrument for bit .

Instruments for Blogs� Variation in Wind. Our strat-egy exploits differences in wind speeds across surflocations in Switzerland. We expect content to be pro-duced concomitant with actual surfing activity, whichis in turn driven by wind speeds at surf locations.If wind speeds affect blogging, we would expect tosee that blogs relating to a particular location are morelikely when wind speeds there are higher. We presentgeographic plots and statistical tests showing this istrue in our data.

Figure 1 plots the geographic distribution of blog-ging activity and wind. We plot with horizontal linesthe number of total blogs written about a particular

Copyright:

INFORMS

holdsco

pyrig

htto

this

Articlesin

Adv

ance

version,

which

ismad

eav

ailableto

subs

cribers.

The

filemay

notbe

posted

onan

yothe

rweb

site,includ

ing

the

author’s

site.Pleas

ese

ndan

yqu

estio

nsrega

rding

this

policyto

perm

ission

s@inform

s.org.

Page 8: Columbia Business School - Social Ties and User-Generated … · 2013-02-01 · Shriver, Nair, and Hofstetter: Social Ties and User-Generated Content 2 Management Science, Articles

Shriver, Nair, and Hofstetter: Social Ties and User-Generated Content8 Management Science, Articles in Advance, pp. 1–19, © 2013 INFORMS

location (lake) and with vertical lines the number ofdays where the wind at that location was greater than3 BFT (left). The barplot (right) displays the sameinformation. Each bar represents 1 of the 19 surf-ing locations. The bars are sorted by the number ofsurfable days in descending order. The number ofblogs and surfable days at a specific location can beread from the x-axis. We see there is evidence of astrong correlation, implying that blogging responds towind speeds at the focal location (corr = 0�39, t = 3�90,p < 0�001). As a test, we compute the mean weeklywind speed at every location and create a low- andhigh-wind-speed group by splitting at the median.We test whether the mean (across weeks) of totalweekly blogging is statistically significantly differentbetween the low- and high-wind-speed groups. Themean number of weekly aggregate blogs in the low-mean-wind group is 12.99 and in the high-mean-windgroup is 20.47. Means differ significantly between thetwo groups (t = 3�23, p < 0�002).

To also see whether wind drives blogging at theindividual-week level, on the right-hand side of Fig-ure 2, we plot a median split of wind speed atthe preferred location at the individual-weekly leveland compare the per-user weekly blogging intensitybetween the two groups. We see individual bloggingis significantly higher in the high-wind-speed group(t = 9�25, p < 0�001). On the left-hand side of Figure 2,we also plot the mean (across weeks) of the aggregate

Figure 2 Blogging Incidence in Relation to Wind Speed

0

5

10

15

20

Num

ber

of b

logs

Above median wind Below median wind

Aggregate blogging

0

0.02

0.04

0.06

Num

ber

of b

logs

Above median wind Below median wind

Individual blogging

Notes. Left panel: Mean total weekly blogging split (across all user-weeks) at the median wind speed at preferred surf location. Right panel: Mean weeklyblogging per-user split (across all user-weeks) at the median wind speed at preferred surf location.

(across users) weekly blogging between the same twogroups. We see aggregate blogging about the pre-ferred location is also significantly higher in the high-wind-speed group (t = 3�23, p < 0�001). Finally, tocheck whether users who generate content about aparticular location are also those who have a largenumber of friends, in Figure 3 we plot blogging aris-ing from a user overlaid with his social connections.On the right of the figure, each line in the barplotrepresents one particular user of the social network.Users are sorted (in descending order) by their num-ber of social ties (black lines). The grey lines indicatethe number of blogs per user. The figure provides evi-dence for a robust correlation (corr = 0�41, t = 12�01,p < 0�001), as more blogs about a location are gener-ated by users who have more ties.

These plots evince patterns that suggest bloggingis linked to wind speeds and that individuals whoblog often also tend to be well connected. Our iden-tification strategy is tied to using wind speeds asexogenous shifters of blogging, which are excludedfrom the propensity to have friends. The assump-tion behind this argument is that users do not formfriendships at surfing locations per se, so a higherwind speed does not directly cause a user to havemore friends. We believe this assumption is not unrea-sonable based on our results from the survey andthe nature of the sport. Friending on the website isprimarily driven by prior offline ties and from seeing

Copyright:

INFORMS

holdsco

pyrig

htto

this

Articlesin

Adv

ance

version,

which

ismad

eav

ailableto

subs

cribers.

The

filemay

notbe

posted

onan

yothe

rweb

site,includ

ing

the

author’s

site.Pleas

ese

ndan

yqu

estio

nsrega

rding

this

policyto

perm

ission

s@inform

s.org.

Page 9: Columbia Business School - Social Ties and User-Generated … · 2013-02-01 · Shriver, Nair, and Hofstetter: Social Ties and User-Generated Content 2 Management Science, Articles

Shriver, Nair, and Hofstetter: Social Ties and User-Generated ContentManagement Science, Articles in Advance, pp. 1–19, © 2013 INFORMS 9

Figure 3 Evidence of Positive Covariance of Blogging from a User with His or Her Network Ties

Users Blogs Social ties 0 20 40 60

Social ties (black lines)Blogs (grey lines)

0 10 20 30 40 50 60 70 80 90 100 km

Notes. Left: Geographic plot centered on each user. Right: Black lines represent users, sorted by number of ties; grey lines denote blogs per user.

profiles on the site, not by meeting on the beach,which is typically a short interaction (our survey sug-gests ties result from multiple/repeated interactionsand not one-time meetings). For instance, in Figure 3we do not see much evidence for geographic proxim-ity determining the distribution of ties, as would bethe case if users were primarily connecting with eachother following their visits to nearby surf locations.7

As a robustness check of the assumption, we in-vestigate whether average friendship requests peruser are higher during weekends—when users maybe more likely to socialize at the beach—than onweekdays.8 We find little difference (0.004 duringweekdays versus 0.003 on weekends). The raw cor-relation between blogs and friends at the day-userlevel is 0.04 during the weeks and 0.014 during week-ends (going in the other direction of the socializationstory—higher correlation during weekdays).

As a stronger test, we look for alternative model-free evidence that friendship formation does not occurin direct response to changes in wind conditions.If high wind makes surfers more likely to go to alake and more likely to meet future online friendsthere, we would expect to see more links formedbetween a user and those who live in closer geo-graphic proximity to his preferred surf location. Con-sider an agent i who receives friendship requests andthe set of agents j (1� � � � �number of agents− 1) who

7 High winds at a surfing location may make a user more interestedin friending locals to learn about the surfing conditions aroundthem (e.g., through email). We found that only 16 of 703 (2.3%)users restricted Soulrider.com’s email functionality to their friends.Hence, the majority of users could be reached by Soulrider.comemail without the need to link them as friends. As a result, anec-dotally, we do not believe that the need to communicate is a majordriver of friending.8 We thank an anonymous referee for this suggestion.

send the requests. Let D�i� j� be the distance from j’shome location to i’s preferred surf location. If windleads to friendship, the propensity of i to form a linkwith j should be a decreasing function of D�i� j�—ifthere is no such geographic dependency, it is unlikelythat friendship formation is directly influenced bywind conditions.

For each user pair (i� j) in our data, we computeD�i� j� as the driving distance between agent j’s homeand i’s preferred surf location in kilometers using theGoogle Maps application programming interface (wecompute 490,976 such pairs). We then run logit mod-els of the effect of D�i� j� on the propensity of all pairof users i and j to become friends on Soulrider.com.Following the survey evidence, we include proxyvariables for the chance that users know each otherfrom before in real life, as well as a variable mea-suring the salience of the online presence of thosethey know on Soulrider.com. To capture the first fac-tor, we compute the distance between agents’ physicalhomes—the assumption being that two agents wholive close to each other are more likely to know eachother offline (people who live close to one anotherare more likely to meet, regardless of wind condi-tions). To capture the second factor, we use the Weblogs from the site and create a variable measuringthe duration in seconds when both agents i and jwere online at the same time on Soulrider.com priorto their friendship on the site. If users are onlineat the same time, they are more likely to observeeach other because the site displays a visible list ofall users currently logged in (this is similar to othersocial networks like Facebook). Table 1 reports theseregressions.

Consistent with the survey, we find that users wholive closer to each other and those who were fre-quently online on Soulrider.com together are more

Copyright:

INFORMS

holdsco

pyrig

htto

this

Articlesin

Adv

ance

version,

which

ismad

eav

ailableto

subs

cribers.

The

filemay

notbe

posted

onan

yothe

rweb

site,includ

ing

the

author’s

site.Pleas

ese

ndan

yqu

estio

nsrega

rding

this

policyto

perm

ission

s@inform

s.org.

Page 10: Columbia Business School - Social Ties and User-Generated … · 2013-02-01 · Shriver, Nair, and Hofstetter: Social Ties and User-Generated Content 2 Management Science, Articles

Shriver, Nair, and Hofstetter: Social Ties and User-Generated Content10 Management Science, Articles in Advance, pp. 1–19, © 2013 INFORMS

Table 1 Testing for the Mechanisms of Link Formation

Variable Parameter t-stat. Parameter t-stat.

D4i1 j5 4068E−04 0099 6057E−04 1038Distance between homes of −8094E−03 −16078 −9022E−03 −17005agents i and j

Duration of overlapping 6094E−06 6015website visits (sec)

Constant −4067 −113079 −4069 −112037Observations 490,976

Notes. The dependent variable equals 1 if users i and j are friends, 0 other-wise, estimated across all (i1 j) pairs in data. Two binary logit model resultsreported (with increasingly stringent controls). If high wind makes surfer imore likely to go to a lake and to meet future online friend j there, we expectPr(i1 j form a link) is decreasing in D4i1 j5, the distance between i ’s favoritesurf location and j ’s home location. Results above show D4i1 j5 is not a sig-nificant predictor of link formation, suggesting this is not the case. Resultsdo not change with a linear probability model, including fixed effects for i,for j , or for both i and j (to control for homophily).

likely to be friends, but we do not see any statisticallysignificant effect that users are more likely to formfriendships with those that live closer to their favoritesurf location (t-statD4i1 j5 = 1038). These effects do notchange in a linear probability model that includesfixed effects for i or for j or for both i and j (to con-trol for homophily). This suggests that meeting at thebeach is not first order in driving the link formation.Finally, to inspect these relationships visually, in Fig-ure 4 we plot the probability of users i and j forminga link on the y-axis against D4i1 j5 on the x-axis, con-ditional on the distance between users’ homes. Alsoincluded are the best fit lines. Consistent with theregression results, we do not see the negative relation-ship that we would have expected if people indeedform friendships with those who live closer to theirfavorite surf location. Given these findings, we thinkusing wind as an IV is not unreasonable.

To operationalize the instrument, for each user i,we compute a variable, wind_meanit , as the weeklymean of the wind speed at his most preferred surf-ing location.9 We also compute as wind_sdit the stan-dard deviation of the weekly wind speed at the samelocation. We hypothesize that blogging activity willbe greater, ceteris paribus, in weeks with high aver-age wind speed and high wind variability, becauseboth the salience and extent of relevant informationabout surfing conditions will be heightened duringsuch periods. Table 2 reports descriptive statistics ofthese two variables. We collect both in an instrumentvector, mit = 8wind_meanit1wind_sdit9.

3.1.1. Estimation. We can now estimate Equa-tion (1) by the generalized method of moments

9 Users typically frequent their most preferred surfing locations.Our data reveal that 98.7% of all reported blogs refer to the user’stop five surfing locations.

Table 2 Descriptive Statistics of Model Variables

Variable Obs. Mean Std. dev. Min Max

blogs 4b5 571040 00051 00293 0 7friends (f ) 571040 00026 00179 0 6prior_blogs 4B5 571040 40814 60536 0 61prior_friends 4F5 571040 90716 210007 0 192days_online 571040 9090137 5670310 0 2,535wind_mean 571040 20252 00444 1.500 4.250wind_sd 571040 00594 00283 0 4.010fof_mean 571040 00430 10093 0 18

(GMM) using mit as instruments for blogging. In esti-mation, we need to handle a concern that, includingindividual fixed effects (�1i), leads to an economet-ric endogeneity problem when the “stock” variablesBit (cumulative blogs) and Fit (cumulative friendrequests) are included, because the presence of stockvariables violates the “strict exogeneity” assumptionrequired for the estimation of panel data modelswith fixed effects—namely, that E6�1it Fi� 7 = 0 andE6�1it Bi� 7 = 0 for all t and � (see, e.g., Wooldridge2002). We follow Chamberlain (1992) and estimatethe model under the standard, weaker assumptionof sequential exogeneity: that �1i� is uncorrelatedwith current and prior values of the stock variables(E6�1it Fi� 7= E6�1it Bi� 7= 0, ∀ � ≤ t), which is a naturalfit with our model specification. Under the sequen-tial exogeneity assumption, two period or greater lagsof Fit are available as instruments for ãFit (as areany function of these variables), with similar rela-tions holding for Bit (Wooldridge 2002). Because alarge number of overidentifying restrictions can resultin an estimator with poor finite sample properties,we limit the set of instruments for ãBit and ãFit

to the vector qit = 8Bi1 t−21Fi1 t−29. After transformingby first differencing, the moments used for estima-tion become M1 =Q4ãfit − 4�1ãbit +�1ãBit + �1ãFit +

�1ãzit5), where Q is the vector 6m′it q

′it z

′it7

′, and indica-tor variables capturing the �1t parameters have beenabsorbed into zit . The estimated parameters minimizethe GMM objective function, M′

1�M1, where � is aweighting matrix. Following Hansen (1982), the opti-mal � is inversely proportional to the variance of themoments. We estimate � via the usual two-step pro-cedure, assuming independent moments in the firststep and using the first-step estimate to construct thesample analog of the optimal weighting matrix in thesecond step.

3.2. Effect of Tie Requests on Content GenerationWe now discuss identification of the reverse equation,the effect of friends on blogs. Estimation of this effectis subject to similar considerations as in the previ-ous case, requiring access to some exogenous vari-ation in friends that can be excluded from the blogs

Copyright:

INFORMS

holdsco

pyrig

htto

this

Articlesin

Adv

ance

version,

which

ismad

eav

ailableto

subs

cribers.

The

filemay

notbe

posted

onan

yothe

rweb

site,includ

ing

the

author’s

site.Pleas

ese

ndan

yqu

estio

nsrega

rding

this

policyto

perm

ission

s@inform

s.org.

Page 11: Columbia Business School - Social Ties and User-Generated … · 2013-02-01 · Shriver, Nair, and Hofstetter: Social Ties and User-Generated Content 2 Management Science, Articles

Shriver, Nair, and Hofstetter: Social Ties and User-Generated ContentManagement Science, Articles in Advance, pp. 1–19, © 2013 INFORMS 11

Figure 4 Probability That Users i and j Form a Link Against D�i� j� by Different Distances Between Users’ Homes �x�, Where D�i� j� Is the DistanceBetween i’s Favorite Surf Location and j ’s Home Location

0.005

0.010

0.015

Pr(

link)

0 100 200 300 400

50 km > x

0.004

0.006

0.008

0 100 200 300 400

100 km > x ≥ 50 km

0.002

0.004

0 100 200 300 400

150 km > x ≥ 100 km

0.001

0.002

0.003

Pr(

link)

0 100 200 300 400 500

200 km > x ≥ 150 km

0

0.002

0 100 200 300 400 500

250 > x ≥ 200 km

0.0005

0.0010

0.0015

0 100 200 300 400 500

300 > x ≥ 250 km

0

0.001

Pr(

link)

0 100 200 300 400 500

x ≥ 300 km

x-axis in all plots is D (i, j )

y-axis in all plots is Pr(link)

equation for identification. We first present an identi-fication strategy using an IV assumption that deliverspoint identification of the effect of interest in combina-tion with a dynamic panel model analogous to Equa-tion (1). The IV assumption uses the number of friendrequests to an agent’s friends as an instrument forhis friend requests. This strategy uses the variation inthe network position of users for identification and isbroadly related to a recent literature that emphasizesthe role of group-size variation in social networks foridentification of endogenous social effects (e.g., Lee2007, Bramoullé et al. 2009, Oestreicher-Singer andSundararajan 2012). The IV assumption can, however,be violated if friends receive tie-formation requests ina way that is directly related to the focal agent’s blog-ging activity (e.g., many agents arrive on i’s webpageon Soulrider.com because he or she blogs a lot, andthose that arrive on i’s webpage then send tie requeststo i’s friends, thus causing those tie requests to bedirectly a function of i’s blogging). Although fixedeffects for i address this concern to some extent, itdoes not accommodate the time variation in the user’sfriendship formation. Subsequently, we also presentestimates from two approaches that utilize alternativemonotonicity assumptions and deliver bounds on thetreatment effects.

Our IV model uses friendship requests received bythe friends of each focal agent as instruments for

friends in a linear model analogous to Equation (1):

bit = �2i +�2t + �2fit + 2�it + �2� it + �2zit + 2it � (2)

To understand the friend-of-friend instruments, sup-pose there are only three agents: the focal agent iand agents j and l. Further suppose that i and jare connected (i.e., already friends), but l is not con-nected to either. Let �l→i� t be an indicator of whetheragent l sends a friendship request to connect with thefocal agent i in period t. Then the number of friendrequests i receives is fit = �l→i� t . From Equation (2),because fit affects bit , �l→i� t has a direct effect on bit .Now suppose l decides to send a friend request toi’s friend j ; i.e., �l→j� t = 1. Clearly, �l→j� t is correlatedwith the endogenous variable fit = �l→i� t because fac-tors that influenced l to send a request to i could alsoinfluence him to send a request to i’s friend j . How-ever, as �l→j� t is not a request to i, �l→j� t does not have adirect effect on i’s blogging, bit . Hence, �l→j� t serves asan instrument for fit . Essentially, we use the requestsreceived by i’s friends as instruments for the requestshe receives.

To operationalize the variable, we define fofit(friends of friends), collecting the set of friends ihas in �t�i� and then counting the number of friendrequests received by i’s friends in period t as fofit =∑

j∈�t �i�� k��t �j��k→j� t . We expect fof to be positively cor-

related with friends through common characteristics

Copyright:

INFORMS

holdsco

pyrig

htto

this

Articlesin

Adv

ance

version,

which

ismad

eav

ailableto

subs

cribers.

The

filemay

notbe

posted

onan

yothe

rweb

site,includ

ing

the

author’s

site.Pleas

ese

ndan

yqu

estio

nsrega

rding

this

policyto

perm

ission

s@inform

s.org.

Page 12: Columbia Business School - Social Ties and User-Generated … · 2013-02-01 · Shriver, Nair, and Hofstetter: Social Ties and User-Generated Content 2 Management Science, Articles

Shriver, Nair, and Hofstetter: Social Ties and User-Generated Content12 Management Science, Articles in Advance, pp. 1–19, © 2013 INFORMS

that dissipate as a function of the network distance.For example, immediate friends presumably havesimilar levels of “gregariousness,” which in turn leadsto similar rates of friendship formation. Exclusion offof from the blogs equation is based on the premise thatuser i does not set his blogging output in responseto friendship requests received by anyone other thanhimself. Intuitively, we exploit variation in networkposition under the assumption that after controllingfor any common shocks (via agent and time fixedeffects), an agent’s content generation is likely drivenonly by the environment facing him, and not by hisfriends. As with the wind data, observations of fofare of daily frequency, whereas the panel frequencyis weekly. We operationalize the instrument as theweekly mean of each agent’s fof observations. That is,we compute fof_meanit = mean( fofid), where d indexesthe daily observations corresponding to week t. Sum-mary statistics of fof_mean are provided in Table 2.

3.2.1. Identification with Weaker Assumptions.If the IV assumptions above are violated, the strategyoutlined above will not deliver consistent estimates ofthe causal effect of blogs and friends on each other. Wenow consider what can be learned about the causaleffects under much weaker assumptions that do notassume exclusion restrictions. The trade-off associatedwith the weaker assumptions will be that we will beable to obtain only bounds and not point estimates ofthe effects.

We assume that the blogs is monotonic in friends andthat friends is monotonic in blogs. Following Manski(1997) and Manski and Pepper (2000), we combinethis monotone treatment response (MTR) assumptionwith a monotone treatment selection (MTS) assump-tion to present sharp bounds on the treatment effects.Subsequently, we utilize an alternative identificationstrategy using wind and fof as MIV in the sense ofManski and Pepper (2000). This latter strategy deliv-ers a separate set of bounds on the treatment effect.We find that our results are broadly consistent acrossthese approaches. We provide a short discussion ofthe bounding strategies below, pointing the reader toManski (1997) and Manski and Pepper (2000) for moredetails on these estimators.MTR + MTS. We provide a brief discussion of

bounding the effect of friends on blogs (the discus-sion holds analogously for obtaining the effect of blogson friends). Using the terminology of Manski (1997),blogs (b) is the outcome of interest while friends (f )is the treatment, and the response function for anindividual i is bi4f 5, which is also allowed to beheterogeneous across users. The goal of estimationis to recover Ɛ6bi4f 57, the mean response of blogs tofriends level f in the population. Knowledge of themean response is sufficient to determine the averagetreatment effect (ATE), the causal effect on blogging

of increasing friends from f1 to f2, ATE = ã4f11 f25 =

Ɛ6bi4f257− Ɛ6bi4f157.The data provide the joint distribution of realized

friend requests and blogging responses. We denotethe realized treatment levels (friendship requests) foragent i in the data as �i and the blogs generatedby i given �i as bi4�i5; 8�i1 bi4�i59 is observed in thedata. A conjectured treatment level is denoted f . Theblogging responses to the conjectured treatment level,bi4f 5, are latent outcomes that need to be inferredfrom the observed information in the data. Calculat-ing the ATE from the data is complicated by twoissues. First, observing 8�i1 bi4�i59 is insufficient to cal-culate Ɛ6bi4f 57 for a conjectured treatment level offriends f , because we do not observe bi4f 5 for agenti when f 6= �i.10 Second, the observed treatment lev-els �i are not randomly allocated across i0 We expectthose who tend to blog more to attract more friendrequests; hence, the observed treatment levels fi arenot statistically independent of response functions,bi4�i5. The conditional distribution of the observedblogging functions at the realized levels of friendshiprequests are therefore not informative of the responsefunctions at other levels of friendship requests. Thisselection on unobservables results in a selection biasproblem, which prevents us from calculating the ATEwithout further assumptions. This situation is the rootof the empirical identification problem.

We follow Manski (1997) and Manski and Pepper(2000) and use two intuitive restrictions on the blog-ging response functions (and analogously, friendingresponse functions). Later, we show these assump-tions are not rejected by our data:

Definition 1 (Monotone Treatment Response).For each i, f2 ≥ f1 ⇒ bi4f25≥ bi4f15.

Definition 2 (Monotone Treatment Selection).For each i, �2 ≥ �1 ⇒ Ɛ6bi4f 5 � f = �27≥ Ɛ6bi4f 5 � f = �17.

The MTR and MTS assumptions imply weak mono-tonicity on the response functions and average res-ponse respectively. Although both are consistent withthe statistical fact that “more popular agents blogmore,” both interpret this outcome in different ways.The MTR assumption says that all things held equal,blogging increases in the conjectured levels of friend-ship requests received by each agent. The MTSassumption asserts that agents who have higher num-bers of friendship requests have weakly higher meanblogging functions than those who receive fewerfriendship requests. Both are consistent with priorexpectations about user behavior here. Manski and

10 Although panel data on individuals are available, the presenceof feedback effects and lagged actions in the model determiningoutcomes implies that the user in the future is not comparable tothe user in the present.

Copyright:

INFORMS

holdsco

pyrig

htto

this

Articlesin

Adv

ance

version,

which

ismad

eav

ailableto

subs

cribers.

The

filemay

notbe

posted

onan

yothe

rweb

site,includ

ing

the

author’s

site.Pleas

ese

ndan

yqu

estio

nsrega

rding

this

policyto

perm

ission

s@inform

s.org.

Page 13: Columbia Business School - Social Ties and User-Generated … · 2013-02-01 · Shriver, Nair, and Hofstetter: Social Ties and User-Generated Content 2 Management Science, Articles

Shriver, Nair, and Hofstetter: Social Ties and User-Generated ContentManagement Science, Articles in Advance, pp. 1–19, © 2013 INFORMS 13

Pepper (2000) show that combining MTR and MTSassumptions yields the following bounds on the meanresponse:∑

�<f

Ɛ6bi � �= u7P4�= u5+ Ɛ6bi��i = f 7P4�i ≥ f 5≤ Ɛ6bi4f 57

≤∑

�>f

Ɛ6bi��= u7P4�= u5+ Ɛ6bi � �i = f 7P4�i ≤ f 50 (3)

These bounds are sharp in the sense that in theabsence of any additional assumptions, every pos-sible value within the upper and lower bounds inEquation (3) has an equal chance of being the meanresponse in the population. Note that nothing isassumed about the process of treatment selection, andno across-agent restrictions on the response of blog-ging to friends are imposed. Hence, these bounds arerobust to nonrandom selection and to functional form(and, in particular, they allow for nonlinear response).All terms in the bounds in Equation (3) can be esti-mated nonparametrically from the data.11

The bounds in (3) in turn imply bounds on the ATE,ã4f11 f25. By MTR, bi4f25 ≥ bi4f15 if f2 ≥ f1, implyingthe lower bound on the ATE is 0. The upper boundon the ATE is the upper bound on Ɛ6bi4f257 minus thelower bound on Ɛ6bi4f157:

ã4f11 f25

≤∑

u<f1

(

Ɛ6bi � f = f27− Ɛ6bi � f = u7)

P4f = u5

+(

Ɛ6bi � f = f27− Ɛ6bi � f = f17)

P4f1 ≤ f ≤ f25

+∑

u>f2

(

Ɛ6bi � f = u7− Ɛ6bi � f = f17)

P4f = u50 (4)

MTR + MIV. Finally, we also consider alternativebounds under an MIV assumption. The concern withthe IVs is that they could have a direct effect onoutcomes and thereby violate the exclusion condi-tions. For instance, we were worried that wind coulddirectly effect friends (and not just indirectly as an IVfor blogging) and that fof could directly affect blog-ging (and not just indirectly as an IV for friends).We now consider a class of weaker assumptions thatallows us to utilize the variation provided by windand fof, while accommodating the fact that the exclu-sion restrictions do not hold. Following Manski andPepper (2000), we assume that fof, denoted z, is anMIV for friends (analogously that wind is an MIVfor blogs):

Definition 3 (Monotone Instrumental Vari-able). Covariate z is an MIV if for all z2 ≥ z1 ⇒

Ɛ6bi4f 5 � z= z27≥ Ɛ6bi4f 5 � z= z17.

11 Bounds using either MTS or MTR alone, which are derived inManski and Pepper (2000), are wider and less informative.

We can contrast the MIV assumption with anassumption that z is an IV: that for all 4z21 z15,Ɛ6bi4f 5 � z = z27 = Ɛ6bi4f 5 � z = z17. Thus, to imposethat fof is an IV implies asserting an exclusion restric-tion that agents connected to those who obtain sim-ilar numbers of friendship requests ( fof) have thesame mean blogging propensity. This may be vio-lated, as previously explained. However, to imposethat fof is an MIV weakens the exclusion restric-tion to a monotonicity restriction that agents whoare connected to those who receive more friend-ship requests have weakly higher blogging propen-sity than agents connected to those who receive fewerfriendship requests. It is reasonable to expect that allthings being equal, agents tend to blog more (and notless) when they are connected to more friends. Sim-ilarly, to impose that wind is an MIV weakens therestriction that wind does not directly affect friends toa much weaker monotonicity restriction, that agentswho surf at windier locations receive more friendshiprequests than agents who surf at less windy locations.These MIV assumptions are thus fairly reasonablefor Soulrider.com and in line with what we expecta priori.

Following Manski and Pepper (2000), combiningMIV and MTR assumptions provides the followingsharp bounds on Ɛ6bi4f 57:12

z∈Z

P4v=z5

{

supu1≤z

[

Ɛ6bi �v=u11f ≥�i7P4f ≥�i �v=u15

+"P4f <�i �v=u15]

}

≤Ɛ6bi4f 57

≤∑

z∈Z

P4v=z5

{

infu2≥z

[

Ɛ6bi �v=u21f ≤�i7P4f ≤�i �v=u25

+"̄P4f >�i �v=u25]

}

0 (5)

Intuitively, the bounds impose restrictions on themean response implied by an MTR assumption, con-ditional on a given value of the MIV, and then findeither the maximum or the minimum of these restric-tions over all possible values taken by the MIV. As inthe previous case, all values of the lower and upperbounds are observed in the data, enabling us to esti-mate bounds on the conditional mean nonparamet-rically. Bounds on the treatment effect can then becomputed from the above bounds on the conditionalmean, as in Equation (4).

This concludes our discussion of identification. Inour results section, we report estimates for the modelswithout instruments and for treatment effects underthe IV assumption, the MTR + MTS assumption, andthe MIV + MTR assumptions.

12 In formula (5), " ("̄) is the minimum (maximum) of the realizedvalue of blogs (b).

Copyright:

INFORMS

holdsco

pyrig

htto

this

Articlesin

Adv

ance

version,

which

ismad

eav

ailableto

subs

cribers.

The

filemay

notbe

posted

onan

yothe

rweb

site,includ

ing

the

author’s

site.Pleas

ese

ndan

yqu

estio

nsrega

rding

this

policyto

perm

ission

s@inform

s.org.

Page 14: Columbia Business School - Social Ties and User-Generated … · 2013-02-01 · Shriver, Nair, and Hofstetter: Social Ties and User-Generated Content 2 Management Science, Articles

Shriver, Nair, and Hofstetter: Social Ties and User-Generated Content14 Management Science, Articles in Advance, pp. 1–19, © 2013 INFORMS

4. Data and ResultsWe now discuss the data in more detail, beginningwith some stylized patterns. First, we describe keypatterns in the generation of content and the patternof social ties. Then, we check for interrelationships incontent generation and network structure.

Table 2 presents the descriptive statistics for theblogs and friends variables. On average, users postabout 0.05 blogs per week (maximum 7), and addabout 0.03 friends per week (maximum 6). There arealso a large number of user-weeks when no blogsare posted or no friends are added. An average userweek starts with about 5 prior blogs posted and about10 friend requests already received. We also add thevariable days_online to our specification, which countsthe number of days since the user registered on Soul-rider.com. We conjecture that this variable may helpinform the rate of content posting, because new usersmay be more likely to be more active than existingusers. From Table 2, the average user has been reg-istered on Soulrider.com for about 909 days. Figure 5presents histograms of the total content posting across

Table 3 Joint Distribution of friends and blogs

friends

blogs 0 1 2 3+ Total

0 53�744 1�039 70 10 54�8631 1�470 113 26 6 1�6152 377 43 8 2 4303 72 9 2 0 834+ 41 6 4 0 37

Total 55�704 1�208 110 18 57�040

Table 4 GMM Estimation of Linear Panel Model: friends(blogs)

OLS IV† FE IV† FE IV† FE IV†

blogs (b) 0�056∗∗∗ 0�142∗∗∗ 0�055∗∗∗ 0�154∗∗∗ 0�043∗∗∗ 0�124∗∗ 0�042∗∗∗ 0�151∗∗

�0�009� �0�033� �0�009� �0�042� �0�007� �0�041� �0�007� �0�055�prior_friends (F ) 0�001∗∗∗ 0�001∗∗∗ 0�002∗∗∗ 0�001∗∗∗ −0�003 −0�003 −0�003 −0�003

�0�000� �0�000� �0�000� �0�000� �0�006� �0�006� �0�006� �0�006�prior_blogs (B) −0�000∗∗∗ −0�001∗∗∗ −0�000∗∗∗ −0�001∗∗∗ 0�002 0�004∗ 0�003 0�004∗

�0�000� �0�000� �0�000� �0�000� �0�001� �0�002� �0�001� �0�002�days_online −0�000∗∗∗ −0�000∗∗∗ −0�000∗∗∗ −0�000∗∗ −0�000∗ −0�000∗∗ −0�001 0�002

�0�000� �0�000� �0�000� �0�000� �0�000� �0�000� �0�002� �0�003�fof_mean 0�029∗∗∗ 0�027∗∗∗ 0�028∗∗∗ 0�025∗∗∗ 0�025∗∗∗ 0�025∗∗∗ 0�025∗∗∗ 0�024∗∗∗

�0�003� �0�003� �0�003� �0�003� �0�003� �0�003� �0�003� �0�003�Constant 0�019∗∗∗ 0�016∗∗∗ 0�014∗∗∗ 0�013∗∗∗

�0�002� �0�002� �0�003� �0�003�

Overid �2 0�217 1�188 0�586 0�117Overid p-value 0�641 0�276 0�444 0�732Weak ID F -statistic 54�422 40�337 49�539 34�072Week fixed effects No No Yes Yes No No Yes YesIndividual fixed effects No No No No Yes Yes Yes Yes

Notes. Dependent variable is friends per week. Ordinary least squares, bi level fixed effects, and fixed effects IV models reported. Dynamic panel specificationestimated via GMM. Robust standard errors clustered at the user level reported in parentheses.

†Instruments: fof_mean, lagged prior_friends, lagged prior_blogs, days_online, wind_mean, wind_sd.∗p < 0�05; ∗∗p < 0�01; ∗∗∗p < 0�001.

Figure 5 Histograms of Total Blogs and Friends Across Users

0

20

40

60

Per

cent

0 10 20 30 40 50+

Total blogs per individual

users and the total friendship requests across users inthe data set. Figure 5 documents significant hetero-geneity across users in both their content-generationbehavior and in the extent to which they receiverequests to form online ties on Soulrider.com. Thecross-sectional correlation across individuals in totalblogging and total friends requests is positive and sig-nificant (�= 0�35).

The joint distribution of friends and blogs is sum-marized by the cross-tabulation in Table 3. There is alarge mass point at zero. The across-consumer, across-time correlation between the variables is positive andsignificant (�= 0�11).

4.1. Results

4.1.1. The Effect of blogs on friends. Table 4 pre-sents results from the estimation of the linear model

Copyright:

INFORMS

holdsco

pyrig

htto

this

Articlesin

Adv

ance

version,

which

ismad

eav

ailableto

subs

cribers.

The

filemay

notbe

posted

onan

yothe

rweb

site,includ

ing

the

author’s

site.Pleas

ese

ndan

yqu

estio

nsrega

rding

this

policyto

perm

ission

s@inform

s.org.

Page 15: Columbia Business School - Social Ties and User-Generated … · 2013-02-01 · Shriver, Nair, and Hofstetter: Social Ties and User-Generated Content 2 Management Science, Articles

Shriver, Nair, and Hofstetter: Social Ties and User-Generated ContentManagement Science, Articles in Advance, pp. 1–19, © 2013 INFORMS 15

measuring the effect of blogs on friends, in whichwe also add past blogging, past friendship requestsreceived, and the number of days on the websiteas additional variables explaining friendship requestsreceived in the current period. For consistency withour first identification strategy for the reverse equa-tion, we also include the fof variable as a covari-ate in these regressions. Looking at the first columnof Table 4, we see there is preliminary evidence ofa feedback effect: The effect of blogs on friends isstrongly statistically significant. However, these esti-mates are likely upward biased because of the lackof control for homophily or correlated unobservables.The table also presents results including week fixedeffects to control for time-varying unobservables thatdrive blogging and friendship formation. We see thatweek fixed effects do not change the estimates much,suggesting that such common unobservables may notbe first order for these data. Nevertheless, the direc-tion of the change in the estimate is consistent withour intuition: Once we control for this spurious sourceof correlation, we expect the parameter on blogs todecrease in magnitude. Referring to Table 4, we seethis is indeed the case.

We now discuss the results from adding individ-ual fixed effects into the previous specification. Notethat this specification is very demanding of the data,as the inclusion of both individual and week fixedeffects implies that all variation common to individu-als within a given week, as well as variation commonto weeks for a given individual, are fully controlledfor and are not used to inform the causal effects. Thefixed effects control for homophily and are expected

Table 5 GMM Estimation of Linear Panel Model: blogs(friends)

OLS IV† FE IV† FE IV† FE IV†

friends (f ) 00162∗∗∗ 00776∗∗∗ 00150∗∗∗ 00632∗∗∗ 00087∗∗∗ 00393∗∗∗ 00095∗∗∗ 00347∗∗

4000225 4001305 4000215 4001355 4000155 4001135 4000155 4001145prior_friends (F ) 00000 −00002∗∗∗ 00000 −00001∗∗ −00020∗ −00015 00004 00006

4000015 4000015 4000015 4000015 4000085 4000085 4000075 4000065prior_blogs (B) 00003∗∗∗ 00003∗∗∗ 00003∗∗∗ 00003∗∗∗ −00019∗∗∗ −00019∗∗∗ −00015∗∗∗ −00015∗∗∗

4000005 4000005 4000005 4000005 4000045 4000055 4000045 4000045days_online −00000∗∗∗ −00000∗∗∗ −00000∗∗∗ −00000∗∗∗ 00000∗∗∗ 00000∗∗∗ −00035∗∗∗ −00034∗∗∗

4000005 4000005 4000005 4000005 4000005 4000005 4000065 4000065wind_mean 00045∗∗∗ 00040∗∗∗ 00037∗∗∗ 00035∗∗∗ 00048∗∗∗ 00045∗∗∗ 00057∗∗∗ 00054∗∗∗

4000075 4000075 4000115 4000105 4000065 4000065 4000105 4000095wind_sd 00034∗∗∗ 00031∗∗∗ 00048∗∗∗ 00041∗∗∗ 00038∗∗∗ 00039∗∗∗ 00040∗∗∗ 00038∗∗∗

4000105 4000085 4000115 4000105 4000095 4000085 4000105 4000105Constant −00076∗∗∗ −00077∗∗∗ −00097∗∗∗ −00092∗∗∗

4000145 4000135 4000195 4000195

Weak ID F -statistic 1020239 900073 650235 610381Week fixed effects No No Yes Yes No No Yes YesIndividual fixed effects No No No No Yes Yes Yes Yes

Notes. Dependent variable is blogs per week. Ordinary least squares, bilevel fixed effects, and fixed effects IV models reported. Dynamic panel specificationestimated via GMM. Robust standard errors clustered at the user level reported in parentheses.

†Instruments: fof_mean, lagged prior_friends, lagged prior_blogs, days_online, wind_mean, wind_sd.∗p < 0005; ∗∗p < 0001; ∗∗∗p < 00001.

to correct an upward bias in the estimation of thecausal effects. Looking at the last column of Table 4,we see that this is indeed the case. Looking at thefirst and fifth columns, we see that the effect of blogson friends has dropped from 0.056 to 0.042 whenadding user fixed effects (a 25% decrease). We alsosee that individual fixed effects control for a largesource of unobserved within-user persistence in thedata. For testing statistical significance, note that alltables report robust standard errors that have beenclustered at the user level.

Table 4 also presents estimates of the modelwhen instrumenting for blogs using the wind speedvariables. We first discuss whether these instrumentsare working correctly. Note that, given the simultane-ous equations setup, the first stage for the IV in thisequation is essentially the reverse regression, the effectof friends on blogs, which is presented in Table 5. There,we see the wind variables are significant in explainingblogging. Then we discuss whether we are subject toa weak instruments problem. To test for weak instru-ments, we report the Kleibergen and Paap (2006) rk-statistic, which is a generalization of the weak IV testto the case of non-i.i.d. (independent and identicallydistributed) errors. The null is that the instruments areweak, and a rough thumb rule for empirical work isthat there is no weak instruments problem if the rk-statistic is >10. Looking at the columns in Table 4,we see that this is the case: the weak identificationstatistics are all greater than 30. Furthermore, we seethat the overidentifying restrictions for the instru-ments are not rejected in any of the models, and thefit is good. Overall, these diagnostics indicate that theinstruments are working properly.

Copyright:

INFORMS

holdsco

pyrig

htto

this

Articlesin

Adv

ance

version,

which

ismad

eav

ailableto

subs

cribers.

The

filemay

notbe

posted

onan

yothe

rweb

site,includ

ing

the

author’s

site.Pleas

ese

ndan

yqu

estio

nsrega

rding

this

policyto

perm

ission

s@inform

s.org.

Page 16: Columbia Business School - Social Ties and User-Generated … · 2013-02-01 · Shriver, Nair, and Hofstetter: Social Ties and User-Generated Content 2 Management Science, Articles

Shriver, Nair, and Hofstetter: Social Ties and User-Generated Content16 Management Science, Articles in Advance, pp. 1–19, © 2013 INFORMS

We also see that the fof variable is significantand relevant in driving friendship requests, which isimportant for the identification of the reverse equa-tion. Furthermore, the stock variables, past blogging,past friendship requests, and days on Soulrider.comare significant in the ordinary least squares regres-sions, but they are not significant once time andindividual-specific fixed effects are included. Thisresult persists when we use instruments.

Finally, we see that the magnitude of the coeffi-cient on blogs in the friends equation has increasedafter instrumenting. We found this result surprising,as, a priori, we expected the bias to go in the otherdirection. However, this result is plausible and canbe explained by stories where the unobservables arenegatively correlated with the endogenous variable.For instance, we conjecture that unobservables thatdrive friendship formation could proxy for extrover-sion, friendliness, or windsurfing skill (more userswant to be friends with better windsurfers, all thingsheld equal). If extroverts post more blogs, the unob-servables would be positively correlated with blogs,and we would observe an upward bias. If, on theother hand, extroverts tend to spend more time offlineand post fewer blogs, then unobservables would benegatively correlated with blogging and we wouldobserve a downward bias (as we see here). We donot take a strong stance on what these unobservablesrepresent, because either story seems reasonable.

4.1.2. The Effect of friends on blogs. Table 5 pre-sents the results from linear panel data models ofthe effect of friends on blogs. The dependent variablein the regression is blogs. Analogous to the estima-tion of the previous model, we add past blogging,past friendship requests received, and the number ofdays on the website as controls. Consistent with thesimultaneous equation framework in which we usedwind speeds as instruments for blogs in the reverseequation, we include the wind speed instruments as

Table 6 Bounds on the Effect of friends on blogs, as Well as the Average Treatment Effect Using MIV,MTR, and MTS Assumptions; fof (Friends of Friends) Used as an MIV

Bounds on Ɛ6blogs � friends7

MIV-MTR MTS-MTR Upper bound on ATE, ã4f11 f25

friends Lower Upper Lower Upper f1 f2 MIV-MTR MTS-MTR

0 00046 00051 00047 00051 0 1 50747 001534000015 4000015 4000015 4000015 4000175 4000165

1 00050 50793 00050 00199 1 2 50925 005284000015 4000175 4000015 4000165 4000065 4000885

2 00052 50976 00051 00578 2 3+ 50944 005154000015 4000065 4000015 4000885 4000025 4001745

3+ 00052 50995 00051 005664000015 4000025 4000015 4001745

Note. Standard errors in parentheses, based on 1,000 bootstrap replications.

covariates in these regressions. We also report esti-mates from the IV strategy in which we use friendsof friends (fof_mean) as an IV for friendship requests.Results with and without user and week fixed effectsand with and without instrumenting are reported.

From Table 5 we find that the friendship requestshave a statistically significant effect on blogging. Mir-roring the pattern of results from the regressions inTable 4, we find that controls for individual andweek fixed effects tend to reduce the effect of friend-ship requests, and instrumenting increases the mag-nitude. We find some evidence of negative statedependence in blogging, with new blogs decliningover time for those who have been on the websitefor a long time and those who have already posted alot of content. We also find that the wind speed vari-ables are strongly statistically significant in explainingblogging. The propensity to post content is increasingin the mean wind speed at the user’s preferred surf-ing locations (presumably reflecting increased surfingactivity) and in the variation in the wind speeds atthe preferred surf location (i.e., the fact that the windis too low to facilitate surfing also has informationalvalue and may result in posts). Table 5 also reportsdiagnostics on weak instruments for the friends-of-friends instrument, which indicates no concern of aweak instruments problem. As this equation is justidentified, we do not report tests of overidentifyingrestrictions.

4.1.3. Bounds Estimates: MIV, MTR, and MTSStrategies. We now present bounds estimates of theeffects under the MIV, MTR, and MTS assumptionsdiscussed previously. Compared to the linear IVpreviously, these estimates are nonparametric andimpose no functional form restrictions on the natureof the relationship between friends and blogs. The factthat the treatment variables are discrete facilitatesnonparametric estimation using a bin estimator forthe lower and upper bounds (i.e., we do not needto do any smoothing). In Table 6 we report bounds

Copyright:

INFORMS

holdsco

pyrig

htto

this

Articlesin

Adv

ance

version,

which

ismad

eav

ailableto

subs

cribers.

The

filemay

notbe

posted

onan

yothe

rweb

site,includ

ing

the

author’s

site.Pleas

ese

ndan

yqu

estio

nsrega

rding

this

policyto

perm

ission

s@inform

s.org.

Page 17: Columbia Business School - Social Ties and User-Generated … · 2013-02-01 · Shriver, Nair, and Hofstetter: Social Ties and User-Generated Content 2 Management Science, Articles

Shriver, Nair, and Hofstetter: Social Ties and User-Generated ContentManagement Science, Articles in Advance, pp. 1–19, © 2013 INFORMS 17

Table 7 Bounds on the Effect of blogs on friends, as Well as the Average Treatment Effect Using MIV,MTR, and MTS Assumptions; wind_mean (Wind Speed) Used as an MIV

Bounds on Ɛ6friends � blogs7

MIV-MTR MTS-MTR Upper bound on ATE, ã4b11 b25

blogs Lower Upper Lower Upper b1 b2 MIV-MTR MTS-MTR

0 00022 00025 00022 00026 0 1 40846 000914000015 4000015 4000015 4000015 4004305 4000095

1 00025 40868 00026 00113 1 2 50927 001244000015 4004305 4000015 4000095 4002995 4000215

2 00026 50952 00026 00150 2 3 60337 001394000015 4002995 4000015 4000215 4002645 4000445

3 00026 60363 00026 00165 3 4+ 60709 002264000015 4002645 4000015 4000445 4001555 4000915

4+ 00026 60735 00026 002524000015 4001555 4000015 4000915

Note. Standard errors in parentheses, based on 1,000 bootstrap replications.

on the average treatment effect of friends on blogscombining MIV and MTR assumptions in one strat-egy and combining MTR and MTS assumptions inanother (bounds from only one proved to be widerthan those reported here, as expected). In the firstfour panels of Table 6, we report the mean responseof blogging to various levels of the treatment, friends.For instance, the first row reports on Ɛ6bi4f 5 � f = 07.Upper and lower bounds on Ɛ6bi4f 57 for each levelof f are presented under the two monotonicityassumptions. Standard errors for the upper and lowerbounds, which are computed using a nonparametricpanel bootstrapping procedure, are reported in paren-theses. The bounds intervals are estimated precisely.Furthermore, we see the mean response is increasingin the treatment levels, consistent with the monotonic-ity assumption. This is also a test for whether themonotonicity conditions hold; the table indicates thedata do not reject monotonicity.

An intuitive way to interpret the intervals is tocompute the ATE, reported in the right panel of thetable. As mentioned before, the lower bound on theATE is zero under the MTR assumption. Therefore,the implied upper bounds are presented. We see thatthe MIV-MTR assumptions imply wide intervals onthe ATE: for instance, increasing friends from 0 to 1would on average increase blogging anywhere from 0to 5.747 posts per week. These bounds are relativelyuninformative, because any value in this range has anequal chance of being the ATE.13 However, we see thatthe MTR-MTS assumptions have substantial bite in

13 The wide bounds in our application are a result of the high degreeof skew in the conditional distribution of the treatment (f ) at alllevels of the instrument (z). The large point-mass at the zero friendstreatment level that implies that P4f > �i � v5 ≈ 1 for �i > 0 at alllevels of the MIV, which in turn implies that the upper bound inEquation (5) will approximately equal the maximal response value.

bounding the treatment effect. Under these assump-tions, the upper bound on the average effect on blog-ging of increasing friends from 0 to 1 is 0.153 per week,which is significantly smaller and much more infor-mative. Given this, our preferred estimator is the onethat uses the MTR-MTS assumption. Comparing toTable 5, we see the effects reported for the marginaleffect of friends under fixed effects panel IV models(0.347) is in the range of the bounds implied by MTR-MTS (and the wider less-informative bounds impliedby MIV-MTR).

In Table 7 we report bounds on the average treat-ment effect of blogs on friends, combining MIV andMTR assumptions in one strategy as before and com-bining MTR and MTS assumptions in another. We usewind as an MIV for blogging. In the first four panelsof Table 7, we report the mean response of friend-ing to various levels of the treatment, blogs, as well asthe average treatment effect from changing blogs. Wesee that the MIV-MTR assumptions imply wide inter-vals on the ATE, as in the previous case, but that theMTR-MTS assumptions are effective in informativelybounding the treatment effect. Thus, our preferredestimator for this side of the equation is also the onethat uses the MTR-MTS assumption. As before, we seethat monotonicity is not rejected. Again, comparing toTable 4, we see the effects reported for the marginaleffect of blogs under fixed effects panel IV mod-els (0.151) is in the range of the bounds implied byMTR-MTS (as well the wider, less-informative boundsimplied by MIV-MTR).

These results, obtained across several alternativeassumptions and identification strategies, suggest thatour estimates are reasonably robust. To interpret ourestimates, we can roughly convert into revenue termsusing some back-of-the-envelope arithmetic. As withmost social networking sites, Soulrider.com’s primaryrevenue stream is from online advertising, and thus

Copyright:

INFORMS

holdsco

pyrig

htto

this

Articlesin

Adv

ance

version,

which

ismad

eav

ailableto

subs

cribers.

The

filemay

notbe

posted

onan

yothe

rweb

site,includ

ing

the

author’s

site.Pleas

ese

ndan

yqu

estio

nsrega

rding

this

policyto

perm

ission

s@inform

s.org.

Page 18: Columbia Business School - Social Ties and User-Generated … · 2013-02-01 · Shriver, Nair, and Hofstetter: Social Ties and User-Generated Content 2 Management Science, Articles

Shriver, Nair, and Hofstetter: Social Ties and User-Generated Content18 Management Science, Articles in Advance, pp. 1–19, © 2013 INFORMS

incremental revenue from any intervention is realizedby generating additional page views, which in turndepend on the amount of user-generated content onthe site. To estimate a predictive relationship betweenblog production and page impressions, we can regressthe blogs variable on the number of page impressionson the user’s webpage (t-statistics in parentheses):

PIit = ai +Xbit = ai + 3680044540245

bit0

Implicitly, we assume in this regression that aspectsof the user’s page “quality” that affect page-views arecontrolled for via the fixed effect ai. Because Soul-rider.com receives approximately 0.002 CHF per pageimpression, we obtain that revenue is related blog-ging as R4b5 = 00002 × 36800 × b = 00736 × b (note, 1CHF ≈ 1 USD). Thus, an incremental blog is worthabout 0.736 CHF on average to Soulrider.com. If onetakes feedback effects into account, however, weshould also incorporate the fact that, on the margin,incremental blogging results in incremental friending,which in turns leads to incremental blogging, andso on, till feedback settles. Using the linear IV esti-mates, we can compute (by solving (1) and (2) for thereduced form for b) that such local feedback effectscontribute about an additional 5.5% revenue on aver-age across users. There is also significant variationacross users.14

5. ConclusionsThis paper adds to an emerging literature relevantto social networking website content managementstrategies. We empirically document that ties can helpfacilitate content generation on a site, thereby creat-ing a link between social tie formation and adver-tising revenues, and generating local network effectsbetween content and ties. We use detailed data froman online social network to conduct our empiricalanalysis. The data enable us to control for severalconfounds that have typically plagued empirical anal-ysis of online social interactions. We discuss whatassumptions are required in empirical strategies thatseek to identify causal effects in this setting. We out-line two approaches, one based on exclusion restric-tions and the other on monotonicity assumptions. Ifexclusions or monotonicity are violated, our estimatesshould be interpreted as correlational and not causal.We present some evidence in support of the exclusionassumptions, but some aspects remain untestable. Weshow that monotonicity is not rejected. We find evi-dence consistent with network effects and describethe implications for website revenue.

14 A prior working version of this paper reported these results,which are available from the authors upon request.

Our analysis is related to the recent interest in mar-keting to gain understanding of the role of socialinteractions using data from online social networks.An empirical challenge in this endeavor is that thenetwork structure is endogenous to the actions takenby agents. Augmenting the model of an agent’sactions on the network with a model for the networkstructure requires solving a formidable network for-mation game. An alternative approach to this prob-lem adopted here is to conduct inference with anincomplete model of network formation under weakassumptions that deliver informative bounds on thecausal effects of interest. This approach seems promis-ing for empirical analysis of a wide variety of socialand economic networks.

Several extensions of the current work are possi-ble. Data on page views can help map out a richerpicture of the strength of ties between users. Data onad exposures can help sharpen the link between con-tent and advertising revenues. Also interesting wouldbe further research on understanding the mecha-nisms and moderators of social influence on the site(e.g., Zhang (2010) notes the differing implications formarketing of observational learning versus informa-tion sharing, and Du and Kamakura (2011) note thatsocial influence is moderated by spatial and temporalcontiguity). An important contribution would be todevelop a better understanding of the utility of usersfrom content creation and tie formation and to endog-enize the network structure within a tractable empir-ical model. We leave these extensions to future work.

AcknowledgmentsThe authors contributed equally and are listed in reverse-alphabetical order. The authors thank J. P. Ferguson,Chris Forman, Wagner Kamakura, Carl Mela, Song Yao,the review team, and seminar participants at the FuquaSchool of Business, Georgia Tech, Stanford University,University of Rochester, and at the European MarketingAcademy (2011), Marketing Dynamics (2011), MarketingScience (2010, 2011), Net Institute (2010) and the Quan-titative Marketing and Economics (2011) conferences foruseful comments. The authors thank the NET Institute(http://www.NETinst.org) for supporting this research.

ReferencesAhn D-Y, Duan JA, Mela C (2011) An dynamic equilibrium model

of user generated content. Working paper, Duke University,Durham, NC.

Albuquerque P, Pavlidis P, Chatow U, Chen K-Y, Jamal Z (2012)Evaluating promotional activities in an online two-sided mar-ket of user-generated content. Marketing Sci. 31(3):406–432.

Ansari A, Koenigsberg O, Stahl F (2011) Modeling multiple rela-tionships in social networks. J. Marketing Res. 48(4):713–728.

Bramoullé Y, Djebbari H, Fortin B (2009) Identification of peereffects through social networks. J. Econometrics 150(1):41–55.

Braun M, Bonfrer A (2011) Scalable inference of customer similari-ties from interactions data using dirichlet processes. MarketingSci. 30(3):513–531.

Copyright:

INFORMS

holdsco

pyrig

htto

this

Articlesin

Adv

ance

version,

which

ismad

eav

ailableto

subs

cribers.

The

filemay

notbe

posted

onan

yothe

rweb

site,includ

ing

the

author’s

site.Pleas

ese

ndan

yqu

estio

nsrega

rding

this

policyto

perm

ission

s@inform

s.org.

Page 19: Columbia Business School - Social Ties and User-Generated … · 2013-02-01 · Shriver, Nair, and Hofstetter: Social Ties and User-Generated Content 2 Management Science, Articles

Shriver, Nair, and Hofstetter: Social Ties and User-Generated ContentManagement Science, Articles in Advance, pp. 1–19, © 2013 INFORMS 19

Brothers L, Hollan J, Nielsen J, Stornetta S, Abney S, Furnas G,Littman M (1992) Supporting informal communication viaephemeral interest groups. Proc. 1992 ACM Conf. Comput.-Supported Cooperative Work (ACM, New York), 84–90.

Bughin JR (2007) How companies can make the most of user-generated content. McKinsey Quart. (3):1–4.

Chamberlain G (1992) Comment: Sequential moment restrictions inpanel data. J. Bus. Econom. Statist. 10(1):20–26.

Chevalier J, Mayzlin D (2006) The effect of word of mouth online:Online book reviews. J. Marketing Res. 43(3):345–354.

Chintagunta P, Venkataraman S, Gopinath S (2010) The effects ofonline user reviews on movie box office performance: Account-ing for sequential rollout and aggregation across local markets.Marketing Sci. 29(5):944–957.

Ciliberto F, Tamer E (2009) Market structure and multiple equilibriain airline markets. Econometrica 77(6):1791–1828.

Corbin JM, Strauss AL (2008) Basics of Qualitative Research: Tech-niques and Procedures for Developing Grounded Theory (Sage Pub-lications, Thousand Oaks, CA).

Dellarocas C (2006) Strategic manipulation of Internet opinionforums: Implications for consumers and firms. Management Sci.52(10):1577–1593.

Dhar V, Chang E (2009) Does chatter matter? The impact ofuser-generated content on music sales. J. Interactive Marketing23(4):300–307.

Du R, Kamakura W (2011) Measuring contagion in the diffusion ofconsumer packaged goods. J. Marketing Res. 48(1):28–47.

Duan W, Gu B, Whinston AB (2008) Do online reviews matter? Anempirical investigation of panel data. Decision Support Systems45(4):1007–1016.

Friese S (2011) Using ATLAS.ti for analyzing the financial crisisdata. Forum: Qualitative Soc. Res. 12(1):Article 39.

Ghose A, Han SP (2011) An empirical analysis of user content gen-eration and usage behavior on the mobile Internet. ManagementSci. 57(9):1671–1691.

Godes D, Mayzlin D, Chen Y, Das S, Dellarocas C, Pfeiffer B,Libai B, Sen S, Shi M, Verlegh P (2005) The firm’s managementof social interactions. Marketing Lett. 16(3):415–428.

Goldenberg J, Han S, Lehmann D, Hong JW (2009) The role of hubsin the adoption processes. J. Marketing 73(2):1–13.

Haile PA, Tamer E (2003) Inference with an incomplete model ofenglish auctions. J. Political Econom. 111(1):1–51.

Hansen LP (1982) Large sample properties of generalized methodof moments estimators. Econometrica 50(4):1029–1054.

Hartmann WR, Manchanda P, Nair H, Bothner M, Dodds P,Godes D, Hosanagar K, Tucker C (2008) Modeling social inter-actions: Identification, empirical methods and policy implica-tions. Marketing Lett. 19(3):287–304.

Hennig-Thurau T, Gwinner KP, Walsh G, Gremler DD (2004) Elec-tronic word-of-mouth via consumer-opinion platforms: Whatmotivates consumers to articulate themselves on the Internet?J. Interactive Marketing 18(1):38–52.

Jackson MO (2008) Social and Economic Networks (Princeton Univer-sity Press, Princeton, NJ).

Katona Z, Sarvary M (2009) Network formation and the structure ofthe commercial World Wide Web. Marketing Sci. 27(5):764–778.

Katona Z, Zubcsek P, Sarvary M (2011) Network effects and per-sonal influences: Diffusion of an online social network. J. Mar-keting Res. 48(3):425–443.

Kleibergen F, Paap R (2006) Generalized reduced rank testsusing the singular value decomposition. J. Econometrics 133(1):97–126.

Krippendorff K (2004) Content Analysis: An Introduction to ItsMethodology (Sage Publications, Thousand Oaks, CA).

Kumar V (2011) Why do consumers contribute to connected goods?A dynamic game of competition and cooperation in social net-works. Working paper, Harvard Business School, Boston.

Lee L (2007) Identification and estimation of econometric modelswith group interactions, contextual factors and fixed effects.J. Econometrics 140(2):333–374.

Lento T, Welser HT, Gu L, Smith M (2006) The ties that blog: Exam-ining the relationship between social ties and continued partic-ipation in the Wallop weblogging system. WWW.Third AnnualWorkshop on the Weblogging Ecosystem, Edinburgh, Scotland.

Manski CF (1997) Monotone treatment response. Econometrica65(6):1311–1334.

Manski CF (2000) Economic analysis of social interactions.J. Econom. Perspect. 14(3):115–136.

Manski CF, Pepper JV (2000) Monotone instrumental variables:With an application to the returns to schooling. Econometrica68(4):997–1010.

Mayzlin D, Yoganarasimhan H (2012) Link to success: Howblogs build an audience by promoting rivals. Management Sci.58(9):1651–1668.

McPherson M, Smith-Lovin L, Cook JM (2001) Birds of afeather: Homophily in social networks. Annual Rev. Sociol.27(27):415–444.

Moffitt RA (2001) Policy interventions, low-level equilibria, andsocial interactions. Durlauf SN, Young HP, eds. Social Dynamics(MIT Press, Cambridge, MA), 45–82.

Narayan V, Yang S (2007) Modeling the formation of dyadic rela-tionships between consumers in online communities. Workingpaper, Cornell University, Ithaca, New York. http://ssrn.com/abstract=1027982.

Narayanan S, Nair H (2013) Estimating causal installed-base effects:A bias-correction approach. J. Marketing Res. Forthcoming.

Nardi BA, Schiano DJ, Gumbrecht M, Swartz L (2004) Why weblog. Commun. ACM 47(12):41–46.

Nov O (2007) What motivates Wikipedians? Commun. ACM50(11):60–64.

Ochoa X, Duval E (2008) Quantitative analysis of user-generatedcontent on the Web. Proc. First Internat. Workshop on Under-standing Web Evolution (WebEvolve2008), Beijing, 19–26.

Oestreicher-Singer G, Sundararajan A (2012) The visible hand?Effects of recommendation networks in electronic markets.Management Sci. 58(11):1963–1981.

Stephen AT, Toubia O (2010) Deriving value from social commercenetworks. J. Marketing Res. 47(2):215–228.

Trusov M, Bucklin RE, Pauwels K (2009) Effects of word-of-mouthversus traditional marketing: Findings from an Internet socialnetworking site. J. Marketing 73(5):90–102.

Wooldridge J (2002) Econometric Analysis of Cross Section and PanelData (MIT Press, Cambridge, MA).

Yoganarasimhan H (2012) Impact of social network structure oncontent propagation: A study using YouTube data. Quant. Mar-keting Econom. 10(1):111–150.

Zhang J (2010) The sound of silence: Observational learning in theU.S. kidney market. Marketing Sci. 29(2):315–335.

Zhang K, Sarvary M (2011) Social media competition: Differenti-ation with user-generated content. Working paper, INSEAD,Fontainebleau, France.

Zhang X, Zhu F (2011) Group size and incentives to contribute:A natural experiment at Chinese Wikipedia. Amer. Econom. Rev.101(4):1601–1615.

Copyright:

INFORMS

holdsco

pyrig

htto

this

Articlesin

Adv

ance

version,

which

ismad

eav

ailableto

subs

cribers.

The

filemay

notbe

posted

onan

yothe

rweb

site,includ

ing

the

author’s

site.Pleas

ese

ndan

yqu

estio

nsrega

rding

this

policyto

perm

ission

s@inform

s.org.