
Bayesian Social Influence in the Online Realm

Przemyslaw A. Grabowicz^a, Francisco Romero-Ferrero^b,c, Theo Lins^d, Fabrício Benevenuto^e, Krishna P. Gummadi^f, and Gonzalo G. de Polavieja^b,c

^a The College of Information and Computer Sciences, University of Massachusetts, 01003, Amherst, USA; ^b Cajal Institute, Ave. Doctor Arce 37, 28002 Madrid, Spain; ^c Champalimaud Centre for the Unknown, Avenida Brasília, 1400-038 Lisbon, Portugal; ^d Computer Department of Federal University of Ouro Preto, Campus Universitário Morro do Cruzeiro, 35400-000, Ouro Preto, Brazil; ^e Computer Science Department of Federal University at Minas Gerais, Av. Antônio Carlos, 6627, Pampulha 31270-010, Belo Horizonte, Brazil; ^f The Max Planck Institute for Software Systems, Campus E 1 5, 66123 Saarbrucken, Germany

This manuscript was compiled on February 27, 2020

Our opinions, which things we like or dislike, depend on the opinions of those around us. Nowadays, we are influenced by the opinions of online strangers, expressed in comments and ratings on online platforms. Here, we perform novel “academic A/B testing” experiments with over 2,500 participants to measure the extent of that influence. In our experiments, the participants watch and evaluate videos on mirror proxies of YouTube and Vimeo. We control the comments and ratings that are shown underneath each of these videos. Our study shows that from 5% up to 40% of subjects adopt the majority opinion of strangers expressed in the comments. Using Bayes’ theorem, we derive a flexible and interpretable family of models of social influence, in which each individual forms posterior opinions stochastically following a logit model. The variants of our mixture model that rank best under the Akaike information criterion represent two sub-populations, i.e., non-influenceable and influenceable individuals. The prior opinions of the non-influenceable individuals are strongly correlated with the external opinions and have low standard error, whereas the prior opinions of influenceable individuals have high standard error and become correlated with the external opinions due to social influence. Our findings suggest that opinions are random variables updated via Bayes’ rule whose standard deviation is correlated with opinion influenceability. Based on these findings, we discuss how to hinder opinion manipulation and misinformation diffusion in the online realm.

social influence | opinion manipulation | misinformation | social media | Bayesian

In the previous century, experts heavily influenced public opinion via mass media (1, 2). Nowadays, billions of individuals express their opinions in the online realm through online comments, reviews, and evaluations (3). Unfortunately, it is relatively easy and cheap to fake such online opinions, compared with the physical world (4, 5).∗ In recent years, so-called astroturfers were hired to proliferate selected opinions and fake news online during major societal events, including democratic elections (6, 7). To design systems that are robust to misinformation and manipulation, it is crucial to uncover the extent and the mechanism of social influence.

The challenge in experimental studies of social influence is that traditional lab experiments (8, 9) and online surveys (10) lack external validity. Field experiments, on the other hand, yield less control over confounding factors, hindering causal inference (11–14). To address these experimental challenges, we conduct novel academic A/B testing experiments. We create “clones” of existing social media websites, i.e., mirror proxies of real websites that are fully controlled by researchers to perform randomized experiments. These mirror proxies have exactly the same look and basic functionalities as their real counterparts. In our experiments, the participants watch and evaluate videos on YouTube and Vimeo, i.e., popular video-sharing platforms, in their private spaces and comfort zones. Underneath each of the videos, we show different types of social feedback, including the comments and the counters for views, likes, and dislikes. In our experimental conditions, we randomly modify this social feedback by suppressing some of the comments and lowering the counters. Then, we survey the participants about their opinions on each of the videos, to find out whether the social feedback they are exposed to influences public opinion. The participants are unaware that this social feedback is modified, but they are informed that the goal of the experiments is to survey their opinions. Among other things, we quantify the extent of social influence of online strangers on the opinions of participants and test whether the comments or the counters exert more influence. Thanks to our novel A/B testing setup, these measurements are precise and yield high external validity.

∗There exist several websites selling comments, thumbs up, and views in social media, for instance https://buysocialmediamarketing.com and https://www.qqtube.com. Last checked in April 2018.

The results of social influence experiments are typically explained with the socio-psychological theories of informative and normative social influence (15–17). The former theory states that social influence stems from our need for accurate information about the real world and that this influence is facilitated by the objective perceptual uncertainty about the stimulus. The theory of normative influence explains social impact with the need to be accepted by others, arising when a mutual relationship between individuals is present. However, social influence is observed even if these conditions are not met (18). Indeed, the participants of our experiments watch and evaluate videos that do not exhibit objective perceptual uncertainty and they are exposed to social signals from online strangers. The more recent theory of self-categorization addresses these points and explains the results of seminal experiments on social influence (8, 9) by introducing subjective uncertainty and arguing that social signals interfere with that uncertainty in a way that informative and normative needs are inseparable (18). Human judgments under uncertainty have been extensively studied in psychology (19–22). Pioneering works in this area measure human biases in probabilistic reasoning by means of comparisons with Bayesian inference, in particular the binomial model (19, 22–24). Here, we derive the corresponding binomial model of social influence from basic principles of probability theory, following the empirical Bayes method. In our settings, priors and posterior distributions

Author contributions: P.A.G., F.R.-F., F.B., K.P.G., and G.G.d.P. designed experiments and research; P.A.G. and T.L. implemented experiments; P.A.G., F.R.-F., and T.L. performed research; P.A.G. and F.R.-F. analyzed data; P.A.G., F.R.-F., and G.G.d.P. contributed analytic methods; P.A.G., F.R.-F., and G.G.d.P. wrote the paper.

The authors declare no conflicts of interest.

2To whom correspondence should be addressed. E-mail: [email protected]

arXiv:1512.00770v2 [physics.soc-ph] 26 Feb 2020


Table 1. Summary of the experiments.

| | Experiment I | Experiment II |
|---|---|---|
| Platforms | YouTube, Vimeo | YouTube |
| Participants | 1,116 | 1,391 |
| Videos | 8 | 14 |
| Experimental conditions | 11 | 7 |
| Expressed opinions | 8,928 | 9,737 |
| Opinion scale | 5-point Likert | 200-point bipolar |
| Comment tracking | No | Yes |

are unknown and estimated from observations. With this model, we measure social influenceability and its relation to uncertainty in online social media and introduce a Bayesian theory of social influence.

Results

In total, over 2,500 subjects participated in our experiments (Table 1). The participants were recruited and compensated via Amazon Mechanical Turk (25). Each participant of our online experiments is instructed, in one sitting, to watch several videos, familiarize themselves with the social feedback to these videos, and answer whether they like each video. Each web page, showing a video with corresponding user comments, is a clone of an existing web page on YouTube or Vimeo. The videos are on a variety of topics, ranging from pranks and commercial ads to societal issues and innovations. We selected videos that had up to a million views at the moment of data gathering, but not more, to avoid potential confounding effects from participants who have seen them before. Although the videos were pre-selected, many participants of our experiments found them to be entertaining. Over 185 participants used the words “great”, “fun”, or “enjoy” in reference to the videos and the experiment in an optional text-box comment presented at the end of each experiment (see Demographics and Feedback of Participants in SI Appendix).

Our goal is to measure how social feedback influences opinion. Each video comes with two types of social feedback: i) the comments of its prior viewers and ii) the counters for views, thumbs up, and thumbs down (see Figure 1). In the experimental conditions, we control and modify these two types of social feedback. In the negative experimental conditions, we hide positive comments and lower the numbers of views and thumbs up; whereas in the positive conditions, we make the modifications that are exactly opposite. Overall, in the experiments there are three main negative conditions and three main positive conditions, differing in the degree of modifications to social feedback. As the control condition we take the respective video with its original, unmodified comments and counter values.

Fig. 1. An illustration of the experiments. The comments are labeled as positive or negative towards the video before the experiments. Under experimental conditions, some of the comments are randomly suppressed and the counters for views and for thumbs up and down are modified.

Then, we compute the probability of positive opinion about any video, P+, as the fraction of positive answers across users and videos in the given experimental condition. This probability increases monotonically with the extent of modification of social feedback, ordered from the most negative to the most positive main experimental condition (Figure 2A). In other words, the more positive comments, thumbs up, and views a participant sees under a video, the more likely they are to have a positive opinion about that video. The 95% confidence intervals show that the difference in the probability of positive opinion between experimental conditions is statistically significant for most of the condition pairs. Comparisons of the distributions of raw responses confirm this result. For instance, there is a statistically significant difference in opinions between the control condition and the strongly positive and negative conditions (Mann-Whitney U test, p < 10^-17 and p < 10^-4, respectively). We conclude that the opinions are influenced by social feedback both positively and negatively (14).
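The estimate of P+ and its confidence interval can be illustrated with a minimal sketch. It assumes binary answers (1 for a positive opinion, 0 for a negative one) and uses a plain percentile bootstrap, which is a simpler stand-in for the BCa bootstrap reported in the figures. The function names and the toy data are hypothetical and are not taken from the study.

```python
import random


def positive_fraction(answers):
    """Fraction of positive answers; `answers` is a list of 0/1 values."""
    return sum(answers) / len(answers)


def percentile_bootstrap_ci(answers, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the fraction of positive answers.

    Note: this is the simple percentile method, not the BCa variant.
    """
    rng = random.Random(seed)
    n = len(answers)
    stats = sorted(
        positive_fraction([answers[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = int((1 + level) / 2 * n_resamples) - 1
    return stats[lo_idx], stats[hi_idx]


# Hypothetical toy data for one condition: 45 positive and 55 negative answers.
control = [1] * 45 + [0] * 55

p_control = positive_fraction(control)
ci_control = percentile_bootstrap_ci(control)
print(p_control, ci_control)
```

Two conditions would then be compared by checking whether their intervals overlap, or with a rank test on the raw responses as in the text.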

In addition to opinion, we also survey the participants about their willingness to share the video they watched with friends. The willingness to share a video is correlated with the opinion about that video, because positive opinion about an object creates incentives for sharing it with friends (26). We find that all presented results are qualitatively and often quantitatively the same for the opinion and the sharing willingness, under different psychometric scales (see Experiment I and Experiment II in SI Appendix).

So far we have shown the result for the main experimental conditions, in which both the comments and the counters are modified. However, it is not clear whether the participants are influenced by the comments or the counters. To answer this question, in Experiment I, we measure which type of social feedback exerts more influence on the opinions: the comments or the counters? To this end, additional experimental conditions are introduced, in which either only comments or only counters are modified, i.e., two positive and two negative partial experimental conditions.

In contrast to main positive and negative conditions, the partial conditions modify only one type of social feedback, instead of both of them. We find that the probability of positive opinion is influenced more by the modifications of comments than the counters (compare the diamonds of the same color in Figure 2B). In other words, the comments have

Fig. 2. The probability of a positive opinion under the control condition (circle) and the experimental conditions of various strength (squares and stars). The color of markers corresponds to positive (blue) and negative (red) conditions. This result is averaged over all videos and experiments. The 95% confidence intervals are from BCa bootstrap. (A) Main experimental conditions (Experiments I+II; x-axis: Strongly neg., Mildly neg., Weakly neg., Original, Weakly pos., Mildly pos., Strongly pos.; y-axis: P+). (B) Partial experimental conditions (Experiment I; x-axis: Both neg., Comments neg., Counters neg., Original, Counters pos., Comments pos., Both pos.; y-axis: P+), in which either comments or counters (diamonds) or both (stars) are modified.

a larger impact on opinion than the thumbs and views. The comments significantly influence the opinion in negative and positive experimental conditions with respect to the original condition (p = 0.0002 and p = 0.02, respectively), whereas the influence of thumbs is insignificant.

In the remainder, we explain the mechanism of social influence by analyzing in more detail the influence of comments on public opinion. In Experiment II, to better understand the mechanism of social influence, we make more precise measurements for more videos and participants, while tracking which comments were read by each of the subjects.† We exploit this information to measure how the probability of a positive opinion about a video, P+, depends on the difference, ∆n, in the number of positive and negative comments read by a participant (Figure 3). Subjects have a relatively positive opinion about a given video when positive comments prevail among the comments that they read (∆n ≫ 0), and a relatively negative opinion if negative comments prevail (∆n ≪ 0). For each of the videos, the probability of positive opinion P+(∆n) saturates at a lower value when ∆n is very negative and at a higher value when ∆n is very positive. For most videos, the probability P+(∆n) has a sigmoid shape and is anti-symmetric, exposing a systematic dependence. Next, we use a Bayesian theory and models of social influence to explain these experimental results and to estimate the percentage of individuals influenced by the comments.‡

Prior works proposed models of social influence that are accurate at predicting opinions in specific circumstances (12–14). In this study, we explain the mechanism of social influence with a generic Bayesian theory. This theory posits that opinions are hypotheses whose probabilities of being correct are updated in the process of social influence, following Bayes’ theorem. To this end, we assume that social signals serve as evidence validating corresponding opinions. This assumption is equivalent to the basic assumption of the self-categorization theory that agreement with others “subjectively validates our responses [i.e., opinions] as veridical reflections of the external world” (18). To make a direct correspondence to Bayesian estimation, we distinguish between prior and posterior opinions, i.e., latent parameters representing the opinions before and after social influence, respectively. In particular, using this Bayesian theory, we derive the posterior probability of an individual to express a positive opinion, P+, as a simple logit model, which accounts for the prior opinion of the individual and the opinions expressed by others (27, 28). On the grounds of this generic theory, other models could be proposed as well, but the logit model is particularly simple and sufficiently expressive. The results of our Experiment II are explained by a mixture of logit models, corresponding to the mixture of participants. We explore many variants of this mixture model, which differ in the number of components, using the Akaike information criterion. The best models reveal interesting common properties.

†We track which exact comments are shown on the screen of each participant.

‡We release our dataset to the scientific community at www.linktodata.

The logit model is derived as a result of Bayesian estimation under social feedback only when, instead of purely rational decisions (29, 30), individuals respond with an additional stochastic component known as probability matching, as shown across animal species, including humans (10, 27, 28, 31) (see Derivation of Logit Model in SI Appendix). Under this model, the posterior probability of a positive opinion is P+(∆n, s, a) = 1/(1 + exp(−s∆n − a)). Here, a is the latent parameter representing the prior opinion of an individual, which corresponds to personal memories and thoughts on a given subject, and s is their influenceability, which describes how strongly influenced they are by the social feedback on that subject. This model predicts that the posterior probability of expressing a positive opinion in a homogeneous population, where all individuals have the same value of parameters a and s, tends to saturate at 0 and 1 for ∆n ≪ 0 and ∆n ≫ 0, respectively. This phenomenon is not observed in our experimental data, likely because each individual reacts differently to social influence on a given topic. To include this heterogeneity in the model, we treat the overall population as a mixture of heterogeneous individuals, that is, a mixture of

Fig. 3. Public opinion about a video depends on majority opinion in comments (Experiment II). The probability of positive opinion, P+, is plotted versus the difference, ∆n, in the number of positive and negative comments read by a subject (x-axis from −40, majority negative comments, to 40, majority positive comments). Magenta points correspond to the moving average over 100 measurements. The model applied to the finite real data expects the running average in the gray area marking the 99% confidence interval. The black line is the model applied to infinite data. Panels: (A) ski lift jump, (B) ufo sighting, (C) shark prank, (D) cut the fat talk, (E) pony shoes, (F) rollers trick, (G) cat bath, (H) feeding croc, (I) veet add, (J) google car, (K) baby yoga, (L) all nighter, (M) skywalk, (N) haribo add.

logit models. Each component of the mixture corresponds to a sub-population of individuals. A sub-population k is characterized by the fraction p_k of individuals belonging to it, their prior opinion a_k, and their influenceability s_k. However, we do not know how many sub-populations exist and whether their parameters differ between videos or are necessary for explaining the data. To answer these questions, we explore thousands of variants of the sub-population model, differing in the number of sub-populations and parameters. First, we consider variants having from one (K = 1) to six (K = 6) sub-populations. Second, each of the parameters, a_k, s_k, and p_k, either depends on the video, is constant across videos, or vanishes due to replacement by a neutral constant. We fit each unique variant of this model to the data by maximizing the likelihood of our observations. To obtain the model that best explains our data, we rank these models by the Akaike information criterion (32).

We then analyze what the best models share in common. The top four models have two sub-populations: a non-influenceable sub-population with s_1 = 0 and an influenceable sub-population with the influenceability s_2 > 0, both of which are constant across videos (see Model Selection in SI Appendix). The top model fits the data remarkably well (Figure 3). It has s_2 = 0.8 ± 0.005 across videos, whereas the other parameters, namely p_1, a_1, and a_2, depend on videos. The fraction of influenced individuals varies from p_2 = 0.05 ± 0.05 to p_2 = 0.40 ± 0.05, depending on the video (see The Parameters of the Best Model in SI Appendix), which means that a large portion of individuals is influenced by the comments they read. Possibly, the participants are influenced by the comments because they agree with them. To test this hypothesis, we measure whether the membership in the influenceable sub-population is related to the agreement with comments, self-reported by each participant for each video.
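The mixture of logit models and its ranking criterion can be sketched as follows. This is a minimal illustration under stated assumptions, not the fitting code used in the study: the component values in the example are hypothetical, the number of free parameters is passed in explicitly, and no optimizer is included.

```python
import math


def logit_p(delta_n, s, a):
    """Probability of a positive opinion for one sub-population."""
    return 1.0 / (1.0 + math.exp(-s * delta_n - a))


def mixture_p(delta_n, components):
    """Mixture of logit models; `components` is a list of (p_k, s_k, a_k)."""
    return sum(p_k * logit_p(delta_n, s_k, a_k) for p_k, s_k, a_k in components)


def log_likelihood(observations, components):
    """`observations` is a list of (delta_n, opinion) with opinion in {0, 1}."""
    total = 0.0
    for delta_n, opinion in observations:
        p = mixture_p(delta_n, components)
        total += math.log(p if opinion == 1 else 1.0 - p)
    return total


def aic(log_lik, n_free_params):
    """Akaike information criterion; lower values rank better."""
    return 2 * n_free_params - 2 * log_lik


# Hypothetical two-component example: a non-influenceable group (s = 0)
# and an influenceable group (s = 0.8). Fractions and priors are made up.
components = [(0.7, 0.0, -0.5), (0.3, 0.8, 0.2)]
observations = [(-12, 0), (-3, 0), (0, 1), (4, 1), (15, 1), (20, 0)]

ll = log_likelihood(observations, components)
print(mixture_p(-40, components), mixture_p(40, components), aic(ll, 4))
```

With a non-influenceable component in the mixture, the curve no longer saturates at 0 and 1: for very negative ∆n it levels off at the contribution of the non-influenceable group, and for very positive ∆n it rises by roughly the fraction of influenceable individuals.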

Fig. 4. The ratio of standard error of prior opinion of the influenceable and non-influenceable individuals. Each black horizontal line is an average over three methods of estimating standard error: as the inverse of observed Fisher information (triangles) or via jackknife resampling (circles) or bootstrap resampling (squares). (x-axis: video, A to N; y-axis: σ(a_2)/σ(a_1) · √(p_2/p_1), logarithmic scale from 10^-1 to 10^2.)

The membership of a sample in a sub-population generally depends on the likelihood that this sample was generated by that sub-population, i.e., P+(∆n, s_k, a_k). This likelihood is significantly correlated with the agreement with comments (Spearman’s ρ = 0.41, p < 10^-99). We conclude that social feedback tends to influence the opinion of subjects who agree with the exposed opinions, although it can arise without a conscious agreement as well.

Apart from the difference in influenceability, there are other notable distinctions between the two sub-populations. First, the prior opinion of the influenceable sub-population has a significantly larger standard error in comparison with the non-influenceable sub-population. After correcting for the difference in sizes of the two sub-populations, with the factor √(p_2/p_1), we find that the standard error of prior opinion of an influenceable individual is from 2 to 31 times larger than that of a non-influenceable individual (depending on the video, as shown in Figure 4). Prior work shows that the standard error of a perceived variable is closely and inversely related to the confidence of human decisions depending on that variable (21). Thus, our result suggests that influenceable individuals are more uncertain in their prior opinions than non-influenceable individuals, making them more influenceable. This result is in line with the expectations of the self-categorization theory. Second, we find that the prior opinions of the non-influenceable sub-population, a_1, are strongly correlated with the external opinions (p < 10^-4, the leftmost bar in Figure 5). In contrast, the prior opinions of the influenceable sub-population, a_2, are not correlated with the external opinions (the gray bar in Figure 5) and become correlated only after being socially influenced (the two middle bars in Figure 5). In other words, social influence helps to develop weak opinions with high standard error into stronger opinions with lower standard error that are closer to the external opinions. However, under strongly modified experimental conditions, the posterior opinions of the influenceable sub-population become heavily distorted and further from the external opinions than their prior opinions (the rightmost bar in Figure 5). This result shows that the influenceable individuals are vulnerable to opinion manipulation.

Discussion

Our findings provide empirical support for the probabilistic nature of opinion formation in three different ways. First, in

Fig. 5. The average correlation between the external opinion about videos and the prior and posterior opinions of either non-influenceable or influenceable individuals. As the external opinion we take the fraction of thumbs up for the original video (reported in Table S2 in SI Appendix). As the socially influenced, i.e., posterior, opinions we take the prior opinion on videos plus the effect of social influence, i.e., a_v + s_2 ∆n, where ∆n is a sample for a random individual for the given video in the experimental conditions of the given kind. The correlation is computed over videos and averaged over the samples. (y-axis: Spearman's correlation with the external opinion, from 0 to 1; bars: prior opinion of the non-influenceable sub-population, and for the influenceable sub-population the prior opinion and the opinions after the control condition and after weakly, mildly, and strongly modified conditions.)

the Bayesian theory of social influence, the prior opinion is updated due to social feedback via Bayes’ rule, giving the posterior distribution of opinion (29, 30). The view of social influence, as a social validation of opinions, is consistent with self-categorization theory. Additionally, the theory of self-categorization states that the influence is greater among individuals who share salient identities. The question of whether there is a natural counterpart of this mechanism in Bayesian statistics is open for future research. Second, the logit model explains our experimental observations only if we apply as the decision rule so-called probability matching, which means that the expressed opinions are drawn from the posterior distribution of opinion (27, 28). Third, our findings suggest that the prior opinion is also a random variable whose variance is correlated with influenceability. One can interpret the standard error of prior opinion as its standard deviation. Then, the prior opinions with larger variance tend to be more influenceable, whereas the prior opinions with lower variance are less influenceable. On the grounds of Bayesian statistics, it is expected that weak priors are affected more by observations than strong priors, when forming posteriors, in agreement with our measurements and the prediction of self-categorization theory that the uncertainty facilitates influence. Note, however, that while this third point is naturally explained on the grounds of Bayesian statistics, the logit model captures it only indirectly through the components of the mixture. In future works, the variance of prior opinions can be modeled with hierarchical Bayesian models and estimated with empirical Bayes methods. Our Bayesian theory of social influence provides a base for the development of such alternative models. The theory posits that social influence is a Bayesian updating of prior opinions due to observed social feedback, possibly as a part of generic distributed Bayesian inference about the structure of reality (33, 34).

Opinion manipulation and misinformation are particularly pervasive (6, 7) and nearly effortless in online platforms, which nowadays have billions of users and have become crucial for the stability of society (3, 35). Further research is indispensable to prevent the abuse of social computing systems and to form a more robust society. Bayesian social influence allows individuals with weak prior opinions to form more informed opinions, by virtue of expert influence on a given topic. However, these individuals are also vulnerable to opinion manipulation, for instance via astroturfing, i.e., paid campaigns created to influence individuals without their awareness. In other words, there are both good and bad effects of social influence. Our measurements of social influence yield high external validity, because our A/B testing experiments are conducted on clones of existing social media websites. Our findings suggest that randomized experiments in conjunction with statistical modeling of social influence can be used to detect vulnerable users and protect them from opinion manipulation and misinformation. Note that online platforms, such as YouTube and Facebook, routinely perform such randomized experiments for other commercial purposes. Social computing systems could be designed to emphasize the good influence and hinder the bad influence by detecting and protecting vulnerable users, and by estimating and exposing the expertise of their users within topical domains. Although we anticipate that both domain vulnerability and expertise are related to influenceability within that domain, we point out that subsystems for measuring vulnerability and expertise should evolve over years through an open scientific process, because of their importance to society. Nowadays, online ratings and comments are simple and heavily affected by sampling bias, i.e., a piece of content is judged by a biased sub-sample of the population and there is no way to see how other sub-populations would judge that content. Future social computing systems could characterize the people who evaluate a given piece of content, correct for the sampling bias in ratings and comments, and provide information about how experts evaluate that content. These systems would hinder opinion manipulation and the diffusion of fake news, by informing users, the vulnerable ones in particular, about the nature of ratings and comments they see.

Methods

Comments underneath videos. Before the experiments, from one (Experiment I) to three (Experiment II) editors label each of the comments as either positive, negative, or neutral towards the respective video. There is a significant agreement between the three labelers (Fleiss’ kappa of 0.56). The comments are always shown to participants in the reverse chronological order of their original creation date, reflecting the default setting in the respective video-sharing platforms at the time when the experiments were performed.

Experimental conditions. Each participant watches in a randomized order the same set of videos randomly assigned to the experimental or control conditions. In the case when the total number of experimental conditions is different from the number of videos, we perform a round-robin over experimental conditions to ensure a balanced assignment of conditions to videos across participants.
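For readers unfamiliar with the agreement statistic, Fleiss’ kappa can be computed as in the following sketch. The function follows the standard definition of the statistic; the input layout and the toy counts are assumptions made for illustration and are not the labels collected in the study.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table of label counts.

    counts[i][j] is the number of labelers who assigned comment i to
    category j. Every comment must be labeled by the same number of labelers.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])

    # Share of all assignments that fall into each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement for each comment, then averaged over comments.
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_items) / n_items
    # Agreement expected by chance from the category shares.
    p_expected = sum(p * p for p in p_cat)
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical toy table: 4 comments, 3 labelers,
# categories ordered as (positive, negative, neutral).
toy_counts = [[3, 0, 0], [2, 1, 0], [0, 3, 0], [0, 1, 2]]
kappa = fleiss_kappa(toy_counts)
print(round(kappa, 3))
```

A value of 1 corresponds to perfect agreement and a value near 0 to agreement no better than chance.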
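One way to realize such a balanced assignment is sketched below. It is an assumption-laden illustration rather than the procedure used in the experiments: the rotation by participant index, the seeding scheme, and the function name are hypothetical, and the condition labels are borrowed from the axis of Figure 2A.

```python
import random

CONDITIONS = [
    "Strongly neg.", "Mildly neg.", "Weakly neg.", "Original",
    "Weakly pos.", "Mildly pos.", "Strongly pos.",
]


def assign_conditions(participant_index, videos, conditions, seed=0):
    """Return (viewing order, {video: condition}) for one participant.

    The viewing order is shuffled per participant. Conditions are assigned
    round-robin, shifted by the participant index, so that over
    len(conditions) consecutive participants every video receives every
    condition exactly once.
    """
    rng = random.Random(seed + participant_index)
    order = list(videos)
    rng.shuffle(order)
    assignment = {
        video: conditions[(video_index + participant_index) % len(conditions)]
        for video_index, video in enumerate(videos)
    }
    return order, assignment


videos = [chr(ord("A") + i) for i in range(14)]  # 14 videos labeled A..N
order, assignment = assign_conditions(0, videos, CONDITIONS)
print(order)
print(assignment)
```

A deterministic rotation is used here only because it makes the balance easy to verify; a randomized starting offset per participant would serve the same purpose.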

Psychometric scales. To take robust and precise measurements, we use different psychometric scales for surveying opinions in the two experiments. The participants express their opinions about a video by declaring their agreement with the statement “I like this video”. In Experiment I, the participants respond to this question on a standard 5-point Likert scale, ranging from “Strongly disagree” to “Strongly agree”. In Experiment II, we use a more precise 200-point scale that ranges from 100% “Disagree” to 100% “Agree” (36). For the sake of simplicity, in the analysis we treat all “Agree” answers as positive opinion and all “Disagree” answers as negative opinion, regardless of whether they correspond to “Weakly agree”, “Strongly Agree”, “5% Agree”, or “75% Agree”.
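The reduction of both scales to a binary opinion can be sketched as follows. The numeric encodings are assumptions made for illustration: Likert answers are coded 1 to 5, answers on the 200-point scale are coded as a signed percentage from -100 (100% “Disagree”) to 100 (100% “Agree”), and a midpoint answer, if the scale allows one, is returned as `None` because the text does not say how such answers are handled.

```python
def binarize_likert(response):
    """Map a 1..5 Likert answer to 1 (positive), 0 (negative), or None."""
    if not 1 <= response <= 5:
        raise ValueError("Likert response must be between 1 and 5")
    if response > 3:
        return 1
    if response < 3:
        return 0
    return None  # midpoint: treatment is an open assumption


def binarize_bipolar(signed_percent):
    """Map a signed agreement percentage to 1 (positive), 0 (negative), or None."""
    if not -100 <= signed_percent <= 100:
        raise ValueError("signed percentage must be between -100 and 100")
    if signed_percent > 0:
        return 1
    if signed_percent < 0:
        return 0
    return None  # exact zero: treatment is an open assumption


# "5% Agree" and "75% Agree" both count as a positive opinion.
print(binarize_bipolar(5), binarize_bipolar(75), binarize_bipolar(-100))
```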

The binomial model with decisions copying hypothe-ses. The derivation of the model follows the steps of Perez-Escudero et al. (27), however, its new framing makes a moreexplicit connection to Bayesian inference, by formalizing pos-terior predictive probability and likelihood, and is adapted tothe setting of opinion formation. The derivation makes a seriesof simplifying assumptions, but each of them can be relaxed,as we demonstrate in the following subsections introducingother models based on the same Bayesian principles.

We consider an action of liking or disliking a video as areflection of hypotheses, θ, considered by the focal individual,referred to as ego as a distinction from other individuals. Underthe binomial model, we assume that ego considers the minimalnumber of only two hypotheses, e.g., the video is good (θ+)or bad (θ−). Ego estimates their posterior probability of eachhypothesis, or posterior opinion, using their prior informationabout these hypotheses, P (θ), and the relevant observed socialsignals, B, with which they update their prior probabilityand obtain the posterior probability of hypotheses, P (θ|B),following Bayes’ rule

$$P(\theta \mid B) = \frac{P(B \mid \theta)\,P(\theta)}{P(B)}. \tag{1}$$

Since only two hypotheses are considered, it is useful to write Bayes’ theorem in its posterior-odds form, that is

$$\frac{P(\theta_+ \mid B)}{P(\theta_- \mid B)} = \frac{P(B \mid \theta_+)\,P(\theta_+)}{P(B \mid \theta_-)\,P(\theta_-)}. \tag{2}$$

Next, we assume that ego estimates $P(B \mid \theta)$ by naively assuming that the observed opinions are independent of each other. This assumption has been shown to be a good approximation of the model including dependencies for animals (27). Under this assumption, $P(B \mid \theta) = Z \prod_{i=1}^{N} P(b_i \mid \theta)$, where $B$ is the set of $N$ comments read by ego and $b_i$ is the opinion expressed in comment $i$. $Z$ is a normalization constant ensuring $\sum_B P(B \mid \theta) = 1$, also known as the partition function, which is a combinatorial term counting the number of possible comment sequences for the set of comments $B$. As in the design of our experiments, we assume that each comment can be categorized as positive ($b_+$), negative ($b_-$), or neutral ($b_=$), totaling $N = n_+ + n_- + n_=$ comments. Then,

$$P(B \mid \theta) = Z\,P(b_+ \mid \theta)^{n_+}\,P(b_- \mid \theta)^{n_-}\,P(b_= \mid \theta)^{n_=}. \tag{3}$$

We assume that neutral comments do not add any information about the correctness of hypotheses, $P(b_= \mid \theta_-) = P(b_= \mid \theta_+)$, and we neglect them in the remainder for simplicity. Substituting the last two formulas into Equation 2 gives

$$\frac{P(\theta_+ \mid B)}{P(\theta_- \mid B)} = \frac{P(b_+ \mid \theta_+)^{n_+}\,P(b_- \mid \theta_+)^{n_-}\,P(\theta_+)}{P(b_+ \mid \theta_-)^{n_+}\,P(b_- \mid \theta_-)^{n_-}\,P(\theta_-)}. \tag{4}$$

Note that $P(\theta_- \mid B) = 1 - P(\theta_+ \mid B)$, so taking the logarithm of this equation gives the log-odds

$$\log \frac{P(\theta_+ \mid B)}{1 - P(\theta_+ \mid B)} = n_+ s_+ + n_- s_- + a, \tag{5}$$

where $s_+ = \log \frac{P(b_+ \mid \theta_+)}{P(b_+ \mid \theta_-)}$, $s_- = \log \frac{1 - P(b_+ \mid \theta_+)}{1 - P(b_+ \mid \theta_-)}$ (using $P(b_- \mid \theta) = 1 - P(b_+ \mid \theta)$ once neutral comments are neglected), and $a = \log \frac{P(\theta_+)}{P(\theta_-)}$. The parameter $a$ captures the relative prior probability of the two hypotheses, i.e., the relative prior opinion of ego,§ whereas $s$ determines how much the prior opinion is affected by the observed social signals. This formula for the log-odds simplifies further if we assume a symmetric influence of positive and negative comments, i.e., if $s_+ = -s_- = s$, so that a positive comment negates a negative comment. We recognize that the log-odds in Equation 5 are a linear function of the observed $n_+$ and $n_-$, so the parameters $s$ and $a$ can be estimated with a logistic regression model

$$P(\theta_+ \mid B) = \sigma(s\,\Delta n + a), \tag{6}$$

where $\Delta n = n_+ - n_-$ and $\sigma$ is the logistic function, but we still need to relate this posterior probability of hypotheses to the opinions expressed by ego.
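As a numerical sanity check of this derivation, the following sketch compares the posterior computed from the posterior-odds form (Equation 4) with the logistic form (Equations 5 and 6). The function names and the numerical values are illustrative assumptions only; they are not estimates from our data.

```python
import math


def posterior_from_odds(n_pos, n_neg, p_pos_good, p_pos_bad, prior_good):
    """P(theta+ | B) from the posterior-odds form, neutral comments neglected."""
    odds = (
        (p_pos_good / p_pos_bad) ** n_pos
        * ((1 - p_pos_good) / (1 - p_pos_bad)) ** n_neg
        * prior_good / (1 - prior_good)
    )
    return odds / (1 + odds)


def posterior_logistic(n_pos, n_neg, s_pos, s_neg, a):
    """P(theta+ | B) as the logistic function of the log-odds."""
    log_odds = n_pos * s_pos + n_neg * s_neg + a
    return 1 / (1 + math.exp(-log_odds))


# Hypothetical values: P(b+|theta+) = 0.7, P(b+|theta-) = 0.3, P(theta+) = 0.4.
q, r, prior = 0.7, 0.3, 0.4
s_pos = math.log(q / r)
s_neg = math.log((1 - q) / (1 - r))  # equals -s_pos in this symmetric case
a = math.log(prior / (1 - prior))

print(posterior_from_odds(5, 3, q, r, prior))
print(posterior_logistic(5, 3, s_pos, s_neg, a))
```

In the symmetric case used here, the posterior depends on the comments only through the difference between the numbers of positive and negative comments, as in Equation 6.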

So far we have considered the perceptual stage of decision-making, in which ego estimates which of the hypotheses is correct. Whether the video is liked or not is decided by a decision rule. Evidence for animals and humans suggests that individuals use a decision rule called probability matching (27, 28, 37–39). According to this rule, ego expresses an opinion, $b$, by directly drawing the corresponding hypothesis from the posterior distribution of hypotheses, i.e.,

$$P(b = b_+ \mid B) = P(\theta_+ \mid B). \tag{7}$$

This decision rule is equivalent to the typical rule that future observations are draws from the posterior predictive distribution, that is, the likelihood averaged over the posterior

$$P(b = b_+ \mid B) = \sum_{\theta \in \{\theta_+, \theta_-\}} P(b = b_+ \mid \theta)\,P(\theta \mid B), \tag{8}$$

provided that there is a one-to-one mapping between $b$ and $\theta$ and that samples of $b$ from the likelihood are copies of the draws from the corresponding posterior distribution of $\theta$, i.e., $P(b \mid \theta) = 1$ if $b = b_+$ and $\theta = \theta_+$, or if $b = b_-$ and $\theta = \theta_-$; otherwise $P(b \mid \theta) = 0$. Thus, there is a direct mapping between hypotheses and expressed opinions; the difference between the two is that expressed opinions are drawn at random at the moment of an observation, whereas hypotheses are latent opinions that are not observed directly until they are copied and expressed. Finally, note that when ego is deciding what opinion to express (Equation 7), the likelihood function copies a draw from the posterior of ego; however, when the individuals whose opinions are observed by ego are making a decision (Equation 3), the likelihood function copies a draw from their own distributions over hypotheses, which ego aims to estimate with the parameter $s$.
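For concreteness, the equivalence of the two decision rules can be checked by expanding the sum over the two hypotheses under the copying likelihood. The short derivation below uses only the definitions given above:

$$\begin{aligned} P(b = b_+ \mid B) &= P(b = b_+ \mid \theta_+)\,P(\theta_+ \mid B) + P(b = b_+ \mid \theta_-)\,P(\theta_- \mid B) \\ &= 1 \cdot P(\theta_+ \mid B) + 0 \cdot P(\theta_- \mid B) = P(\theta_+ \mid B). \end{aligned}$$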

§ In our terminology, prior and posterior opinions are synonyms of prior and posterior distributions over hypotheses, whereas expressed opinions are samples from these distributions.

Model fitting. The parameters of a mixture of logit models can be inferred with the expectation-maximization algorithm, but this approach gets trapped in local optima (44). Thus, we use a different, approximate method, namely a mean-field approximation of that model, to find the optimal values of the parameters. We assume that individuals are indistinguishable. In that case, the probability that an unidentifiable individual from the whole population has a positive opinion about the video is

$$P_+(\Delta n) = \sum_{k=1}^{K} p_k\,P_+(\Delta n, s_k, a_k) = \sum_{k=1}^{K} p_k\,\frac{1}{1 + \exp(-s_k \Delta n - a_k)}, \tag{9}$$

where $p_k$ is the fraction of individuals in sub-population $k$ and $K$ is the total number of sub-populations. This probability does not depend on any particular individual; instead, it averages the probability of a positive opinion over all individuals. The joint probability of observing opinions $\boldsymbol{y}$, given that the individuals were exposed to $\boldsymbol{\Delta n}$ comments, is

$$L = \prod_{u=1}^{U} \prod_{v=1}^{V} \left(P_+(\Delta n_{uv})\right)^{y_{uv}} \left(1 - P_+(\Delta n_{uv})\right)^{1 - y_{uv}}, \tag{10}$$

where $U$ is the total number of individuals, $V$ is the total number of videos, and $\sum_k p_{kv} = 1$ for each video $v$. To fit the parameters of this model, we maximize the log-likelihood $\log(L)$. We present detailed results of this fitting in the following subsections.
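A minimal sketch of the quantities being maximized is given below, restricted to a single video for brevity. The function names, the data layout, and the parameter values are assumptions made for illustration; the actual optimizer is not shown, and any numerical maximizer could be applied to `log_likelihood`.

```python
import math


def p_positive(delta_n, components):
    """Mixture probability of a positive opinion (the form of Equation 9).

    components: list of (p_k, s_k, a_k) tuples whose p_k sum to one.
    """
    return sum(p / (1 + math.exp(-s * delta_n - a)) for p, s, a in components)


def log_likelihood(observations, components):
    """log(L) for one video, given (delta_n, y) pairs with y in {0, 1}."""
    total = 0.0
    for delta_n, y in observations:
        prob = p_positive(delta_n, components)
        total += math.log(prob) if y == 1 else math.log(1 - prob)
    return total


# Hypothetical mixture: 70% non-influenceable (s = 0), 30% influenceable (s = 1).
components = [(0.7, 0.0, 0.0), (0.3, 1.0, 0.0)]
observations = [(-3, 0), (0, 1), (2, 1), (4, 0)]
print(p_positive(2, components))
print(log_likelihood(observations, components))
```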

Standard errors of parameters. The standard error of each estimated parameter is obtained using three different methods. The first two methods correspond to random re-sampling of results among individuals, either via bootstrapping or via a jackknife approach. In the bootstrapping approach, we sample the results of the experiment with replacement to obtain the same number of samples, that is, individuals, as in the original experiment. Then, we fit the parameters of the model using the re-sampled data. We repeat this procedure 1000 times and compute the standard error of the estimated parameters. The jackknife approach follows the same procedure, except that instead of sampling, we randomly drop one sample from the set of original results of the experiment. The third method of computing the standard error is based on the analysis of the log-likelihood. We note that the covariance matrix of the estimated parameters $\hat{\theta}$ is the inverse of the observed Fisher information (the negated Hessian of the log-likelihood):

$$\boldsymbol{\Sigma}(\hat{\theta}) = \left[\boldsymbol{I}(\hat{\theta})\right]^{-1} = \left[-\left.\frac{\partial^2 \log P(\boldsymbol{\Delta n}, \boldsymbol{y} \mid \theta)}{\partial \theta_i\,\partial \theta_j}\right|_{\theta = \hat{\theta}}\right]^{-1}. \tag{11}$$

Thus, the standard error of an estimated parameter $\hat{\theta}_k$ is the square root of the corresponding diagonal element of $\boldsymbol{\Sigma}(\hat{\theta})$. We compute the standard error for each parameter using this method and report it in the main text.
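The bootstrapping approach described above can be sketched generically as follows. To keep the example self-contained, the model fit is replaced by a stand-in estimator (the fraction of positive opinions); the function name `bootstrap_standard_error` and the toy data are assumptions for illustration only.

```python
import random
import statistics


def bootstrap_standard_error(samples, estimator, n_resamples=1000, seed=0):
    """Standard error of `estimator` under resampling with replacement.

    samples: one entry per individual; estimator: function of a list of samples.
    """
    rng = random.Random(seed)
    n = len(samples)
    estimates = []
    for _ in range(n_resamples):
        # Draw as many individuals as in the original data, with replacement.
        resample = [samples[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(resample))
    return statistics.stdev(estimates)


def fraction_positive(opinions):
    """Stand-in for a model fit: the fraction of positive opinions."""
    return sum(opinions) / len(opinions)


# Toy data: 200 negative and 200 positive opinions.
toy_opinions = [0] * 200 + [1] * 200
print(bootstrap_standard_error(toy_opinions, fraction_positive))
```

In the setting of the text, `estimator` would refit the mixture model on each re-sampled set of individuals and return the parameter of interest.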

Some of the parameters differ considerably between videos, especially the prior opinions $a_k$ about the videos. Interestingly, the prior opinion of the non-influenceable sub-population about a given video takes similar values in various top models, whereas the prior opinion of the influenceable sub-population varies largely across top models. Also, the standard error of the prior opinions is many times larger for the influenceable than for the non-influenceable sub-population. This difference may arise because the non-influenceable sub-population is larger than the influenceable sub-population. If we interpret the prior opinion $a_k$ of sub-population $k$ as a mean over the prior opinions $a_{ku}$ of individuals belonging to this sub-population, then the standard error of this mean is $\sigma(a_k) = \sigma(a_{ku})/\sqrt{U_k}$, where $U_k$ is the number of individuals belonging to that sub-population. Thus, to compare the standard errors of this parameter for the two sub-populations, we compute the ratio

$$\frac{\sigma(a_{2u})}{\sigma(a_{1u})} = \frac{\sigma(a_2)\sqrt{U_2}}{\sigma(a_1)\sqrt{U_1}} = \frac{\sigma(a_2)\sqrt{p_2}}{\sigma(a_1)\sqrt{p_1}}, \tag{12}$$

where $p_k$ is the fraction of individuals belonging to sub-population $k$. This ratio compares the intrinsic standard deviations of the prior opinion of individuals belonging to the two sub-populations, taking into account that they differ in size.
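The size correction in this ratio can be made concrete with a small numerical sketch. The function name and the input values below are hypothetical and serve only to show the arithmetic.

```python
import math


def intrinsic_sd_ratio(se_a1, se_a2, p1, p2):
    """Ratio of intrinsic standard deviations of prior opinions.

    se_a1, se_a2: standard errors of the prior opinions of sub-populations 1 and 2.
    p1, p2: fractions of individuals in sub-populations 1 and 2.
    """
    return (se_a2 * math.sqrt(p2)) / (se_a1 * math.sqrt(p1))


# Hypothetical inputs: the raw standard errors differ by a factor of 5,
# but sub-population 2 is four times smaller than sub-population 1.
print(intrinsic_sd_ratio(se_a1=0.1, se_a2=0.5, p1=0.8, p2=0.2))
```

For these made-up inputs, the raw ratio of standard errors is 5, whereas the size-corrected ratio is 2.5.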

Confidence intervals of the running probability. For the purpose of presentation, we compute the running probability of a positive opinion about a video. Namely, given a set of answers $y$ from different users for specific $\Delta n$, we compute the running probability of a positive opinion with a sliding window of $n$ data points. To compute this running probability, the set of answers $y$ is sorted in increasing order of $\Delta n$. Then, for every $i$-th window of $n$ experimental points we compute $\Delta n_i = \frac{1}{n} \sum_{u=1}^{n} \Delta n_{u+i-1}$ and $P_{+,i,j} = \frac{1}{n} \sum_{u=1}^{n} y_{u+i-1}$. In our dataset, for a given value of $\Delta n$ there are usually several answers from different participants. Note that the answers for a given $\Delta n$ do not have a natural order. To avoid any artifacts in the computation of the running probability due to this lack of order, we randomly permute the answers for that $\Delta n$ and compute $P_{+,i,j}$, where $j$ stands for that permutation. We repeat this process $m$ times to obtain the final running probability of a positive opinion as an average over $m$ permutations, i.e., $P_{+,i} = \frac{1}{m} \sum_{j=1}^{m} P_{+,i,j}$.

To show how well the model predicts the experimental probability, we compute the 99% confidence interval of the model for the running probability. To this end, we calculate the running probabilities based on artificial data simulated with the fitted model for the real finite set of $\Delta n$. We repeat this procedure 1000 times to obtain 99% confidence intervals of the model for each $\Delta n_i$. We present these confidence intervals in Figure 1 of the main text and in all other figures of the running probability.
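The running probability with permutation averaging, as described in this subsection, can be sketched as follows. The function name, the data layout, and the default number of permutations are assumptions for illustration; the simulation of artificial data for the confidence intervals is not shown.

```python
import random


def running_probability(answers, window, n_permutations=100, seed=0):
    """Running probability of a positive opinion over a sliding window.

    answers: list of (delta_n, y) pairs with y in {0, 1}.
    Returns one (mean delta_n, mean probability) pair per window position.
    """
    n_windows = len(answers) - window + 1
    if n_windows < 1:
        raise ValueError("window is larger than the number of answers")
    rng = random.Random(seed)

    # Group the answers by delta_n; groups are visited in increasing order.
    groups = {}
    for delta_n, y in answers:
        groups.setdefault(delta_n, []).append(y)
    ordered = sorted(groups)

    # The sequence of delta_n values is identical for every permutation.
    dn_seq = [dn for dn in ordered for _ in groups[dn]]
    mean_dn = [sum(dn_seq[i:i + window]) / window for i in range(n_windows)]

    prob_sums = [0.0] * n_windows
    for _ in range(n_permutations):
        y_seq = []
        for dn in ordered:
            ys = list(groups[dn])
            rng.shuffle(ys)  # answers with equal delta_n have no natural order
            y_seq.extend(ys)
        for i in range(n_windows):
            prob_sums[i] += sum(y_seq[i:i + window]) / window
    return [(mean_dn[i], prob_sums[i] / n_permutations) for i in range(n_windows)]


toy_answers = [(-2, 0), (-2, 1), (0, 0), (0, 1), (0, 1), (3, 1)]
for point in running_probability(toy_answers, window=3):
    print(point)
```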

ACKNOWLEDGMENTS. We thank Luis Fernandez Lafuerza for insightful discussions and feedback, Karin Chellew for her suggestions on the design of the experiments, Maria Cano-Colino for her help with comment labeling, and Manuel Gomez-Rodriguez for his comments on model learning. We acknowledge funding from the Volkswagen Foundation (P.A.G.), the Alexander von Humboldt Foundation (F.B.), the Fundação de Amparo à Pesquisa de Minas Gerais (F.B.), the Fundação para a Ciência e Tecnologia PTDC/NEU-SCC/0948/2014 (G.G.d.P.), and the Champalimaud Foundation (G.G.d.P.).

1. Lippmann W (1922) Public opinion. (Harcourt, Brace and Company, New York).
2. Herman ES, Chomsky N (1988) Manufacturing Consent: The Political Economy of the Mass Media. (Pantheon Books, New York).
3. Richtel M (2013) There’s Power in All Those User Reviews. New York Times.
4. King G, Pan J, Roberts ME (2016) How the Chinese Government Fabricates Social Media Posts for Strategic Distraction, not Engaged Argument. American Political Science Review 111(917):484–501.
5. Cho CH, Martens ML, Kim H, Rodrigue M (2011) Astroturfing Global Warming: It Isn’t Always Greener on the Other Side of the Fence. Journal of Business Ethics 104(4):571–587.
6. Lazer DMJ, et al. (2018) The science of fake news. Science 359(6380):1094–1096.
7. Vosoughi S, Roy D, Aral S (2018) The spread of true and false news online. Science 359(6380):1146–1151.
8. Asch SE (1955) Opinions and Social Pressure. Scientific American 193(5):31–35.
9. Sherif M (1935) A study of some social factors in perception. Archives of Psychology (Columbia University) 187:60.
10. Eguíluz VM, Masuda N, Fernández-Gracia J (2015) Bayesian Decision Making in Human Collectives with Binary Choices. PLOS ONE 10(4):e0121332.
11. Muchnik L, Aral S, Taylor SJ (2013) Social influence bias: a randomized experiment. Science (New York, N.Y.) 341(6146):647–51.
12. Wang T, Wang D, Wang F (2014) Quantifying herding effects in crowd wisdom in Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD ’14. pp. 1087–1096.
13. Sipos R, Ghosh A, Joachims T (2014) Was This Review Helpful to You?: It Depends! Context and Voting Patterns in Online Content in Proceedings of the 23rd International Conference on World Wide Web, WWW ’14. (ACM, New York, NY, USA), pp. 337–348.
14. Kramer ADI, Guillory JE, Hancock JT (2014) Experimental evidence of massive-scale emotional contagion through social networks. Proceedings of the National Academy of Sciences 111(24):8788–8790.
15. Kelman HC (1961) Processes of Opinion Change. Public Opinion Quarterly 25(1):57–78.
16. Wood W (2000) Attitude Change: Persuasion and Social Influence. Annual Review of Psychology 51(1):539–570.
17. Cialdini RB, Goldstein NJ (2004) Social influence: compliance and conformity. Annual Review of Sociology 55(1974):591–621.
18. Turner JC, Hogg MA, Oakes PJ, Reicher SD, Wetherell MS (1987) Rediscovering the social group: A self-categorization theory. (Basil Blackwell, Cambridge, MA, US).
19. Phillips L, Edwards W (1966) Conservatism in a simple probability inference task. Journal of Experimental Psychology 72(3):346–354.
20. Tversky A, Kahneman D (1974) Judgment under Uncertainty: Heuristics and Biases. Science 185(4157):1124–1131.
21. Navajas J, et al. (2017) The idiosyncratic nature of confidence. Nature Human Behaviour 1(11):810–818.
22. Benjamin DJ (2019) Errors in probabilistic reasoning and judgment biases in Handbook of Behavioral Economics - Foundations and Applications 2, eds. Bernheim BD, Dellavigna S, Laibson D.
23. Edwards W, Lindman H, Savage LJ (1963) Bayesian statistical inference for psychological research. Psychological Review 70(3):193–242.
24. Grether DM (1980) Bayes Rule as a Descriptive Model: The Representativeness Heuristic. The Quarterly Journal of Economics 95(3):537.
25. Buhrmester M, Kwang T, Gosling SD (2011) Amazon’s Mechanical Turk: A New Source of Inexpensive, Yet High-Quality, Data? Perspectives on Psychological Science 6(1):3–5.
26. Heider F (1958) The Psychology of Interpersonal Relations. (New York: Wiley).
27. Pérez-Escudero A, de Polavieja GG (2011) Collective Animal Behavior from Bayesian Estimation and Probability Matching. PLoS Computational Biology 7(11):e1002282.
28. Arganda S, Pérez-Escudero A, de Polavieja GG (2012) A common rule for decision making in animal collectives across species. Proceedings of the National Academy of Sciences of the United States of America 109(50):20508–13.
29. Bikhchandani S, Hirshleifer D, Welch I (1992) A Theory of Fads, Fashion, Custom, and Cultural Change as Informational Cascades. Journal of Political Economy 100(5):992–1026.
30. Easley D, Kleinberg J (2012) Networks, Crowds, and Markets: Reasoning about a highly connected world. (Cambridge University Press).
31. Madirolas G, de Polavieja GG (2015) Improving Collective Estimations Using Resistance to Social Influence. PLOS Computational Biology 11(11):e1004594.
32. Stone M (1977) An Asymptotic Equivalence of Choice of Model by Cross-Validation and Akaike’s Criterion. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 39(1):44–47.
33. Krafft PM, et al. (2016) Human collective intelligence as distributed Bayesian inference.
34. Tenenbaum JB, Kemp C, Griffiths TL, Goodman ND (2011) How to grow a mind: statistics, structure, and abstraction. Science (New York, N.Y.) 331(6022):1279–1285.
35. Campolo A, Sanfilippo M, Whittaker M, Crawford K (2018) AI Now 2017 Report. (AI Now Institute at New York University).
36. Treiblmaier H, Filzmoser P (2011) Benefits from Using Continuous Rating Scales in Online Survey Research in International Conference on Information Systems (ICIS).
37. Behrend ER, Bitterman ME (1961) Probability-Matching in the Fish. The American Journal of Psychology 74(4):542–551.
38. Kirk KL, Bitterman ME (1965) Probability-Learning by the Turtle. Science (New York, N.Y.) 148(3676):1484–1485.
39. Wozny DR, Beierholm UR, Shams L (2010) Probability Matching as a Computational Strategy Used in Perception. PLoS Computational Biology 6(8):e1000871.
40. Eggenberger F, Pólya G (1923) Über die Statistik verketteter Vorgänge. ZAMM - Journal of Applied Mathematics and Mechanics / Zeitschrift für Angewandte Mathematik und Mechanik 3(4):279–289.
41. Mahmoud H (2008) Polya Urn Models. (CRC Press).
42. Alajaji F, Fuja T (1994) A communication channel modeled on contagion - Information Theory, IEEE Transactions on. 40(6):2035–2041.
43. Hayhoe M, Alajaji F, Gharesifard B (2018) A Polya Contagion Model for Networks. IEEE Transactions on Control of Network Systems 5(4):1998–2010.
44. Sedghi H, Janzamin M, Anandkumar A (2016) Provable Tensor Methods for Learning Mixtures of Generalized Linear Models in Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, Proceedings of Machine Learning Research, eds. Gretton A, Robert CC. (PMLR, Cadiz, Spain), Vol. 51, pp. 1223–1231.

Supporting Information Appendix

Contents

1 Experiment I
A Description
B Videos
C Comments
D Data Processing
E Main Experimental Conditions
F Partial Experimental Conditions
G Results
H Demographics of Participants

2 Experiment II
A Description
B Videos
C Comments
D Data Processing
E Demographics and Feedback of Participants

3 Models
A Model Selection
B Parameters of the Best Model

1. Experiment I

We first conducted Experiment I, in preparation for Experiment II. To each participant, we showed videos with the corresponding social feedback from two video-sharing websites, i.e., YouTube and Vimeo. We asked the participants for their opinion about each of the videos. In Experiment I, we used a 5-point Likert scale to measure the opinion, and we introduced partially manipulated conditions in which we altered either only the comments or only the counters of thumbs and views. Here, we present a detailed description of this experiment and its results.

A. Description. Before starting the survey, the participants are given the instructions (nearly identical for both experiments, Figure S6). The subjects are asked to click on the image to watch the video, see the feedback of other people, and evaluate the video. For each video, we ask a set of questions about the video (Figure S7). The participant can see the questions before watching the video, but cannot answer them without playing the video first, and all questions must be answered before advancing to the next video. The pages with the videos look exactly like the original systems, i.e., YouTube and Vimeo, except that the social feedback to the videos is modified to various extents under different experimental conditions. The comments are shown under the videos in the reverse chronological order of their original creation date, reflecting the default setting of the respective video-sharing platforms at the time when the experiments were performed.

For each video we conduct a survey (Figure S7). We ask the participant to state, on a 5-point Likert scale, whether she agrees with a statement evaluating her opinion about the video (“I like this video”) and with a statement evaluating her willingness to share the video (“I’d share this video with friends”). Additionally, we ask if the person saw this video before. In the analysis, we discard the responses that were influenced by past exposure to the given video.

B. Videos. For the experiments, we picked four videos from YouTube and four videos from Vimeo. To avoid videos that subjects have seen before, we chose videos that have a public appeal but are not very popular, namely videos that have from 20,000 to 1 million views, more than 15 comments (including some positive and negative ones), and over 15 thumbs. The links to the original videos and the numbers of comments, thumbs, and views for each of the videos are listed in Table S2. The average duration of the videos is 104 seconds. During the experiment, each participant is shown all eight videos in a random order, each of them under a different experimental condition, chosen in a round-robin fashion from: the control condition, two strongly modified conditions, two mild conditions, two weak conditions, and four partial manipulations. We describe the experimental conditions below.

C. Comments. Each of the comments was labeled by an author as either positive, neutral, negative, or unreadable (see Table S2 for a summary). Positive comments are the comments that describe the video in a positive way, while negative comments are negative toward some aspects of the video. Neutral comments are mostly off-topic or do not contain any evaluation of the content of the video. The comments that are unreadable or are written in a language other than English are filtered out.

D. Data Processing. Before analyzing the data, we clean and filter it. Namely, we discard all answers that are incomplete due to technical reasons or individual mistakes, as well as double answers from the same participant (in total, we discard less than 6% of all participants). To estimate the quality of answers, we measure how much time it takes each participant to evaluate each video. We use this information to invalidate the video evaluations that took a subject less than half of the video duration. Furthermore, we also invalidate the evaluations of videos that the participant had seen externally before the experiment, according to the self-reports of participants for each video. Then, we discard the participants with answers invalidated for more than one video. In total, we discard less than 10% of all participants.
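The invalidation and filtering steps described above can be sketched as follows. The function name, the data layout, and the dictionary keys are hypothetical; the sketch covers only the time-based and prior-exposure rules, not the removal of incomplete or duplicate answers.

```python
def filter_evaluations(evaluations, durations, max_invalid_videos=1):
    """Keep participants with at most `max_invalid_videos` invalidated videos.

    evaluations: dict mapping a participant id to a list of dicts with the
        hypothetical keys "video", "seconds_spent", and "seen_before".
    durations: dict mapping a video id to its duration in seconds.
    Returns a dict with the valid evaluations of the retained participants.
    """
    kept = {}
    for participant, rows in evaluations.items():
        valid = [
            row for row in rows
            # Invalid if evaluated in less than half of the video duration,
            # or if the participant reported having seen the video before.
            if row["seconds_spent"] >= durations[row["video"]] / 2
            and not row["seen_before"]
        ]
        if len(rows) - len(valid) <= max_invalid_videos:
            kept[participant] = valid
    return kept


durations = {"a": 100, "b": 60, "c": 80}
evaluations = {
    "p1": [
        {"video": "a", "seconds_spent": 90, "seen_before": False},
        {"video": "b", "seconds_spent": 20, "seen_before": False},
        {"video": "c", "seconds_spent": 80, "seen_before": False},
    ],
    "p2": [
        {"video": "a", "seconds_spent": 10, "seen_before": False},
        {"video": "b", "seconds_spent": 70, "seen_before": True},
        {"video": "c", "seconds_spent": 80, "seen_before": False},
    ],
}
print(sorted(filter_evaluations(evaluations, durations)))
```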

E. Main Experimental Conditions. The main experimental conditions include the control condition, two strongly modified conditions, two mild conditions, and two weak conditions. In the strongly modified positive (negative) condition, we hide all negative (positive) comments, we show all neutral and positive (negative) comments, and we increase (decrease) the numbers of thumbs up and views by a factor of 10. The mildly and weakly modified conditions are implemented as m-factor manipulations, which fine-tune the extent of modifications with respect to the control condition. In the m-factor positive (negative) manipulation, we hide all negative (positive) comments except a randomly chosen fraction 1/m of them, and we multiply (divide) the numbers of thumbs up and views by the factor m. The mild conditions are 6-factor manipulations, whereas the weak conditions are 3-factor manipulations. Note that the aforementioned strongly modified conditions correspond to 10-factor manipulations, except that all comments of a certain type are hidden instead of their majority.
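An m-factor manipulation of the kind described above can be sketched as follows. The function name and the data layout are hypothetical, and two details are assumptions not specified in the text: the number of retained opposing comments is rounded down, and the scaled counters are rounded to integers.

```python
import random


def m_factor_manipulation(comments, thumbs_up, views, m, direction, rng):
    """Apply an m-factor positive or negative manipulation.

    comments: list of (text, label) pairs with label in
        {"positive", "neutral", "negative"}, in display order.
    Assumptions (not specified in the text): the number of retained opposing
    comments is rounded down, and scaled counters are rounded to integers.
    """
    if direction not in ("positive", "negative"):
        raise ValueError("direction must be 'positive' or 'negative'")
    opposing = "negative" if direction == "positive" else "positive"
    opposing_idx = [i for i, (_, label) in enumerate(comments) if label == opposing]
    # Retain a randomly chosen fraction 1/m of the opposing comments.
    keep = set(rng.sample(opposing_idx, len(opposing_idx) // m))
    shown = [c for i, c in enumerate(comments) if c[1] != opposing or i in keep]
    scale = m if direction == "positive" else 1 / m
    return shown, round(thumbs_up * scale), round(views * scale)


rng = random.Random(1)
comments = [(f"n{i}", "negative") for i in range(6)]
comments += [("p0", "positive"), ("p1", "positive"), ("x", "neutral")]
shown, up, views = m_factor_manipulation(comments, 10, 100, 3, "positive", rng)
print(len(shown), up, views)
```

The display order of the retained comments is preserved, so the manipulated page keeps the reverse chronological ordering of the original comments.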

F. Partial Experimental Conditions. We introduce partial experimental conditions to find which type of social feedback has a larger influence on opinion change. Partial experimental conditions are the same as the strong conditions, except that they alter either only the comments or only the counters of thumbs and views, while keeping the other unchanged with respect to the control condition. Namely, in the partial positive (negative) condition manipulating comments, we hide all negative (positive) comments, while keeping the numbers of thumbs and views unchanged. In the partial positive (negative) condition manipulating counters, we multiply (divide) the numbers of thumbs up and views by 10, while keeping the comments unchanged. These experimental conditions allow us to measure whether the comments or the counters impact opinions more.

G. Results. We compute the probability of positive opinion (Figure S8A) and the willingness to share the video (Figure S8B) under each of the seven main conditions. We find that there is a statistically significant difference between the opinions about videos shown under positive and negative manipulations (Mann-Whitney U test, $p < 10^{-9}$), as well as in comparison with the control condition ($p = 0.007$ and $p < 10^{-3}$, respectively). Similarly, we find a significant difference in sharing likelihood between the positive and negative extreme manipulations ($p < 10^{-5}$), and in comparison with the control condition ($p = 0.001$ and $p = 0.047$). Therefore, we find evidence of opinion change, as well as of a change in sharing willingness, in the case of both positive and negative manipulations. In fact, the differences are significant for most pairs of experimental conditions ($p < 0.05$).

H. Demographics of Participants. The participants of the experiment were recruited via Amazon Mechanical Turk. They performed the survey anonymously and voluntarily, and they were compensated. At the end of the experiment, they answered a couple of questions about their demographics. The participants of this experiment are more likely to be male than female and are predominantly young adults (see Figure S9).

Fig. S6. Instructions given before starting the survey.

Fig. S7. The survey performed immediately after the participant watched the video.

Platform | Video | Positive comments | Neutral comments | Negative comments | Thumbs up | Thumbs down | Views | Link to the original
YouTube | HTC add | 17 | 30 | 47 | 110 | 31 | 147,273 | dwGGdM3Nj08
YouTube | Lamborghini | 14 | 1 | 5 | 41 | 28 | 76,675 | Pc7XHHCjtJI
YouTube | A girl singing | 37 | 15 | 37 | 663 | 525 | 120,938 | 5JKJhY15NNA
YouTube | Supermarket joke | 5 | 3 | 15 | 60 | 45 | 25,892 | VqGaHxC3Zbg
Vimeo | Google joke | 2 | 7 | 10 | 50 | - | - | 9261909
Vimeo | Stride add | 8 | 8 | 3 | 221 | - | - | 23061242
Vimeo | Ipad skateboard | 33 | 14 | 13 | 842 | - | - | 11480457
Vimeo | Curing cancer | 19 | 10 | 8 | 695 | - | - | 54668275

Table S2. Basic statistics of the original (non-manipulated) videos: the number of comments of various types, the number of thumbs up and thumbs down, and the number of views at the time of their collection (January 2014).

Fig. S8. The probability of positive opinion (left) and the sharing willingness (right) for the control condition (black circles) and the experimental conditions of various strength. The color of markers encodes positive (blue) and negative (red) conditions. [Plot not reproduced. Panels A and B, both titled “Experiment I”; vertical axes: P+ (A) and P(share) (B); horizontal axis: Strongly neg., Mildly neg., Weakly neg., Original, Weakly pos., Mildly pos., Strongly pos.]

Fig. S9. Demographic information about the participants of the experiment. [Plot not reproduced. Panel A: number of participants by age; panel B: number of participants by sex (male, female).]

2. Experiment II

In this experiment, we track exactly which comments are shown on the screen of each participant, and we use a finer-grained 200-point scale for measuring opinion. This experiment has two parts. Each part is conducted with a different set of 7 videos and a different set of 700 participants. We split this experiment into two parts to avoid overloading the participants with evaluations of too many videos. In each part of this experiment, a participant is shown 7 videos in a random order, under 7 different main experimental conditions assigned randomly to the videos.

A. Description. The participants are given the same instructions as in Experiment I. They are instructed to watch videos, to look at the feedback from other people about these videos, and to evaluate them. The subjects enter a webpage that looks like YouTube by clicking on a thumbnail of the video (Figure S10). For each video shown in the experiment, we ask the participant to evaluate whether she agrees or disagrees with the following statements (Figure S11). To measure the opinion, we ask about the agreement with the statement “I like this video”. The participant responds to this statement using a slider bar on a scale from 100% “Disagree” to 100% “Agree” (Figure S12). We did not give the option of answering 0% to avoid neutral answers. To measure the sharing willingness, we use the statement “I’d share this video with friends”. This statement can be answered with “Not Share” or “Share”. To test whether the participant agrees with other people’s feedback, we use the statement “I agree with the feedback of other people about this video (with the comments, thumbs)”. Additionally, we ask if the person has seen this video before, to account for the possibility of prior influence outside of the experiment. After having answered these questions for each of the videos, the participant is given a short demographic survey (Figure S13).

Fig. S10. Participants access a video on YouTube or Vimeo by clicking on its thumbnail.

B. Videos. For the experiments, we picked fourteen videos from YouTube of diverse quality, as judged by the fraction of thumbs up among thumbs (see Table S3). Note that in the main text, the fraction of thumbs up is used as an estimate of external opinion. We chose videos that have a public appeal but are not very popular, namely videos that have from 100,000 to 2 million views, more than 50 comments, and over 100 thumbs (at the moment of data collection). Since we do not create any artificial comments and use only the original comments to manipulate the social feedback, we picked only videos that have both positive and negative comments. The links to the original videos and the numbers of each of the comment types, thumbs, and views for each of the videos are listed in Table S3. The average duration of the videos is 77 seconds.

C. Comments. Three labelers classified the comments as either positive, neutral, negative, or unreadable (see Table S3 for a summary). The Fleiss’ kappa between the three labelers is 0.56. The comments that are unreadable or are written in a language other than English are filtered out. The ambivalent comments with conflicting labels, namely the comments that are labeled as both positive and negative by different labelers, are filtered out as well. In total, we filter out about 10% of all comments. We apply a majority rule to determine the final label of each remaining comment.

Fig. S11. Questions asked for each video. The opinion is measured with the statement “I like this video”.

Survey | Video | Positive comments | Neutral comments | Negative comments | Thumbs up | Thumbs down | Fraction up | Views | Link to the original
Part I | ski lift | 168 | 179 | 76 | 533 | 154 | 0.78 | 455,970 | GP2wvGVCsIU
Part I | ufo | 55 | 136 | 317 | 140 | 1197 | 0.10 | 844,339 | PCMklx9YvHQ
Part I | shark prank | 76 | 53 | 49 | 205 | 211 | 0.49 | 307,750 | Rk1LXZgCSpE
Part I | fat talk | 296 | 203 | 75 | 2796 | 79 | 0.97 | 154,843 | V2SHwdtBH64
Part I | pony shoes | 196 | 217 | 653 | 439 | 1060 | 0.29 | 741,023 | hJJtVUTWCcc
Part I | rollers trick | 24 | 41 | 35 | 638 | 164 | 0.80 | 254,765 | qgSv8B6UiUY
Part I | cat bath | 148 | 156 | 105 | 1009 | 354 | 0.74 | 667,656 | xR6j4ECkDT4
Part II | feeding croc | 48 | 16 | 34 | 576 | 58 | 0.91 | 836,942 | EPW0m0mc6hc
Part II | veet add | 59 | 55 | 139 | 604 | 1377 | 0.30 | 661,311 | UxCHLXQffsg
Part II | google car | 89 | 26 | 68 | 1713 | 23 | 0.99 | 123,721 | aqrttLPjv1E
Part II | baby yoga | 28 | 33 | 379 | 275 | 1996 | 0.12 | 870,166 | fFwrZHFLe2E
Part II | all nighter | 288 | 582 | 232 | 12592 | 502 | 0.96 | 1,008,121 | kFcnUsYKT5w
Part II | skywalk | 20 | 28 | 19 | 275 | 25 | 0.92 | 245,629 | laveE4bUm3M
Part II | haribo add | 33 | 52 | 36 | 165 | 52 | 0.76 | 138,925 | qc8vxx6J5Xw

Table S3. Basic statistics of the original, non-manipulated, videos: the number of comments of various types, the number of thumbs up and thumbs down, and the number of views at the time of their collection (February 2015).

Fig. S12. The psychometric scale for measuring opinion. The opinion is measured with the statement “I like this video”. Subjects answer this statement with a slider ranging from 100% “Disagree” to 100% “Agree”.

During the experiment, we measure which comments are shown to the participant on the screen. In order to see the comments, the participants need to scroll down, which allows precise measurements. Then, in the analysis, we estimate the number of comments read by the participant as the number of comments shown on the screen.
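The labeling rules of this subsection (filtering of ambivalent and unreadable comments, followed by a majority rule) can be sketched as follows. The function name is hypothetical, and the treatment of comments without a strict majority label is an assumption, since it is not specified in the text.

```python
from collections import Counter


def final_label(labels):
    """Return the final label of a comment, or None if it is filtered out.

    labels: the labels given by the labelers, each one of
        "positive", "neutral", "negative", or "unreadable".
    """
    if "positive" in labels and "negative" in labels:
        return None  # ambivalent comment with conflicting labels
    label, votes = Counter(labels).most_common(1)[0]
    if votes <= len(labels) / 2 or label == "unreadable":
        # Assumption: comments without a strict majority label are dropped,
        # as are comments whose majority label is "unreadable".
        return None
    return label


print(final_label(["positive", "positive", "neutral"]))
print(final_label(["positive", "negative", "neutral"]))
```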

D. Data Processing. We apply the same filtering steps as for the data from Experiment I. As before, we discard the participants who gave incomplete answers, who gave answers more than once, or whose answers were invalidated for more than one video (in total, fewer than 12% of all participants).

E. Demographics and Feedback of Participants. The participants of our experiments are slightly more likely to be male than female, and are predominantly white young adults with varying levels of education (see Figure S15). Many of them enjoyed participating in our experiments. Most of the voluntary comments left by the participants after the experiments express positive sentiment and gratitude, e.g., “I enjoyed this survey very much.”, “great videos! expected something boring”, and “fun videos. Thanks!”. Overall, 185 participants voluntarily left positive feedback on our experiments containing at least one of these three words: “enjoy”, “great”, or “fun”.

3. Models

A. Model Selection. The model of sub-populations does not determine what is the best number of sub-populations nor whether all parameters $p_{kv}$, $a_{kv}$, and $s_{kv}$ are indeed necessary. Each of these parameters can depend on the video, can be shared across the videos, e.g., $p_{kv} \equiv p_k$, or vanish by being replaced with a fixed neutral value across videos, i.e., $p_{kv} \equiv 1/K$, or $\forall_v\, a_{kv} = 0$, or $\forall_v\, s_{kv} = 0$.

We represent the structure of the model using the aforementioned notation for the likelihood,

$$P\left(\boldsymbol{\Delta n}, \boldsymbol{y} \mid k, \underset{K \times V}{\boldsymbol{p}}, \underset{1 \times V}{\boldsymbol{a_1}}, \underset{1 \times V}{\boldsymbol{s_1}}, \ldots, \underset{1 \times V}{\boldsymbol{a_K}}, \underset{1 \times V}{\boldsymbol{s_K}}\right), \tag{13}$$

which states that the parameters $\boldsymbol{p}, \boldsymbol{a_1}, \boldsymbol{s_1}, \ldots, \boldsymbol{a_K}, \boldsymbol{s_K}$ depend on the videos. As an example, we show the representation of the model with $K = 2$ sub-populations, vanishing $s_1$, parameter $s_2$ independent from videos, and the remaining parameters dependent


Fig. S13. Demographic questions. After completing the survey for the seven videos, the participant was asked to answer the following questions about personal demographic information.

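| Rank | Model | $-\log(P)$ | #parameters | AIC |
|---|---|---|---|---|
| 1 | $P(\boldsymbol{\Delta n}, \boldsymbol{y} \mid K = 2, \underset{K \times V}{\boldsymbol{p}}, \underset{1 \times V}{\boldsymbol{a_1}}, s_1 \equiv 0, \underset{1 \times V}{\boldsymbol{a_2}}, s_2)$ | 3316.4 | 43 | 6718.7 |
| 2 | $P(\ldots, a_1, s_1, \underset{1 \times V}{\boldsymbol{a_2}}, s_2 \equiv 0)$ | 3330.7 | 30 | 6721.5 |
| 3 | $P(\boldsymbol{\Delta n}, \boldsymbol{y} \mid K = 2, \underset{K \times 1}{\boldsymbol{p}}, \underset{1 \times V}{\boldsymbol{a_1}}, s_1 \equiv 0, \underset{1 \times V}{\boldsymbol{a_2}}, \underset{1 \times V}{\boldsymbol{s_2}})$ | 3318.4 | 43 | 6722.8 |
| 4 | $P(\ldots, a_1 \equiv 0, s_1, \underset{1 \times V}{\boldsymbol{a_2}}, s_2 \equiv 0)$ | 3332.4 | 29 | 6722.8 |
| 5 | $P(\ldots, a_1, s_1, a_2, s_2 \equiv 0, a_3, s_3 \equiv 0)$ | 3330.8 | 32 | 6725.5 |
| 6 | $P(\ldots, a_1, \underset{1 \times V}{\boldsymbol{s_1}}, \underset{1 \times V}{\boldsymbol{a_2}}, s_2 \equiv 0)$ | 3320.1 | 43 | 6726.2 |
| 7 | $P(\ldots, a_1, s_1 \equiv 0, a_2, s_2 \equiv 0, a_3 \equiv 0, s_3)$ | 3332.4 | 31 | 6726.8 |
| 8 | $P(\ldots, \underset{1 \times V}{\boldsymbol{a_1}}, \underset{1 \times V}{\boldsymbol{s_1}}, \underset{1 \times V}{\boldsymbol{a_2}}, s_2 \equiv 0, a_3 \equiv 0, \underset{1 \times V}{\boldsymbol{s_3}})$ | 3307.5 | 58 | 6730.9 |
| 9 | $P(\ldots, \underset{1 \times V}{\boldsymbol{a_1}}, \underset{1 \times V}{\boldsymbol{s_1}}, \underset{1 \times V}{\boldsymbol{a_2}}, s_2 \equiv 0, a_3 \equiv 0, s_3 \equiv 0)$ | 3322.9 | 44 | 6733.8 |
| 10 | $P(\ldots, a_1 \equiv 0, \underset{1 \times V}{\boldsymbol{s_1}}, \underset{1 \times V}{\boldsymbol{a_2}}, s_2 \equiv 0)$ | 3325.0 | 42 | 6733.9 |

Table S4. The list of top variants of the sub-population model. The variants are ranked by the value of AIC. All top variants have $s_k \equiv 0$ for at least one sub-population.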
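The parameter counts and AIC values in Table S4 can be checked with a short sketch. It assumes the standard definition AIC = 2 × #parameters + 2 × (−log P), and it assumes that the proportions of the sub-populations sum to one, so only K − 1 of them are free. The counting rule and the function names are illustrative assumptions. The rule agrees with the rank-1 and rank-3 rows for V = 14 videos.

```python
def n_free_params(K, V, p_form, forms):
    """Count the free parameters of one model variant.

    p_form: "video" (K x V), "shared" (K x 1), or "fixed" (p = 1/K).
    forms:  one (a_form, s_form) pair per sub-population; each form is
            "video" (1 x V), "shared" (a single value), or "zero" (vanishing).
    """
    per_form = {"video": V, "shared": 1, "zero": 0, "fixed": 0}
    # Assumption: proportions sum to one, so only K - 1 of them are free.
    n = (K - 1) * per_form[p_form]
    for a_form, s_form in forms:
        n += per_form[a_form] + per_form[s_form]
    return n


def aic(neg_log_likelihood, n_params):
    """Akaike information criterion; lower values indicate a better model."""
    return 2 * n_params + 2 * neg_log_likelihood


V = 14  # number of videos listed in Tables S3 and S5

# Rank 1: p (K x V), a1 (1 x V), s1 = 0, a2 (1 x V), s2 shared across videos.
rank1 = n_free_params(2, V, "video", [("video", "zero"), ("video", "shared")])
# Rank 3: p (K x 1), a1 (1 x V), s1 = 0, a2 (1 x V), s2 (1 x V).
rank3 = n_free_params(2, V, "shared", [("video", "zero"), ("video", "video")])

aic1 = aic(3316.4, rank1)
aic3 = aic(3318.4, rank3)
```

Both variants have the same number of parameters, so the difference in AIC between them comes entirely from the likelihood. Small deviations from the tabulated AIC are expected because the table rounds −log(P) to one decimal place.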

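| Survey | Video | $p_2$ | $\sigma(p_2)$ jackknife | $\sigma(p_2)$ bootstrap | $\sigma(p_2)$ Fisher | $a_2$ | $\sigma(a_2)$ jackknife | $\sigma(a_2)$ bootstrap | $\sigma(a_2)$ Fisher | $a_1$ | $\sigma(a_1)$ jackknife | $\sigma(a_1)$ bootstrap | $\sigma(a_1)$ Fisher |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Part I | ski lift | 0.32 | 0.01 | 0.05 | 0.06 | 3.44 | 2.29 | 5.90 | 1.28 | -0.75 | 0.02 | 0.23 | 0.20 |
| Part I | ufo | 0.11 | 0.02 | 0.06 | 0.04 | -1.87 | 3.35 | 5.37 | 2.43 | -3.84 | 0.08 | 0.40 | 0.38 |
| Part I | shark prank | 0.16 | 0.01 | 0.05 | 0.05 | 1.77 | 0.60 | 5.01 | 1.70 | -1.47 | 0.02 | 0.19 | 0.21 |
| Part I | fat talk | 0.17 | 0.01 | 0.07 | 0.06 | -8.28 | 3.19 | 8.95 | 2.00 | 0.80 | 0.07 | 0.30 | 0.19 |
| Part I | pony shoes | 0.12 | 0.00 | 0.10 | 0.07 | -9.67 | 4.07 | 7.01 | 3.99 | -2.37 | 0.02 | 0.22 | 0.21 |
| Part I | rollers trick | 0.40 | 0.01 | 0.04 | 0.05 | -0.22 | 0.13 | 1.16 | 0.74 | -0.25 | 0.01 | 0.17 | 0.17 |
| Part I | cat bath | 0.35 | 0.01 | 0.05 | 0.05 | 1.39 | 0.36 | 2.89 | 0.86 | 0.17 | 0.01 | 0.17 | 0.17 |
| Part II | feeding croc | 0.29 | 0.07 | 0.16 | 0.06 | 15.51 | 1.58 | 3.74 | 1.28 | 0.59 | 0.16 | 0.43 | 0.20 |
| Part II | veet add | 0.15 | 0.03 | 0.12 | 0.04 | 3.04 | 0.65 | 2.32 | 2.43 | -0.15 | 0.03 | 0.68 | 0.38 |
| Part II | google car | 0.05 | 0.01 | 0.13 | 0.05 | 1.59 | 1.59 | 3.64 | 1.70 | 2.43 | 0.05 | 0.39 | 0.21 |
| Part II | baby yoga | 0.20 | 0.05 | 0.16 | 0.06 | 0.50 | 0.81 | 2.00 | 2.00 | -2.52 | 0.08 | 0.42 | 0.19 |
| Part II | all nighter | 0.15 | 0.03 | 0.10 | 0.07 | -3.60 | 1.68 | 1.96 | 3.99 | 1.06 | 0.07 | 0.63 | 0.21 |
| Part II | skywalk | 0.18 | 0.05 | 0.18 | 0.05 | 1.47 | 0.31 | 1.99 | 0.74 | 1.89 | 0.05 | 1.07 | 0.17 |
| Part II | haribo add | 0.21 | 0.04 | 0.12 | 0.05 | -0.93 | 0.47 | 1.55 | 0.86 | 0.89 | 0.04 | 0.31 | 0.17 |

Table S5. The values of parameters of the best sub-population model and their standard error. The standard error is estimated using three methods, from left to right: jackknife re-sampling, bootstrap re-sampling, and observed Fisher information. Note that the standard error of prior opinions of the influenceable sub-population is many times larger, i.e., $\sigma(a_2) > \sigma(a_1)$.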
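Two of the three estimators named in the caption, jackknife and bootstrap re-sampling, can be illustrated on a toy estimator. The sketch below uses the sample mean of made-up numbers, whereas in the model the re-sampled quantity would be a refitted maximum-likelihood parameter. The function names, the toy data, and the number of resamples are assumptions, and the Fisher-information estimate is not shown.

```python
import math
import random
import statistics


def jackknife_se(data, estimator):
    """Jackknife standard error: re-estimate with each observation left out."""
    n = len(data)
    leave_one_out = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    center = sum(leave_one_out) / n
    spread = sum((t - center) ** 2 for t in leave_one_out)
    return math.sqrt((n - 1) / n * spread)


def bootstrap_se(data, estimator, n_resamples=2000, seed=0):
    """Bootstrap standard error: re-estimate on samples drawn with replacement."""
    rng = random.Random(seed)
    n = len(data)
    replicates = [estimator(rng.choices(data, k=n)) for _ in range(n_resamples)]
    return statistics.stdev(replicates)


# Toy observations (not data from the experiments).
data = [0.10, 0.40, 0.35, 0.80, 0.55, 0.20, 0.90, 0.65]

se_jack = jackknife_se(data, statistics.fmean)
se_boot = bootstrap_se(data, statistics.fmean)
```

For the sample mean, the jackknife estimate coincides with the textbook standard error, i.e., the sample standard deviation divided by the square root of the sample size, which gives a quick sanity check of the implementation.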

[Figure S14: two panels, A and B, both titled “Experiment II”. Horizontal axes: condition (Strongly neg., Mildly neg., Weakly neg., Original, Weakly pos., Mildly pos., Strongly pos.). Vertical axes: P+ (0.40 to 0.64) and P(share) (0.16 to 0.32).]

Fig. S14. The probability of positive opinion (left) and the sharing willingness (right) for the control condition (black circles) and the experimental conditions of various strength. The color of markers encodes positive (blue) and negative (red) conditions.

[Figure S15: four panels, A to D, showing the number of participants by age (20 to 80), sex (male, female), ethnicity (white, asian, hispanic, black, other), and education (elementary, high, college, bachelor, master).]

Fig. S15. Demographic information about the participants of the experiments.

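on videos:

$$P\left(\boldsymbol{\Delta n}, \boldsymbol{y} \mid K = 2, \underset{K \times V}{\boldsymbol{p}}, \underset{1 \times V}{\boldsymbol{a_1}}, s_1 \equiv 0, \underset{1 \times V}{\boldsymbol{a_2}}, s_2\right), \tag{14}$$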

which is the variant introduced in the main text of this manuscript that minimizes the Akaike information criterion (AIC). In other words, each of the parameters of a model variant takes one of $N = 3$ different forms and each sub-population is defined by one of $N^2$ combinations of parameter forms. The sub-populations are interchangeable, so in total there are

$$N \times \binom{N^2 + K - 1}{K}$$

variants of the model for $K$ sub-populations. Note, however, that a model with more than one sub-population with vanishing parameters, i.e., multiple sub-populations $k$ such that $\forall_v\, a_{kv} = 0$ and $\forall_v\, s_{kv} = 0$, is equivalent to a model having just one such sub-population. Hence, overall, for $K$ sub-populations there are

$$N \times \left[\binom{N^2 + K - 1}{K} - \binom{N^2 + K - 3}{K - 2}\right]$$

different variants of the sub-populations model.
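The count of distinct variants can be cross-checked by brute-force enumeration. The sketch below is illustrative only: the function names and the form labels are assumptions, and the second binomial term is taken to be zero for K < 2, where no two sub-populations can both vanish.

```python
from itertools import combinations_with_replacement, product
from math import comb


def n_variants(K, N=3):
    """Closed-form number of distinct model variants for K sub-populations."""
    total = comb(N**2 + K - 1, K)
    # Variants with two or more fully vanishing sub-populations are redundant.
    redundant = comb(N**2 + K - 3, K - 2) if K >= 2 else 0
    return N * (total - redundant)


def n_variants_brute_force(K, forms=("video", "shared", "zero")):
    """Enumerate unordered choices of (a-form, s-form) per sub-population."""
    types = list(product(forms, repeat=2))  # N^2 combinations of forms
    vanishing = ("zero", "zero")
    count = sum(
        1
        for combo in combinations_with_replacement(types, K)
        if combo.count(vanishing) < 2
    )
    # The parameter p independently takes one of N forms as well.
    return len(forms) * count


counts = {K: n_variants(K) for K in range(1, 7)}
```

The enumeration treats sub-populations as interchangeable by drawing multisets of form pairs and then discards the multisets that contain more than one fully vanishing sub-population, which is what the subtracted binomial coefficient accounts for.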

We fit each variant of the sub-population model for $K \in \{1, 2, \ldots, 6\}$ by maximizing the log-likelihood of the corresponding model (Equation 10). To this end, we perform 1000 realizations of the L-BFGS-B algorithm, each time initializing the parameters with random values sampled from a standard normal distribution, with the exception of $p_{kv}$, which is always initialized with the uniform distribution of individuals over sub-populations, i.e., $p_{kv} = 1/K$. Different variants of the sub-population model generally have varying numbers of parameters and sub-populations. After fitting these variants of the model, we compare them using AIC, which penalizes the number of parameters. We plot the best values of AIC and log-likelihood against the number of parameters and sub-populations (Figure S16). The models achieving low AIC tend to have two sub-populations and fewer than 50 parameters, that is, about 3 parameters per video. The ten best variants of the model are shown in Table S4. Out of these ten models, six have two sub-populations, while four have three sub-populations. In all cases, among these sub-populations, there is at least one non-influenceable sub-population ($\forall_v\, s_{kv} = 0$). The four top variants of the sub-population model are very similar to the top model presented in the main text, while the remaining top models are also similar, but may have an additional sub-population. Overall, the findings shown in the main text for the top model are consistent with the other top ten models.

B. Parameters of the Best Model. The best sub-population model is $P\left(\boldsymbol{\Delta n}, \boldsymbol{y} \mid K = 2, \underset{K \times V}{\boldsymbol{p}}, \underset{1 \times V}{\boldsymbol{a_1}}, s_1 \equiv 0, \underset{1 \times V}{\boldsymbol{a_2}}, s_2\right)$. It has two sub-populations: a non-influenceable sub-population with influenceability $s_1 \equiv 0$ and an influenceable sub-population having $s_2 = 0.8 \pm 0.005$. We list the values of the remaining parameters for each video in Table S5. Individuals in the influenceable sub-population have much noisier prior opinions about each video than individuals in the non-influenceable sub-population (Figure S17).

[Figure S16: two panels. Top: −log(P) and AIC against the number of clusters (1 to 6). Bottom: −log(P) and AIC against the number of parameters (50 to 200).]

Fig. S16. The minimal negative log-likelihood and Akaike information criterion of the sub-population model having a given number of clusters (top) or parameters (bottom).

[Figure S17: three panels sharing the horizontal axis Video (A to N). Vertical axes: $a_2$ (top), $a_1$ (middle), and $\sigma(a_2)/\sigma(a_1) \cdot \sqrt{p_2/p_1}$ on a logarithmic scale (bottom). Legend: jackknife, bootstrap, Fisher inf., average.]

Fig. S17. The values of parameters with their standard errors of the best sub-population model for the influenceable (top) and non-influenceable (middle) sub-population. The standard deviation is computed using three different methods, from left to right: jackknife re-sampling, bootstrap re-sampling, and the inverse of observed Fisher information. The bottom figure shows the ratio of the sub-populations’ standard deviations corrected for their sizes.
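The size-corrected ratio in the bottom panel, $\sigma(a_2)/\sigma(a_1) \cdot \sqrt{p_2/p_1}$, can be reproduced from Table S5. The sketch below assumes that $p_1 = 1 - p_2$, since the best model has two sub-populations, and it uses the bootstrap columns of two rows as an example. The function name is hypothetical.

```python
import math


def size_corrected_ratio(se_a2, se_a1, p2):
    """Ratio of standard errors, rescaled by the relative sub-population size."""
    # Assumption: with K = 2 sub-populations the proportions sum to one.
    p1 = 1.0 - p2
    return se_a2 / se_a1 * math.sqrt(p2 / p1)


# (p2, bootstrap sigma(a2), bootstrap sigma(a1)) taken from Table S5.
rows = {
    "ski lift": (0.32, 5.90, 0.23),
    "cat bath": (0.35, 2.89, 0.17),
}
ratios = {
    video: size_corrected_ratio(se_a2, se_a1, p2)
    for video, (p2, se_a2, se_a1) in rows.items()
}
```

One plausible reading of the correction is that a standard error shrinks with the square root of the number of individuals, so multiplying by $\sqrt{p_2/p_1}$ removes the part of the difference that is due only to the influenceable sub-population being smaller. For both example rows the corrected ratio remains well above one.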
