communication activity in social networks: growth and ...hmakse/maksepapers/2011... · data of the...

13
Eur. Phys. J. B 84, 147–159 (2011) DOI: 10.1140/epjb/e2011-20172-4 Regular Article T HE EUROPEAN P HYSICAL JOURNAL B Communication activity in social networks: growth and correlations D. Rybski 1,2, a , S.V. Buldyrev 3 , S. Havlin 4 , F. Liljeros 5 , and H.A. Makse 1 1 Levich Institute and Physics Department, City College of New York, New York, NY 10031, USA 2 Potsdam Institute for Climate Impact Research (PIK), P.O. Box 60 12 03, 14412 Potsdam, Germany 3 Department of Physics, Yeshiva University, New York, NY 10033, USA 4 Department of Physics, Bar-Ilan University, 52900 Ramat-Gan, Israel 5 Department of Sociology, Stockholm University, 10691 Stockholm, Sweden Received 8 March 2011 / Received in final form 2 August 2011 Published online 26 October 2011 – c EDP Sciences, Societ`a Italiana di Fisica, Springer-Verlag 2011 Abstract. We investigate the timing of messages sent in two online communities with respect to growth fluctuations and long-term correlations. We find that the timing of sending and receiving messages com- prises pronounced long-term persistence. Considering the activity of the community members as growing entities, i.e. the cumulative number of messages sent (or received) by the individuals, we identify non-trivial scaling in the growth fluctuations which we relate to the long-term correlations. We find a connection be- tween the scaling exponents of the growth and the long-term correlations which is supported by numerical simulations based on peaks over threshold. In addition, we find that the activity on directed links between pairs of members exhibits long-term correlations, indicating that communication activity with the most liked partners may be responsible for the long-term persistence in the timing of messages. Finally, we show that the number of messages, M, and the number of communication partners, K, of the individual members are correlated following a power-law, K M λ , with exponent λ 3/4. 1 Introduction Seeking for simple laws and regularities in human activity, researchers belonging to various disciplines aim to study social phenomena by describing them with methods from natural sciences. Since communication plays a predomi- nant role in social systems, it is desired to obtain better insight into the nature of communication patterns – and therefore to understand both, communication itself and the social systems. Although it is clear that communica- tion is related to the embedment in social networks, the actual dynamical processes are still poorly understood. Studying economic data, surprising growth patterns have been identified [1], which seem to be abundant in systems with growth-like features [28]. Considering the units of a system of interest and calculating their loga- rithmic growth rates between two time steps, it was found that the standard deviation of the growth rates decays as a power-law with the initial size [1]. This finding repre- sents a violation of Gibrat’s law [911] stating that the average and the standard deviation of the growth rate of a given economic indicator are constant and independent of the specific indicator value, see also [7]. a e-mail: [email protected] In a recent study [12] we have found several scaling laws characterizing the communication activity in online social networks. We found the existence of long-term cor- relations in human activity of sending messages to other members in the social network. The long-term persistence is related to the fluctuations in the growth properties of the social network as measured by the cumulative num- ber of message sent by the members. The present paper expands this previous work by studying the messages sent in two online social networks with respect to the following properties. First, we extend the results obtained in [12], revealing the analogue correlations in the timing of re- ceiving messages. Furthermore, we analyze the temporal correlations of the activity on directed links, i.e. between pairs of members, and find almost identical results as on the level of the single members. Second, in line with [12] we study the growth of the cu- mulative communication activity of the members in terms of the cumulative numbers of messages sent and received. In [12] we have shown that the standard deviation of the growth rates of the cumulative number of messages sent by individuals depend on the ‘size’ of the member (defined as the cumulative numbers of messages) following a power- law with exponent β 0.2, significantly different from the random exponent β rnd =1/2, indicating nontrivial fluctuations and persistence in the human communication

Upload: others

Post on 07-Jun-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Communication activity in social networks: growth and ...hmakse/maksepapers/2011... · data of the second online community (. com, POK) covers 492 days (February 2001 until June 2002)

Eur Phys J B 84 147ndash159 (2011)DOI 101140epjbe2011-20172-4

Regular Article

THE EUROPEANPHYSICAL JOURNAL B

Communication activity in social networks growthand correlations

D Rybski12a SV Buldyrev3 S Havlin4 F Liljeros5 and HA Makse1

1 Levich Institute and Physics Department City College of New York New York NY 10031 USA2 Potsdam Institute for Climate Impact Research (PIK) PO Box 60 12 03 14412 Potsdam Germany3 Department of Physics Yeshiva University New York NY 10033 USA4 Department of Physics Bar-Ilan University 52900 Ramat-Gan Israel5 Department of Sociology Stockholm University 10691 Stockholm Sweden

Received 8 March 2011 Received in final form 2 August 2011Published online 26 October 2011 ndash ccopy EDP Sciences Societa Italiana di Fisica Springer-Verlag 2011

Abstract We investigate the timing of messages sent in two online communities with respect to growthfluctuations and long-term correlations We find that the timing of sending and receiving messages com-prises pronounced long-term persistence Considering the activity of the community members as growingentities ie the cumulative number of messages sent (or received) by the individuals we identify non-trivialscaling in the growth fluctuations which we relate to the long-term correlations We find a connection be-tween the scaling exponents of the growth and the long-term correlations which is supported by numericalsimulations based on peaks over threshold In addition we find that the activity on directed links betweenpairs of members exhibits long-term correlations indicating that communication activity with the mostliked partners may be responsible for the long-term persistence in the timing of messages Finally weshow that the number of messages M and the number of communication partners K of the individualmembers are correlated following a power-law K sim Mλ with exponent λ asymp 34

1 Introduction

Seeking for simple laws and regularities in human activityresearchers belonging to various disciplines aim to studysocial phenomena by describing them with methods fromnatural sciences Since communication plays a predomi-nant role in social systems it is desired to obtain betterinsight into the nature of communication patterns ndash andtherefore to understand both communication itself andthe social systems Although it is clear that communica-tion is related to the embedment in social networks theactual dynamical processes are still poorly understood

Studying economic data surprising growth patternshave been identified [1] which seem to be abundant insystems with growth-like features [2ndash8] Considering theunits of a system of interest and calculating their loga-rithmic growth rates between two time steps it was foundthat the standard deviation of the growth rates decays asa power-law with the initial size [1] This finding repre-sents a violation of Gibratrsquos law [9ndash11] stating that theaverage and the standard deviation of the growth rate ofa given economic indicator are constant and independentof the specific indicator value see also [7]

a e-mail ca-drrybskide

In a recent study [12] we have found several scalinglaws characterizing the communication activity in onlinesocial networks We found the existence of long-term cor-relations in human activity of sending messages to othermembers in the social network The long-term persistenceis related to the fluctuations in the growth properties ofthe social network as measured by the cumulative num-ber of message sent by the members The present paperexpands this previous work by studying the messages sentin two online social networks with respect to the followingproperties First we extend the results obtained in [12]revealing the analogue correlations in the timing of re-ceiving messages Furthermore we analyze the temporalcorrelations of the activity on directed links ie betweenpairs of members and find almost identical results as onthe level of the single members

Second in line with [12] we study the growth of the cu-mulative communication activity of the members in termsof the cumulative numbers of messages sent and receivedIn [12] we have shown that the standard deviation of thegrowth rates of the cumulative number of messages sent byindividuals depend on the lsquosizersquo of the member (defined asthe cumulative numbers of messages) following a power-law with exponent β asymp 02 significantly different fromthe random exponent βrnd = 12 indicating nontrivialfluctuations and persistence in the human communication

148 The European Physical Journal B

activity in the social networks Here we further study thedistribution of the logarithmic growth rates and find ex-ponential decays similarly to those encountered in econo-physics [1]

Third in order to understand the relation between thelong-term correlations and growth fluctuations we pro-pose a simulation approach based on peaks over thresh-old modeling Using artificially generated long-term corre-lated sequences a message is sent when the record exceedsa predefined threshold Numerically we measure the long-term correlations characterized by the exponent H as wellas growth fluctuations characterized by β and find thatthe relation connecting both exponents proposed in [12]holds

Fourth we introduce a new growth rate between anypair of members quantifying the mutual growth in thenumber of messages We find that the correspondinggrowth fluctuations follow a power-law with similar expo-nent as for the lsquonormalrsquo growth rates We motivate thatthe exponent might be related to cross-correlations in theactivity of the members

Fifth in addition to the temporal correlations we in-vestigate the total number of messages sent or receivedand the total in- and out-degree (ie the number of differ-ent members from which a member receives or to whomheshe sends) We find that the total degree and the finalnumber of messages are correlated following a power-lawwith exponent close to 075 In the case of final in- vs out-degree deviations from the linear correlations are found

Finally we point out that there is also a relation be-tween our results on growth fluctuations and long-rangecorrelations (β and H respectively) and the existence ofpower-law distributed inter-event times characterized bythe exponent δ [13] leading to the clustering and bursts inthe activity of members This connection is explored in afollow-up paper [14]

Our results have important implications for the de-sign of communication systems The correlations can beelaborated to better predict information propagation seeeg [15] In addition the characterization of fluctuations isessential for the knowledge of uncertainty Our approachcould be also applied in natural systems such as in thecontext of protein unfolding [16]

This paper is organized as follows In Section 2 webriefly describe the data of messages sent in two onlinecommunities Our results are presented in Section 3 whichis organized in four sub-sections ndash discussing long-termcorrelations growth fluctuations modeling and other cor-relations Finally we draw our conclusions in Section 4

2 Data

We analyze the timing of messages sent in two Internetcommunities [1217] The data of the first online com-munity (wwwqxse QX)1 consists of over 80 000 mem-

1 The study of the de-identified dating site network data wasapproved by the Regional Ethical Review board in Stockholmrecord 200553

bers and more than 125 million messages sent during63 days (mid November 2005 until mid January 2006) Thedata of the second online community (wwwpussokramcom POK) covers 492 days (February 2001 until June2002) of activity with more than 500 000 messages sentamong almost 30 000 members [18ndash20] This correspondsto the entire lifespan of the social network Both web-sitesare used for dating and general social interactions TheQX community is used mainly by Swedish gay and les-bian while POK was targeted to Swedish teenagers andyoung adults All data are completely anonymous lackany message content and consist only of the time whenthe messages are sent and identification numbers of thesenders and receivers The advantage of these data sets isthat they provide the exact time when the messages weresent ndash in contrast to similar network data sets consistingonly of snapshots ie temporally aggregated social net-works expressing who sent messages to whom (see [17] fora discussion)

Similarly to other online communities the memberscan log in and meet virtually There are different ways ofinteracting in these communities Common among mostof such online communities is the possibility to choosefavorites ie a list of other members that a person some-how feels committed to In addition the platforms offerthe possibility to join groups and discuss with other mem-bers about specific topics We focus on the messages sentamong the members These messages are similar to e-mailsbut have the advantage that they are sent within a closedcommunity where there are no messages coming from orgoing outside

From the message data one can also build networkswhich consist of links connecting nodes We consider themembers as nodes and set a directed link from node ato b when member a sends at least one message to bThe degree k of a node is the number of other nodes it isconnected to ie the number of links it has In the directedcase one distinguishes between out-degree (number of out-going links) and in-degree (number of in-going links)

3 Analysis

31 Long-term correlations

First we define the activity record μj(t) counting thenumber of messages member j sends at dayweek t Thuswe study the activity that is aggregated at the daily orweekly level This is done to avoid possible oscillationsthat are observed in the data at both frequencies

In a previous study [12] we have applied detrendedfluctuation analysis (DFA) [21ndash23] and found that theactivity records μ(t) exhibit long-term correlationswhich are characterized by a power-law decaying auto-correlation function

C(Δt) =1σ2

μ

〈[μ(t) minus 〈μ(t)〉] [μ(t + Δt) minus 〈μ(t)〉]〉

sim (Δt)minusν

D Rybski et al Communication activity in social networks growth and correlations 149

101

102

103

Δt [days]

10minus1

100

101

102

FD

FA

2(Δt

) (

wee

kly

reso

lutio

n)

101

102

103

Δt [days]

10minus1

100

101

102

FD

FA

2(Δt

) (

daily

res

olut

ion)

~ (Δt)1

~ (Δt)12

~ (Δt)12

~ (Δt)1

(a) daily resolution (b) weekly resolutionPOK POK

Fig 1 (Color online) Comparison of fluctuation functions in(a) daily and (b) weekly resolution of members sending mes-sages in POK The different curves correspond to different ac-tivity levels M = 1minus2 3ndash7 8ndash20 21ndash54 55ndash148 149ndash403404ndash1096 1097ndash2980 total messages (from bottom to top) Thecurves in (b) have been shifted along the Δt axis to match dailyresolution In both cases the asymptotic scaling is the sameThe dotted lines correspond to the exponents H = 1 (top) andH = 12 (bottom)

where 〈μ(t)〉 is the average of the record μ(t) σμ is itsstandard deviation and ν is the correlation exponent(1 ge ν ge 0) The fluctuation function provided by DFAscales as

F (Δt) sim (Δt)H (1)

where the exponent H is similar to the Hurst exponent(12 le H le 1 larger exponents correspond to more pro-nounced long-term correlations) It is related to the cor-relation exponent via

ν = 2 minus 2H (2)

For uncorrelated or short-term correlated records theasymptotic fluctuation exponent is H = 12 (for a reviewwe refer to [24])

In order to study the activity with respect to long-termcorrelations we apply second order DFA (DFA2) [2223](linear detrending of μj(t)) and obtain the fluctuationfunctions F j

DFA2(Δt) (details can be found in [12]) Sincethe activity records of the individual members are tooshort we average the squared fluctuation functions amongmembers with similar overall activity (ie total num-ber of messages M) F (Δt) = [ 1

nM

sumj|M (F j(Δt))2]12

where nM is the number of members with M messagesTherefore we employ logarithmic bins in M The activitydistributions are discussed in Section 341

In Figure 1 we compare for sending in POK the fluc-tuation functions in daily resolution (Fig 1a) and weeklyresolution (Fig 1b) In order to match the scales we haveshifted the curves in Figure 1b along the Δt-axis Natu-rally in daily resolution the fluctuation functions covermore scales The asymptotic scaling is in both cases thesame namely no correlations in the case of least activemembers and strong long-term correlations with fluctua-tion exponents close to 1 for the most active membersMoreover for POK in daily resolution the fluctuationfunctions exhibit an increase from small slopes on shorttime scales to larger slopes on large scales This indicatesthat the long-term correlations do not vanish after certain

100

101

102

103

104

num of total messages sent M

04

06

08

1

12

HD

FA

2 (

10lt

=slt

=63

)

send

realshuffled individual sequences of messages per day

101

102

103

104

num of total messages received M

receive

(a) QX (b) QX

Fig 2 (Color online) Fluctuation exponents of the commu-nication activity (a) sending and (b) receiving messages bymembers of QX The exponents are plotted as a function of theactivity level M ie total number of messages for the originaldata (green circles) and individually shuffled sequences (or-ange diamonds) See also [12]

scale but the opposite the long-term correlations becomestronger Note that we use weekly resolution in order tocope with possible weekly oscillations [25ndash27]

We measure the fluctuation exponents by applyingleast squares fits to log F (Δt) vs log Δt on the scales10 lt Δt lt 63 days (QX) and 10 lt Δt lt 70 weeks (POK)For the former case the obtained fluctuation exponents areplotted in Figure 2 as a function of the members activitylevel ie their total number of messages M For sending(panel (a)) the less active members exhibit uncorrelatedbehavior The more messages the members send overallthe stronger correlated is their activity The fluctuationexponent HQX increases with M and reaches values up to075 plusmn 005 (sending) In contrast for the shuffled datathe fluctuation exponents are always very close to 12This confirms that the long-term correlations are due tothe temporal structure of the times each member sendshisher messages see also [14] For receiving messagesFigure 2b we find almost identical results The error barsin Figure 2 were calculated by subdividing the groups ofdifferent activity level The size of the error bars is simplythe standard deviation of the corresponding exponents

The estimated fluctuation exponents obtained forPOK are displayed in Figure 3 Qualitatively we obtaina similar picture as for QX However in contrast to QXhere the original records achieve larger fluctuation expo-nents up to 091 plusmn 004 (sending) disregarding the lastpoints which carry large error-bars A possible reason forthese different maximum exponents could be that in thecase of POK the data covers a much longer period of dataacquisition and possible non-stationarities [28] In QXthe members might not have had enough time to exhibitthe full extend of their persistence while in POK we followthe entire evolution of the online community

Indeed similar behavior of long-term correlations havebeen found in traded values of stocks and e-mail commu-nication [2930] where the fluctuation exponent increasesin an analogous way with the mean trading activity of the

150 The European Physical Journal B

100

101

102

103

104

num of total messages sent M

04

06

08

1

12

HD

FA

2 (

10lt

=slt

=70

)

send

realshuffled individual sequences of messages per day

101

102

103

104

num of total messages received M

receive

(a) POK (b) POK

Fig 3 (Color online) Fluctuation exponents of the commu-nication activity (a) sending and (b) receiving messages bymembers of POK (weekly resolution) The exponents are plot-ted as a function of the activity level M for the original data(green circles) and individually shuffled sequences (orange di-amonds) See also [12]

corresponding stock or with the average number of e-mails(see also [31])

Apart from these for human related data long-term persistence has been reported for physiologicalrecords [223233] written language [34] or for recordsgenerated by collective behavior such as finance and econ-omy [35ndash37] Ethernet traffic [38] Wikipedia access [39]as well as highway traffic [4041] There are also indicationsof long-term correlations in human brain activity [4243]and human motor activity [44]

A question that arises is why the fluctuation expo-nent (in Figs 2 and 3) depends on the activity levelof the members that is why the least active membersexhibit no persistence while the most active members ex-hibit strong persistence We argue that if only few mes-sages appear in the whole period of data acquisition long-term persistence cannot be reflected In these cases it isquite possible that much longer records and higher ag-gregation level such as months or years would be neededto reveal the persistence But doing so there would beother members with even less messages which then againwould probably appear with seemingly uncorrelated mes-sage signals Thus we propose that the exponents of thelargest activity reflect more accurately the scaling behav-ior of human communication activity In Section 331we propose statistical simulations to generate data usingpeaks over threshold (POT) and find that it supports thisperception

At this point we need to mention that long-term cor-relations can be related to broad inter-event time dis-tributions ie the times between successive messages ofindividual members Such distributions have been inves-tigated see eg [1345] but there is no consensus on thefunctional form We study the inter-event time distribu-tions in a different publication [14] where we demonstratethe connection with the long-term correlations found here

101

102

Δt [days]

10minus1

100

101

FD

FA

2(Δt

)

100

101

102

103

104

total number of messages ML

04

06

08

1

12

HD

FA

2 (

10 lt

= s

lt=

63)

(a) QX (b) QX

~ (Δt)075

~ Δt12

Fig 4 (Color online) Temporal correlations in the dailyamount of messages on directed links in QX (a) DFA2 fluctu-ation functions versus the time scale Δt averaged conditionalto the final number of messages of each link The differentcurves correspond to different activity levels ML = 1minus2 3ndash7 8ndash20 21ndash54 55ndash148 149ndash403 404ndash1096 1097ndash2980 (frombottom to top) The dotted lines correspond to the exponentsH = 075 (top) and H = 12 (bottom) (b) The DFA2 fluctua-tion exponent HLQX obtained from (a) is plotted as a functionof the activity level ML The exponents were obtained in therange of scales 10 le Δt le 63 days The activity along directedlinks comprise similar long-term correlations as the total ac-tivity of individual members to all of their acquaintances

Along directed links

In Figure 4 we study for QX the long-term correlationsin activity not on the sender or receiver (node) level buton the level of messages along directed links This meansthat we track when a message is sent directed between twomembers but separately for any pair of members such asararrb brarra ararrd Accordingly we determine the activ-ity records μab(t) μba(t) μad(t) etc expressing how manymessages have been sent each dayweek t between anypair of members Analogous there is also an activity levelfor the links M ab

L (we disregard those pairings with-out activity) Then we perform the analogous analysis forlong-term correlations by applying DFA2 and averagingamong pairings with similar overall activity (the distribu-tions of activity are discussed in Sect 341) The fluctua-tion functions in Figure 4a have asymptotic slopes close to12 for those links with few total number of messages Incontrary those links with many total number of messagesexhibit long-term correlations with exponents up to 074The fluctuation exponents as a function of the activitylevel ML are plotted in Figure 4b Apart from the factthat by definition the number of messages on the most ac-tive links is lower (or equal) than the number of messagesof the most active members the curve looks very similarto the one in Figure 2a in particular the maximum expo-nents are quite similar (HQX asymp 075 and HLQX asymp 074)This indicates that the persistence in the communicationmay be dominated by the communication activity withthe most liked partners

In [46] a different concept of persistence links has beeninvestigated The period of data acquisition is partitionedinto time slices in each of which a network is built Thenthe persistence is defined as the normalized number oftime slices in which a certain link appears However theapproach of our work is not compatible with the one in [46]and we cannot directly compare the results

D Rybski et al Communication activity in social networks growth and correlations 151

32 Growth process

321 Growth in the number of messages

As suggested in [12] we also analyze the growth prop-erties of the message activity This concept is borrowedfrom econophysics where the growth of companies hasbeen found to exhibit non-trivial scaling laws [1] that inparticular violate the original Gibratrsquos law [9ndash1147] andat the same time represents a generalized Gibratrsquos law(GGL) [12] In the present study each member is con-sidered as a unit and the number of messages sent or re-ceived since the beginning of data acquisition representsits size We analyze the growth in the number of mes-sages in analogy to other systems such as the growth ofcompanies [148] or the growth of cities [749] The anal-ogy is supported by some aspects (i) the members of acommunity represent a population similar to the popula-tion of a country (ii) the number of members fluctuatesand typically grows analogous to the number of cities of acountry (iii) the activity or number of links of individualsfluctuates and grows similar to the size of cities

The cumulative number mj(t) expresses how manymessages have been sent by a certain member j up to agiven time t (for a better readability we will not writethe index j explicitly m(t)) We consider the evolution ofm(t) between times t0 and t1 within the period of dataacquisition T (t0 lt t1 le T ) as a growth process whereeach member exhibits a specific growth rate rj (r for shortnotation)

r = lnm1

m0 (3)

where m0 equiv m(t0) and m1 equiv m(t1) are the number ofmessages sent until t0 and t1 respectively by every mem-ber To characterize the dynamics of the activity we con-sider two measures (i) The conditional average growthrate 〈r(m0)〉 quantifies the average growth of the num-ber of messages sent by the members between t0 and t1depending on the initial number of messages m0 In otherwords we consider the average growth rate of only thosemembers that have sent m0 messages until t0 (ii) Theconditional standard deviation of the growth rate for thosemembers that have sent m0 messages until t0

σ(m0) equivradic

〈(r(m0) minus 〈r(m0)〉)2〉 (4)

expresses the statistical spread or fluctuation of growthamong the members depending on m0 Both quanti-ties are relevant in the context of Gibratrsquos law in eco-nomics [9ndash1147] which proposes a proportionate growthprocess entailing the assumption that the average and thestandard deviation of the growth rate of a given economicindicator are constant and independent of the specific in-dicator value That is both 〈r(m0)〉 and σ(m0) are inde-pendent of m0

As shown in [12] for the message data the conditionalaverage growth rate is almost constant and only decreasesslightly

〈r(m0)〉 sim mminusα0 (5)

with an exponent α asymp 005 This means that memberswith many messages in average increase their number ofmessages almost with the same rate as members with fewmessages In contrast the conditional standard deviationclearly decreases with increasing m0

σ(m0) sim mminusβ0 (6)

where t0 = T2 is optimal in terms statistics In this casethe exponents βQX = 022 plusmn 001 and βPOK = 017 plusmn003 for sending messages were found [12] This meansalthough the average growth rate almost does not dependon m0 the conditional standard deviation of the growthof members with many messages is smaller than the one ofmembers with few messages Due to weaker fluctuationsactive members are relatively better predictable in theiractivity of sending messages

It has been shown that the fluctuation exponent H andthe growth fluctuation exponent β are related via [12]

β = 1 minus H (7)

Equation (7) is a scaling law formalizing the relation be-tween growth and long-term correlations in the activ-ity According to equation (7) the original Gibratrsquos law(βG = 0) corresponds to very strong long-term correla-tions with HG = 1 In contrast βrnd = 12 representscompletely random activity (Hrnd = 12) The observedmessage data comprises 12 gt β gt 0 and 12 lt H lt 1Surprisingly the values of β found here are very close tothe β values found for companies in the US economy [1]

In the case of companies also the distribution ofgrowth rates has been studied It was found that the dis-tribution density follows [1]

p(r|m0) =1

sσ(m0)exp

(

minuss|r minus 〈r(m0)〉|σ(m0)

)

(8)

whereas s =radic

2 Next we analyze how the growth rates rare distributed in the case of the message data

First we need to point out that in contrast to thegrowth of companies our entities can never shrink Themembers cannot loose messages the number m(t) eitherincreases or remains the same Accordingly in our caser ge 0 and therefore s = 1 as can be derived for thesingle-sided exponentially decaying distribution

Figure 5 shows p(r|m0) for QX where the values arescaled to collapse according to equation (8) with s = 1In order to have reasonable statistics we define the con-dition m0 in rather wide ranges namely according to thedecimal logarithm For sending (Fig 5a) and receiving(Fig 5b) messages the scaled probability densities col-lapse and are quite similar Nevertheless the growth ratesdo not exactly follow equation (8) with s = 1 While forthe less active members with small growth rates we finda good agreement for more active members and largegrowth rates the obtained curves deviate from the the-oretical one towards a steeper decay

The corresponding results for POK are shown in Fig-ure 6 Again sending and receiving are very similar The

152 The European Physical Journal B

minus2 0 2 4 6(rminusltr(m0)gt)σ(m0)

10minus5

10minus4

10minus3

10minus2

10minus1

100

101

p(r|

m0)

σ(m

0)

1minus910minus99100minus9991000minus9999

minus2 0 2 4 6(rminusltr(m0)gt)σ(m0)

~ eminusr

(a) QX messages send (b) QX messages receive

Fig 5 (Color online) Scaled probability density of growthrates r equation (3) in the number of messages by membersof QX (a) Sending and (b) receiving The times for m0 andm1 have been chosen as t0 = T2 and t1 = T The symbolscorrespond to different initial number of messages m0 The axisare scaled assuming a distribution according to equation (8)with s = 1 which then corresponds to the dotted lines

minus2 0 2 4 6(rminusltr(m0)gt)σ(m0)

10minus5

10minus4

10minus3

10minus2

10minus1

100

101

p(r|

m0)

σ(m

0)

1minus910minus99100minus9991000minus9999

minus2 0 2 4 6(rminusltr(m0)gt)σ(m0)

~ eminusr

(a) POK messages send (b) POK messages receive

Fig 6 (Color online) Scaled probability density of growthrates r equation (3) in the number of messages by membersof POK (a) Sending and (b) receiving Analogous to Figure 5

curves collapse reasonably but in contrast to QX here themeasured p(r|m0) overall deviate from the theoretical onecomprising less steep slopes

We argue that as for single time series distributionand correlation properties are in most cases independentthe same holds for the message data and the growth Thedistribution of growth rates p(r|m0) seems to be indepen-dent from the long-term correlations which are reflectedin σ(m0) with the exponent β

322 Mutual growth in the number of messages

Next we study a variation of growth Instead of consider-ing the absolute number of messages a member sends westudy the difference in the number of messages comparedto any other member the mutual difference mi(t)minusmj(t)Thus the growth rate is defined analogous to equation (3)

rtimes = lnmi

1 minus mj1

mi0 minus mj

0

(9)

where now there is a growth rate for every pair of mem-bers i and j The conditional average growth rate and thecorresponding standard deviation is then taken over allpossible pairs and the condition is the difference at t0

100

101

102

103

104

m0

iminusm0

j

10minus2

10minus1

100

101

100

101

102

103

104

m0

iminusm0

j

10minus2

10minus1

100

101

ltr x(

m0i minus

m0j )gt

σ(m

0i minusm

0j )

(a) QX messages send (b) QX messages send shuf

~ (m0

iminusm0

j)

minus03

~ (m0

iminusm0

j)

minus12

Fig 7 (Color online) Average mutual growth rate and stan-dard deviation versus foregoing difference in the number ofmessages for sending in QX The average (open squares) andstandard deviation (filled circles) of the mutual growth ratertimes equation (9) are plotted conditional to the initial differ-ence mi

0 minusmj0 whereas t0 = T2 and t1 = T (a) Original data

and (b) shuffled data The dotted line in (a) corresponds tothe exponent βtimes = 03 and in (b) to βtimes = 12

mi0 minus mj

0 = mi(t0) minus mj(t0) providing the quantities〈rtimes(mi

0 minus mj0)〉 and σ(mi

0 minus mj0) We disregard combina-

tions of i and j where mi0 minus mj

0 = 0 or mi1minusmj

1

mi0minusmj

0le 0

The results for sending in QX are shown in Figure 7Apart from a small decrease up to mi

0 minus mj0 asymp 50 the

average growth rate is constant (Fig 7a) The conditionalstandard deviation asymptotically follows a slope βtimes asymp03 with deviations to small exponents for small mi

0 minusmj0

In the case of the shuffled data (Fig 7b) as expected theaverage growth rate is constant while the standard devia-tion decreases steeper than for the original data namelywith βtimesrnd 12 although not with a nice straight lineNevertheless we conclude that the scaling of the standarddeviation in Figure 7a must be due to temporal correla-tions between the members The growth of the differencebetween their number of messages comprises similar scal-ing as the individual growth

We conjecture that σ(mi0 minus mj

0) reflects long-term cross-correlations in analogy to σ(m0) for auto-correlations However so far we are not able to providefurther evidence for this analogy and the correspondingrelation to β = 1 minus H equation (7) since an appropriatetechnique for the direct quantification of long-term cross-correlations is lacking

33 Modeling

In what follows we propose numerical simulations withthe purpose of testing the methods and empirical pat-terns we found We study three approaches adopted tothe modeling of human activity (a) peaks over thresh-olds (b) preferential attachment [50] and (c) cascadingPoisson process [27]

331 Peaks over threshold (POT) simulations

Our finding that the activity of sending messages ex-hibits long-term persistence asserts the existence of an

D Rybski et al Communication activity in social networks growth and correlations 153

0 5 10 15 20time [w]

0

2

4

6

0 50 100 150 200time

minus2

minus1

0

1

2

3(a)

(b)

(c)

(d)

q

Fig 8 (Color online) Illustration of the peaks over thresh-old simulations (a) An underlying and unknown long-termcorrelated process determines the instantaneous probabilityof sending messages Once this state passes certain thresh-old q (dashed orange line) a messages is sent (green diamonds)(b) Generated instants of messages (c) with windows for ag-gregation such as messages per day (d) Aggregated record ofmessages in windows of size w here w = 10

underlying long-term correlated process It can be under-stood as an unknown individual state driven by variousinternal and external stimuli [274351ndash54] increasing theprobability to send messages Generating such a hypothet-ical long-term correlated internal process (xi) simulatedmessage data can be defined by the instants at which thisinternal process exceeds a threshold q (peaks over thresh-old POT) see [55ndash57] and references therein

More precisely we consider a long-term correlated se-quence (xi) consisting of Nlowast random numbers that is nor-malized to zero average (〈x〉 = 0) and unit standard devi-ation (σx = 1) Choosing a threshold q at each instant ithe probability to send a messages is

psnd =

1 for xi gt q

0 for xi le q (10)

Thus the message events are given by the indices i ofthose random numbers xi exceeding q

Figure 8a illustrates the procedure The random num-bers are plotted as brown circles and the events exceedingthe threshold (orange dashed line) by the green diamondsThe resulting instants are depicted in Figure 8b represent-ing the simulated messages The threshold approximatelypredefines the total number of events and accordingly theaverage inter-event time Using normal-distributed num-bers (xi) the number of eventsmessages is approximatelygiven by the length Nlowast and the inverse cumulative dis-tribution function associated with the standard normaldistribution (probit-function) Additionally the randomnumbers we use are long-term correlated with variable

100

101

102

103

104

Δt [w]

10minus3

10minus2

10minus1

100

101

102

103

104

FD

FA

2(Δt

)

100

101

102

103

104

m0

10minus1

100

ltr(

m0)

gt

100

101

102

103

104

105

106

total number of events M

0

02

04

06

08

1

12

HD

FA

2 (

1000

lt=

s lt

= 1

0000

)q=10q=15q=40q=45

100

101

102

103

104

m0

10minus2

10minus1

100

σ(m

0)

H=09H=08H=07H=06H=05slope β=1minusH

~ (Δt)09

~ (Δt)05

(a) (b)

(c) (d)

Fig 9 (Color online) Results of numerical simulations(a) Mean growth rate conditional to the number of events un-til t0 = Nlowast2 as obtained from 100 000 long-term correlatedrecords of length Nlowast = 131 072 with variable imposed fluctua-tion exponent Himp between 12 and 09 and random thresh-old q between 10 and 60 (b) As before but standard devia-tion conditional to the number of events The solid lines repre-sent power-laws with exponents β expected from the imposedlong-term correlations according to equation (7) (c) Long-termcorrelations in the sequences of aggregated peaks over thresh-old For every threshold q between 10 (violet) and 45 (black)100 normalized records of length Nlowast = 4194 304 have beencreated with Himp = 09 The events are aggregated in win-dows of size w = 100 The panel shows the averaged DFA2fluctuation functions (d) Fluctuation exponents on the scales1 000 le s le 10 000 as a function of the total number of events

fluctuation exponent We impose these auto-correlationsusing Fourier filtering method [2358] Next we show thatthis process reproduces the scaling in the growth ieGGL as well as the variable long-term correlations in theactivity of the members (eg Figs 2 and 3)

For testing this process we create 100 000 independentlong-term correlated records (xi) of length Nlowast = 131 072impose the fluctuation exponent Himp and choose for eachone a random threshold q between 1 and 6 each represent-ing a sender Extracting the peaks over threshold we ob-tain the events and determine for each recordmember thegrowth in the number of eventsmessages between Nlowast2and Nlowast This is for each recordmember we count thenumbers of eventsmessages m0 until t0 = ti=Nlowast2 as wellas m1 until t1 = ti=Nlowast and calculate the growth rate ac-cording to equation (3) We then calculate the conditionalaverage 〈r(m0)〉 and the conditional standard deviationσ(m0) where the values of m0 are binned logarithmicallyThe quantities are plotted in Figures 9a and 9b while inpanel (b) we include slopes expected from β = 1 minus H equation (7) We find that the numerical results reason-ably agree with the prediction (solid lines) Except forsmall m0 these results are consistent with those found inthe original message data

154 The European Physical Journal B

The fluctuation functions can be studied in the sameway As described in Section 31 we find long-term corre-lations in the sequences of messages per day or per weekOn the basis of the above explained simulated messageswe analyze them in an analogous way For each thresh-old q = 10 15 40 45 we create 100 long-term cor-related records of length Nlowast = 4 194 304 with imposedfluctuation exponent Himp = 09 extract the simulatedmessage events and aggregate them in non-overlappingwindows of size w = 100 This is tiling Nlowast in segmentsof size w and counting the number of events occurring ineach segment (Figs 8c and 8d) The obtained aggregatedrecords represent the analogous of messages per day or perweek and are analyzed with DFA averaging the fluctua-tion functions among those configurations with the samethreshold and thus similar number of total events Thecorresponding results are shown in Figure 9c and 9d Weobtain very similar results as in the original data We findvanishing correlations for the sequences with few events(large q) and pronounced long-term correlations for thecases of many events (small q) while the maximum fluc-tuation exponent corresponds to the chosen Himp Thiscan be understood by the fact that for q close to zerothe sequence of number of events per window convergesto the aggregated sequence of 0 or 1 (for x le 0 or x gt 0)reflecting the same long-term correlation properties as theoriginal record [59] For a large threshold q too few eventsoccur to measure the correct long-term correlations egthe true scaling only turns out on larger unaccessible timescales requiring larger w and longer records

Although the simulations do not reveal the originof the long-term correlated patchy behavior they sup-port equation (7) and the concept of an underlying long-term correlated process Consistently an uncorrelatedcompletely random underlying process recovers Poissonstatistics and therefore βrnd = 12 for the growth fluctu-ations as well as uncorrelated message activity (Hrnd =12) For 12 lt H lt 1 it has been shown [5557] that theinter-event times follow a stretched exponential distribu-tion see also [131460]

332 Preferential attachment

Next we compare our findings with the growth proper-ties of a network model We investigate the Barabasi-Albert (BA) model which is based on preferential at-tachment and has been introduced to generate a kind ofscale-free networks [5061] with power-law degree distri-bution p(k) [6263] Essentially it consists of subsequentlyadding nodes to the network by linking them to existingnodes which are chosen randomly with a probability pro-portional to their degree

We obtain the undirected network and study the de-gree growth properties by calculating the conditional av-erage growth rate 〈rBA(k0)〉 and the conditional standarddeviation σBA(k0) obtained from the scale-free BA modelThe times t0 and t1 are defined by the number of nodesattached to the network

Figure 10 shows the results where an average degree〈k〉 = 20 50 000 nodes in t0 and 100 000 nodes in t1 were

101

102

103

degree k0

10minus2

10minus1

ltr B

A(k

0)gt

σB

A(k

0)

ltrBA(k0)gtσBA(k0)

~ k0

minus12

Fig 10 (Color online) Average degree growth rate and stan-dard deviation versus foregoing degree for the preferential at-tachment network model [50] The average (green open di-amonds) and standard deviation (blue filled circles) of thegrowth rate rBA are plotted conditional to k0 the degree ofthe corresponding nodes at the first stage We choose averagedegree 〈k〉 = 20 50 000 nodes in t0 and 100 000 nodes in t1The error-bars are taken from 10 configurations The dashedline in the bottom corresponds to βBA = 12

chosen We find constant average growth rate that doesnot depend on the initial degree k0 The conditional stan-dard deviation is a function of k0 and exhibits a power-lawdecay with βBA = 12 as expected for such an uncorre-lated growth process [12] Therefore a purely preferentialattachment type of growth is not sufficient to describe thetype of social network dynamics found in Section 321since additional temporal correlations are involved in thedynamics of establishing acquaintances in the community

The value βBA = 12 in equation (7) corresponds toH = 12 indicating complete randomness There is nomemory in the system Since each addition of a new nodeis completely independent from precedent ones there can-not be temporal correlations in the activity of addinglinks In contrast for the out-degree in QX and POK weobtained βkQX = 022plusmn002 and βkPOK = 017plusmn008 [12]which is supported by the (non-linear) correlations be-tween the number of messages and the out-degree as pre-sented in Section 34

Interestingly an extension of the standard BA modelhas been proposed [64] see also [6566] that takes intoaccount different fitnesses of the nodes to acquiring linksWe think that such fitness could be related to growth fluc-tuations thus providing a route to modify the BA modelto include the long-term correlated dynamics found here

333 Cascading Poisson process

In this section we elaborate the model proposed in [27]and examine it with respect to long-term correlations Themodel is based on a cascading Poisson process (CPP)according to which the probability that a member entersan active interval is ρ(t) = Nwpd(t)pw(t) where Nw is theaverage number of active intervals per week pd(t) is the

D Rybski et al Communication activity in social networks growth and correlations 155

100

101

102

103

104

105

Δt [days]

10minus1

100

101

102

FD

FA(Δ

t) [

arb

uni

ts]

DFA1DFA2~ (Δt)

12

100

101

102

Δt [days]

10minus1

100

101

FD

FA

2(Δt

)

M=1097minus2980M=404minus1096M=149minus403M=55minus148M=21minus54M=8minus20M=3minus7M=1minus2~ (Δt)

12

(a) User 2881Malmgren et al PNAS 2008

(b)

Fig 11 (Color online) DFA fluctuation functions of message data created with the model proposed in [27] (a) User 2881 Wevisually extract the model parameters from Figure 3 in [27] and generate message data for approx 800k days The panel showsthe fluctuation functions from DFA1 and DFA2 which asymptotically go as sim (Δt)12 ie no long-term correlations The humpon small scales is due to the model inherent oscillations [23] The dashed vertical line is placed at Δt = 83 days (b) Randomparameterization We randomly choose the model parameters create 20k records of 83 days and average the obtained DFA2fluctuation functions according to the final number of messages M While for those simulated members with few messages wefind F (Δt) sim (Δt)12 the F (Δt) of the most active members exhibit a hump due to oscillations similar to the one in panel (a)

probability of starting an active interval at a particulartime of the day and pw(t) is the probability of startingan active interval at a particular day of the week Once amember enters such an active interval heshe sends a set ofNa +1 messages where Na is drawn from the distributionp(Na) The messages sent in such an active interval aresent randomly ie a homogeneous Poisson process withrate ρa events per hour

First we study the example of User 2881 as analyzedin [27] (please note that in [27] a different data set is stud-ied and the user is neither in OC1 nor in OC2) We ex-tract from [27] the parameters Nw = 73 active intervalsper week ρa = 17 events per hour as well as (visually)the distributions pd(t) pw(t) and p(Na) The original pe-riod of 83 days is not sufficient to apply DFA and werun the model for this set of parameters over 800k daysThen we extract the record of number of messages per dayμ(t) and apply DFA The obtained fluctuation functionsare shown in Figure 11a On small scales below 100 daysa hump in the F (Δt) can be identified which is due tooscillations in μ(t) [23] While asymptotically the influ-ence of oscillations vanishes on scales below the wave-length the oscillations appear as correlations (increasedslope in F (Δt)) and on scales above the wavelength theoscillations appear as anti-correlations (decreased slope inF (Δt)) Asymptotically we find F (Δt) sim (Δt)12 ieHCPP 12 corresponding to a lack of long-term cor-relations Even on scales up to 83 days we rather findHCPP lt 12 (which we expect from the imposed weeklyoscillations)

Next we study 20 000 simulated e-mail senders withrandomly chosen parameters (i) We fill pw(t) with ran-dom numbers and set pw(t) = 0 for t = 0 6 ie Sundayand Saturday (ii) We fill pd(t) with random numbers andset pd(t) = 0 for t = 0 5 and t = 23 ie at night(iii) We set p(Na) starting with a random p(Na = 0) Thenp(Na) decays exponentially up to a random Na below 36pw(t) pw(t) = 0 and p(Na) are normalized (iv) We ran-

domly choose 0 lt Nw lt 40 (v) We randomly choose0 lt ρa lt 30 From [27] Supporting Information 2 (SI2)we estimated the typical maximum values of Na Nw andρa (36 40 and 30 respectively) We run the model for83 days and extract the μ(t) for each simulated e-mailsender Then we apply DFA2 and average the fluctuationfunctions according to the final number of messages M The fluctuation functions for the various activity levels aredepicted in Figure 11b We find that members with smallfinal number of messages exhibit uncorrelated behaviorThe more active the members the more pronounced be-come the oscillations which we already discussed in thecontext of Figure 11a Thus asymptotic HCPP lt 12 forlarge M is due to the weekly cycles [23] In our data os-cillations do not dominate the DFA fluctuation functionsMoreover for OC2 we also find long-term correlations inweekly resolution (Fig 1)

Based on periodic probabilities and Poisson statisticsthe CPP model represents a powerful concept to charac-terize inter-event times For this purpose the average Nw

seems to be sufficient However in order to recover long-term correlations time dependent Nw = Nw(t) seem tobe necessary In fact the number of active intervals perweek Nw fluctuates as can be seen in [27] SI2 (uppermost row of the panels) Thus we suggest to extend themodel by introducing a memory kernel see eg [54] or byusing long-term correlated Nw(t)

34 Other correlations

In this section we want to discuss other types of correla-tions Figure 12 shows for QX the final degree K = k(T )versus the final number of messages M = m(T ) We findthat for both sending and receiving the two quantitiesare correlated according to

K sim Mλ with λ asymp 34 (11)

156 The European Physical Journal B

100

101

102

103

104

105

number of messages M

10minus1

100

101

102

103

104

degr

ee K

send

100

101

102

103

104

105

number of messages M

receive

~ M075

~ M075

(a) QX (b) QX

Fig 12 (Color online) Correlations between the finaldegree K and the final number of messages M for QX (a) Out-degree and sending messages (b) in-degree and receiving mes-sages The dashed lines correspond to a power-law with expo-nent 075 Members sending many messages also tend to havehigh out-degree but not linearly rather following a power-law

100

101

102

103

104

105

sent

100

101

102

103

104

105

rece

ived

messages M

100

101

102

103

104

105

outminusdegree

100

101

102

103

104

105

inminus

degr

ee

degree K

(a) QX (b) QX

Fig 13 (Color online) Correlations between activity and pas-sivity for QX (a) Final number of messages M received andsent (b) final in- and out-degree The dashed lines correspondto a linear relation Those members who send many messagesalso receive many But those members who know many peopleto whom they send messages do not necessarily know as manypeople from whom they receive messages

Similar relations have also been found for other data [67]Since the correlations are positive those members thatsend many messages in average also have more acquain-tances to whom they send but they know less acquain-tances than they would in the case of linear correlationsFor receiving Figure 12b this correlation is very similar

The number of messages sent versus the number ofmessages received (for QX) is displayed in Figure 13aAsymptotically the activity and passivity are linearly re-lated and on average for every message sent there is areceived one or vice versa This of course does not meanthat every message is replied However the less activemembers in average tend to receive more messages thanthey send For example those members who send in av-erage one message receive about three Nevertheless themore active the members are the more the sending andreceiving behavior approaches the linear relation In con-trast for the degree Figure 13b the asymptotic linearrelation does not hold Those members with large out-degree and small in-degree are referred to as spammerssince they send to many different people but only receivefrom few

100

101

102

103

104

105

number of messages M

10minus1

100

101

102

103

104

degr

ee K

send

100

101

102

103

104

105

number of messages M

receive

~ M075

~ M075

(a) POK (b) POK

Fig 14 (Color online) Correlations between the final de-gree K and the final number of messages M for POK (a) Out-degree and sending messages (b) in-degree and receiving mes-sages Analogous to Figure 12

100

101

102

103

104

sent

100

101

102

103

104

rece

ived

messages M

100

101

102

103

104

outminusdegree

100

101

102

103

104

inminus

degr

ee

degree K

(a) POK (b) POK

Fig 15 (Color online) Correlations between activity and pas-sivity for POK (a) Final number of messages M received andsent (b) final in- and out-degree Analogous to Figure 13

For POK we find similar results in Figures 14 and 15The final degree and the final number of messages alsoscale with an exponent close to 075 although for send-ing messages there exist some deviations of the most ac-tive members (Fig 14a) Also the correlations of sendingand receiving are linear the same holds for in- and out-degree However the most active members again deviatewith low receiving part ie both low number of receivedmessages as well as low in-degree Figure 15 Neverthe-less the results for both data sets are mainly consistentand the power-law relation equation (11) is a remarkableregularity

341 Activity and degree distributions

Finally we want to briefly discuss the distributions of ac-tivities and degrees If we assume p(M) sim MminusγM andp(K) sim KminusγK then with equation (11) the exponentsshould be related according to

γK = 1 + (γM minus 1)λ (12)

Figures 16a and 16b displays the probability densitiesp(M) and p(K) for both online communities Althoughthe distributions are rather broad they do not exhibitstraight lines in double logarithmic representation Inpanel (b) we include some guides to the eye with slopesaccording to equation (12) and which roughly follow the

D Rybski et al Communication activity in social networks growth and correlations 157

Table 1 Overview of the obtained exponents QX and POK are the two data-sets For the BA model and the CP processsee [50] and [27] respectively HL is the fluctuation exponent along directed links βk is the growth fluctuation exponent whenthe degree is considered and βtimes is the mutual growth fluctuation exponent based on the growth between pairs

Sending H HL β βk βtimesSect or Ref [12] 31 333 31 [12] [12] 322

QX 075 plusmn 005 asymp074 022 plusmn 001 022 plusmn 002 asymp03

QX shuffled 12 12 12

POK 091 plusmn 004 017 plusmn 003 017 plusmn 008

POK shuffled 12 12

BA model 12

CP process rarr 12

100

101

102

103

104

total number of messages

10minus10

10minus9

10minus8

10minus7

10minus6

10minus5

10minus4

10minus3

10minus2

10minus1

100

p(M

L)

100

101

102

103

10410

minus1010

minus910

minus810

minus710

minus610

minus510

minus410

minus310

minus210

minus110

0

p(M)p(K)

100

101

102

103

104

105

total number of messages

100

101

102

103

104

105

(a) QX send (b) POK send

(c) QX directed link (d) POK directed link

~ ML

minus35

~ Mminus12

~ Mminus3

~ Kminus127

~ Kminus37

~ ML

minus35

~ Kminus073

~ Mminus08

Fig 16 (Color online) Probability densities of activities anddegrees The probabilities are plotted versus the total numberof messages M and the final degree K for (a) sending inQX and (b) sending in POK The panel (c) and (d) exhibitthe probability densities of the total number of messages alongdirected links ML for QX and POK respectively The dottedlines serve as guides to the eye and have the indicated slopes

obtained curves However λ asymp 075 is relatively close to 1so that the differences are minor

The probability densities of activity along direct linksare displayed in Figures 16c and 16d for QX and POKrespectively In both cases the frequency of large activitydecays approximately following a power-law with expo-nent around 35

4 Conclusions

Our work reviews and further supports previous empir-ical findings [12] extending them by some features Theobtained exponents are summarized in Table 1

In addition to [12] we find very similar characteristicsfor the passivity of receiving as for the activity of send-ing messages This is in line with the strong correlations

between individual sending and receiving ie most of themessages are somehow replied sooner or later Further-more already the communication between two individualscomprises long-term persistence

Investigating the probability densities of logarithmicgrowth rates (ie growth of the cumulative number of mes-sages between two time steps of any member) we are ableto collapse the curves by scaling them with conditional av-erage growth rates and conditional standard deviationsWhile less active members follow well the exponentiallydecaying probability density for the more active membersdeviations are found in the case of large growth rates

Moreover we introduce a new growth rate namely themutual growth in the number of messages This is the dif-ference in the number of messages sent between pairs ofmembers at two time steps The conditional standard de-viation of this mutual growth rate also decays as a power-law with increasing initial difference whereas the expo-nent is close to 03 and changes to 12 when the datais shuffled We conjecture that this growth reflects cross-correlations in the activity

Finally we propose simulations to reproduce the long-term correlations and growth properties Basically it con-sists of generating long-term correlated sequences anddefining a threshold All values of such sequences abovethe threshold (POT) represent a message event We showthat then the correlation and growth features being deter-mined by the imposed fluctuation exponent confirm therelation β = 1 minus H [12] Including further features thisapproach could be a starting point for more elaboratedmodeling of human dynamics

We would like to note that ndash except Section 322 aboutmutual growth in the number of messages (and Sect 34) ndashall analysis and results refer to auto-correlations As phe-nomena auto- and cross-correlations can occur indepen-dently However since most of the messages are replied itis very likely that there are also cross-correlations betweenthe members activity which to our knowledge has not yetbeen studied systematically

Thus our work opens perspectives for further researchactivities In particular the origin of the long-term persis-tence in the communication remains an important ques-tion In [14] we demonstrate the relation of β H withinter-event time scaling From a psychologicalsociological

158 The European Physical Journal B

point of view one may argue where the persistence isoriginated Is it purely due to a state of mind solipsis-tic emerging from moods or is it due to social effects iethat the dynamics in the social network induces persis-tent fluctuations One hypothesis could be that alreadythe social network is correlated [68]

We thank C Briscoe JF Eichner LK Gallos and HDRozenfeld for useful discussions This work was supportedby National Science Foundation Grant NSF-SES-0624116 andNSF-EF-0827508 FL acknowledges financial support fromThe Swedish Bank Tercentenary Foundation SH thanks theEuropean EPIWORK project the Israel Science Foundationthe ONR and the DTRA for financial support

References

1 MHR Stanley LAN Amaral SV Buldyrev S HavlinH Leschhorn P Maass MA Salinger HE StanleyNature 379 804 (1996)

2 D Canning LAN Amaral Y Lee M Meyer HEStanley Econ Lett 60 335 (1998)

3 V Plerou LAN Amaral P Gopikrishnan M MeyerHE Stanley Phys Rev E 60 6519 (1999)

4 F Liljeros LAN Amaral HE Stanley arXiv

nlin0310001v1[nlinAO] (2003) httparxivorg

absnlin0310001

5 K Matia LAN Amaral M Luwel HF Moed HEStanley J Am Soc Inf Sci Tec 56 893 (2005)

6 S Picoli RS Mendes Phys Rev E 77 036105 (2008)7 HD Rozenfeld D Rybski JS Andrade Jr M Batty

HE Stanley HA Makse Proc Natl Acad Sci USA 10518702 (2008)

8 F Wang K Yamasaki S Havlin HE Stanley arXiv09114258v1[q-finST] (2009) httparxivorgabs09114258

9 R Gibrat Les inegalites economiques (Libraire du RecueilSierey Paris 1931)

10 J Sutton J Econ Lit 35 40 (1997)11 M Mitzenmacher Internet Math 1 226 (2004)12 D Rybski SV Buldyrev S Havlin F Liljeros HA

Makse Proc Natl Acad Sci USA 106 12640 (2009)13 AL Barabasi Nature 435 207 (2005)14 D Rybski SV Buldyrev S Havlin F Liljeros HA

Makse preprint (2011)15 M Karsai M Kivela RK Pan K Kaski J Kertesz AL

Barabasi J Saramaki Phys Rev E 83 025102 (2011)16 J Brujic RI Hermans KA Walther JM Fernandez

Nature Phys 2 282 (2006)17 LK Gallos D Rybski F Liljeros S Havlin HA Makse

submitted (2011)18 P Holme Europhys Lett 64 427 (2003)19 P Holme F Liljeros CR Edling BJ Kim Phys Rev

E 68 056107 (2003)20 P Holme CR Edling F Liljeros Soc Networks 26 155

(2004)21 CK Peng SV Buldyrev S Havlin M Simons HE

Stanley AL Goldberger Phys Rev E 49 1685 (1994)22 A Bunde S Havlin JW Kantelhardt T Penzel JH

Peter K Voigt Phys Rev Lett 85 3736 (2000)

23 JW Kantelhardt E Koscielny-Bunde HHA RegoS Havlin A Bunde Physica A 295 441 (2001)

24 JW Kantelhardt Encyclopedia of Complexity and SystemScience (Springer 2009) Chap entry 00620 Fractal andMultifractal Time Series

25 S Golder DM Wilkinson BA Huberman Communitiesand Technologies 2007 (Springer London 2007) pp 41ndash66

26 J Leskovec E Horvitz arXiv08030939v1[physics

soc-ph] (2008) httparxivorgabs0803093927 RD Malmgren DB Stouffer AE Motter LAN

Amaral Proc Natl Acad Sci USA 105 18153 (2008)28 A Vazquez Physica A 373 747 (2007)29 Z Eisler J Kertesz Phys Rev E 73 046109 (2006)30 Z Eisler I Bartos J Kertesz Adv Phys 57 89 (2008)31 LEC Rocha F Liljeros P Holme Proc Natl Acad Sci

USA 107 5706 (2010)32 CK Peng J Mietus JM Hausdorff S Havlin HE

Stanley AL Goldberger Phys Rev Lett 70 1343 (1993)33 PC Ivanov A Bunde LAN Amaral S Havlin

J Fritsch-Yelle RM Baevsky HE Stanley ALGoldberger Europhys Lett 48 594 (1999)

34 K Kosmidis A Kalampokis P Argyrakis Physica A 370808 (2006)

35 Y Liu P Gopikrishnan P Cizeau M Meyer CK PengHE Stanley Phys Rev E 60 1390 (1999)

36 RN Mantegna HE Stanley An Introduction toEconophysics Correlations and Complexity in Finance(Cambridge University Press Cambridge 1999)

37 F Lux M Ausloos The Science of Disasters in MarketFluctuations I Scaling Multiscaling and Their PossibleOrigins (Springer-Verlag Berlin 2002) Chap 13 pp 373ndash409

38 WE Leland MS Taqqu W Willinger DV WilsonIEEEACM Trans Networking 2 1 (1994)

39 M Kampf S Tismer JW Kantelhardt L Muchnik sub-mitted (2011)

40 S Tadaki M Kikuchi A Nakayama K NishinariA Shibata Y Sugiyama S Yukawa J Phys Soc Jpn75 034002 (2006)

41 Z Xiao-Yan L Zong-Hua T Ming Chinese Phys Lett24 2142 (2007)

42 K Linkenkaer-Hansen VV Nikouline JM Palva RJIlmoniemi J Neurosci 21 1370 (2001)

43 P Allegrini D Menicucci R Bedini L FronzoniA Gemignani P Grigolini BJ West P Paradisi PhysRev E 80 061914 (2009)

44 PC Ivanov K Hu MF Hilton SA Shea HE StanleyProc Natl Acad Sci USA 104 20702 (2007)

45 RD Malmgren DB Stouffer ASLO CampanharoLAN Amaral Science 325 1696 (2009)

46 CA Hidalgo C Rodriguez-Sickert Physica A 387 3017(2008)

47 A Saichev Y Malevergne D Sornette Theory ofZipf rsquos Law and Beyond ndash Monograph Lecture Notes inEconomics and Mathematical Systems (Springer-VerlagBerlin 2009)

48 LAN Amaral SV Buldyrev S Havlin MA SalingerHE Stanley Phys Rev Lett 80 1385 (1998)

49 HD Rozenfeld D Rybski X Gabaix HA Makse AmEcon Rev 101 2205 (2011)

50 AL Barabasi R Albert Science 286 509 (1999)51 P Hedstrom Dissecting the Social On the Principles

of Analytical Sociology (Cambridge University PressCambridge 2005)

D Rybski et al Communication activity in social networks growth and correlations 159

52 A Kentsis Nature 441 E5 (2006)53 G Palla AL Barabasi T Vicsek Nature 446 664 (2007)54 R Crane D Sornette Proc Natl Acad Sci USA 105

15649 (2008)55 A Bunde JF Eichner JW Kantelhardt S Havlin

Phys Rev Lett 94 048701 (2005)56 EG Altmann H Kantz Phys Rev E 71 056106 (2005)57 JF Eichner JW Kantelhardt A Bunde S Havlin

Phys Rev E 75 011128 (2007)58 HA Makse S Havlin M Schwartz HE Stanley Phys

Rev E 53 5445 (1996)59 Y Xu QDY Ma DT Schmitt P Bernaola-Galvan

PC Ivanov Physica A 390 4057 (2011)60 J Stehle A Barrat G Bianconi Phys Rev E 81 035101

(2010)

61 R Albert AL Barabasi Rev Mod Phys 74 47(2002)

62 H Ebel LI Mielsch S Bornholdt Phys Rev E 66035103 (2002)

63 MEJ Newman S Forrest J Balthrop Phys Rev E 66035101 (2002)

64 G Bianconi AL Barabasi Europhys Lett 54 436 (2001)65 BF de Blasio A Svensson F Liljeros Proc Natl Acad

Sci USA 104 10762 (2007)66 R Cohen S Havlin Complex Networks Structure

Robustness and Function in Growing network models(Cambridge University Press Cambridge 2009)

67 L Muchnik HA Makse S Havlin preprint (2011)68 D Rybski HD Rozenfeld JP Kropp Europhys Lett

90 28002 (2010)

  • Introduction
  • Data
  • Analysis
  • Conclusions
  • References
Page 2: Communication activity in social networks: growth and ...hmakse/maksepapers/2011... · data of the second online community (. com, POK) covers 492 days (February 2001 until June 2002)

148 The European Physical Journal B

activity in the social networks Here we further study thedistribution of the logarithmic growth rates and find ex-ponential decays similarly to those encountered in econo-physics [1]

Third in order to understand the relation between thelong-term correlations and growth fluctuations we pro-pose a simulation approach based on peaks over thresh-old modeling Using artificially generated long-term corre-lated sequences a message is sent when the record exceedsa predefined threshold Numerically we measure the long-term correlations characterized by the exponent H as wellas growth fluctuations characterized by β and find thatthe relation connecting both exponents proposed in [12]holds

Fourth we introduce a new growth rate between anypair of members quantifying the mutual growth in thenumber of messages We find that the correspondinggrowth fluctuations follow a power-law with similar expo-nent as for the lsquonormalrsquo growth rates We motivate thatthe exponent might be related to cross-correlations in theactivity of the members

Fifth in addition to the temporal correlations we in-vestigate the total number of messages sent or receivedand the total in- and out-degree (ie the number of differ-ent members from which a member receives or to whomheshe sends) We find that the total degree and the finalnumber of messages are correlated following a power-lawwith exponent close to 075 In the case of final in- vs out-degree deviations from the linear correlations are found

Finally we point out that there is also a relation be-tween our results on growth fluctuations and long-rangecorrelations (β and H respectively) and the existence ofpower-law distributed inter-event times characterized bythe exponent δ [13] leading to the clustering and bursts inthe activity of members This connection is explored in afollow-up paper [14]

Our results have important implications for the de-sign of communication systems The correlations can beelaborated to better predict information propagation seeeg [15] In addition the characterization of fluctuations isessential for the knowledge of uncertainty Our approachcould be also applied in natural systems such as in thecontext of protein unfolding [16]

This paper is organized as follows In Section 2 webriefly describe the data of messages sent in two onlinecommunities Our results are presented in Section 3 whichis organized in four sub-sections ndash discussing long-termcorrelations growth fluctuations modeling and other cor-relations Finally we draw our conclusions in Section 4

2 Data

We analyze the timing of messages sent in two Internetcommunities [1217] The data of the first online com-munity (wwwqxse QX)1 consists of over 80 000 mem-

1 The study of the de-identified dating site network data wasapproved by the Regional Ethical Review board in Stockholmrecord 200553

bers and more than 125 million messages sent during63 days (mid November 2005 until mid January 2006) Thedata of the second online community (wwwpussokramcom POK) covers 492 days (February 2001 until June2002) of activity with more than 500 000 messages sentamong almost 30 000 members [18ndash20] This correspondsto the entire lifespan of the social network Both web-sitesare used for dating and general social interactions TheQX community is used mainly by Swedish gay and les-bian while POK was targeted to Swedish teenagers andyoung adults All data are completely anonymous lackany message content and consist only of the time whenthe messages are sent and identification numbers of thesenders and receivers The advantage of these data sets isthat they provide the exact time when the messages weresent ndash in contrast to similar network data sets consistingonly of snapshots ie temporally aggregated social net-works expressing who sent messages to whom (see [17] fora discussion)

Similarly to other online communities the memberscan log in and meet virtually There are different ways ofinteracting in these communities Common among mostof such online communities is the possibility to choosefavorites ie a list of other members that a person some-how feels committed to In addition the platforms offerthe possibility to join groups and discuss with other mem-bers about specific topics We focus on the messages sentamong the members These messages are similar to e-mailsbut have the advantage that they are sent within a closedcommunity where there are no messages coming from orgoing outside

From the message data one can also build networkswhich consist of links connecting nodes We consider themembers as nodes and set a directed link from node ato b when member a sends at least one message to bThe degree k of a node is the number of other nodes it isconnected to ie the number of links it has In the directedcase one distinguishes between out-degree (number of out-going links) and in-degree (number of in-going links)

3 Analysis

31 Long-term correlations

First we define the activity record μj(t) counting thenumber of messages member j sends at dayweek t Thuswe study the activity that is aggregated at the daily orweekly level This is done to avoid possible oscillationsthat are observed in the data at both frequencies

In a previous study [12] we have applied detrendedfluctuation analysis (DFA) [21ndash23] and found that theactivity records μ(t) exhibit long-term correlationswhich are characterized by a power-law decaying auto-correlation function

C(Δt) =1σ2

μ

〈[μ(t) minus 〈μ(t)〉] [μ(t + Δt) minus 〈μ(t)〉]〉

sim (Δt)minusν

D Rybski et al Communication activity in social networks growth and correlations 149

101

102

103

Δt [days]

10minus1

100

101

102

FD

FA

2(Δt

) (

wee

kly

reso

lutio

n)

101

102

103

Δt [days]

10minus1

100

101

102

FD

FA

2(Δt

) (

daily

res

olut

ion)

~ (Δt)1

~ (Δt)12

~ (Δt)12

~ (Δt)1

(a) daily resolution (b) weekly resolutionPOK POK

Fig 1 (Color online) Comparison of fluctuation functions in(a) daily and (b) weekly resolution of members sending mes-sages in POK The different curves correspond to different ac-tivity levels M = 1minus2 3ndash7 8ndash20 21ndash54 55ndash148 149ndash403404ndash1096 1097ndash2980 total messages (from bottom to top) Thecurves in (b) have been shifted along the Δt axis to match dailyresolution In both cases the asymptotic scaling is the sameThe dotted lines correspond to the exponents H = 1 (top) andH = 12 (bottom)

where 〈μ(t)〉 is the average of the record μ(t) σμ is itsstandard deviation and ν is the correlation exponent(1 ge ν ge 0) The fluctuation function provided by DFAscales as

F (Δt) sim (Δt)H (1)

where the exponent H is similar to the Hurst exponent(12 le H le 1 larger exponents correspond to more pro-nounced long-term correlations) It is related to the cor-relation exponent via

ν = 2 minus 2H (2)

For uncorrelated or short-term correlated records theasymptotic fluctuation exponent is H = 12 (for a reviewwe refer to [24])

In order to study the activity with respect to long-termcorrelations we apply second order DFA (DFA2) [2223](linear detrending of μj(t)) and obtain the fluctuationfunctions F j

DFA2(Δt) (details can be found in [12]) Sincethe activity records of the individual members are tooshort we average the squared fluctuation functions amongmembers with similar overall activity (ie total num-ber of messages M) F (Δt) = [ 1

nM

sumj|M (F j(Δt))2]12

where nM is the number of members with M messagesTherefore we employ logarithmic bins in M The activitydistributions are discussed in Section 341

In Figure 1 we compare for sending in POK the fluc-tuation functions in daily resolution (Fig 1a) and weeklyresolution (Fig 1b) In order to match the scales we haveshifted the curves in Figure 1b along the Δt-axis Natu-rally in daily resolution the fluctuation functions covermore scales The asymptotic scaling is in both cases thesame namely no correlations in the case of least activemembers and strong long-term correlations with fluctua-tion exponents close to 1 for the most active membersMoreover for POK in daily resolution the fluctuationfunctions exhibit an increase from small slopes on shorttime scales to larger slopes on large scales This indicatesthat the long-term correlations do not vanish after certain

100

101

102

103

104

num of total messages sent M

04

06

08

1

12

HD

FA

2 (

10lt

=slt

=63

)

send

realshuffled individual sequences of messages per day

101

102

103

104

num of total messages received M

receive

(a) QX (b) QX

Fig 2 (Color online) Fluctuation exponents of the commu-nication activity (a) sending and (b) receiving messages bymembers of QX The exponents are plotted as a function of theactivity level M ie total number of messages for the originaldata (green circles) and individually shuffled sequences (or-ange diamonds) See also [12]

scale but the opposite the long-term correlations becomestronger Note that we use weekly resolution in order tocope with possible weekly oscillations [25ndash27]

We measure the fluctuation exponents by applyingleast squares fits to log F (Δt) vs log Δt on the scales10 lt Δt lt 63 days (QX) and 10 lt Δt lt 70 weeks (POK)For the former case the obtained fluctuation exponents areplotted in Figure 2 as a function of the members activitylevel ie their total number of messages M For sending(panel (a)) the less active members exhibit uncorrelatedbehavior The more messages the members send overallthe stronger correlated is their activity The fluctuationexponent HQX increases with M and reaches values up to075 plusmn 005 (sending) In contrast for the shuffled datathe fluctuation exponents are always very close to 12This confirms that the long-term correlations are due tothe temporal structure of the times each member sendshisher messages see also [14] For receiving messagesFigure 2b we find almost identical results The error barsin Figure 2 were calculated by subdividing the groups ofdifferent activity level The size of the error bars is simplythe standard deviation of the corresponding exponents

The estimated fluctuation exponents obtained forPOK are displayed in Figure 3 Qualitatively we obtaina similar picture as for QX However in contrast to QXhere the original records achieve larger fluctuation expo-nents up to 091 plusmn 004 (sending) disregarding the lastpoints which carry large error-bars A possible reason forthese different maximum exponents could be that in thecase of POK the data covers a much longer period of dataacquisition and possible non-stationarities [28] In QXthe members might not have had enough time to exhibitthe full extend of their persistence while in POK we followthe entire evolution of the online community

Indeed similar behavior of long-term correlations havebeen found in traded values of stocks and e-mail commu-nication [2930] where the fluctuation exponent increasesin an analogous way with the mean trading activity of the

150 The European Physical Journal B

100

101

102

103

104

num of total messages sent M

04

06

08

1

12

HD

FA

2 (

10lt

=slt

=70

)

send

realshuffled individual sequences of messages per day

101

102

103

104

num of total messages received M

receive

(a) POK (b) POK

Fig 3 (Color online) Fluctuation exponents of the commu-nication activity (a) sending and (b) receiving messages bymembers of POK (weekly resolution) The exponents are plot-ted as a function of the activity level M for the original data(green circles) and individually shuffled sequences (orange di-amonds) See also [12]

corresponding stock or with the average number of e-mails(see also [31])

Apart from these for human related data long-term persistence has been reported for physiologicalrecords [223233] written language [34] or for recordsgenerated by collective behavior such as finance and econ-omy [35ndash37] Ethernet traffic [38] Wikipedia access [39]as well as highway traffic [4041] There are also indicationsof long-term correlations in human brain activity [4243]and human motor activity [44]

A question that arises is why the fluctuation expo-nent (in Figs 2 and 3) depends on the activity levelof the members that is why the least active membersexhibit no persistence while the most active members ex-hibit strong persistence We argue that if only few mes-sages appear in the whole period of data acquisition long-term persistence cannot be reflected In these cases it isquite possible that much longer records and higher ag-gregation level such as months or years would be neededto reveal the persistence But doing so there would beother members with even less messages which then againwould probably appear with seemingly uncorrelated mes-sage signals Thus we propose that the exponents of thelargest activity reflect more accurately the scaling behav-ior of human communication activity In Section 331we propose statistical simulations to generate data usingpeaks over threshold (POT) and find that it supports thisperception

At this point we need to mention that long-term cor-relations can be related to broad inter-event time dis-tributions ie the times between successive messages ofindividual members Such distributions have been inves-tigated see eg [1345] but there is no consensus on thefunctional form We study the inter-event time distribu-tions in a different publication [14] where we demonstratethe connection with the long-term correlations found here

101

102

Δt [days]

10minus1

100

101

FD

FA

2(Δt

)

100

101

102

103

104

total number of messages ML

04

06

08

1

12

HD

FA

2 (

10 lt

= s

lt=

63)

(a) QX (b) QX

~ (Δt)075

~ Δt12

Fig 4 (Color online) Temporal correlations in the dailyamount of messages on directed links in QX (a) DFA2 fluctu-ation functions versus the time scale Δt averaged conditionalto the final number of messages of each link The differentcurves correspond to different activity levels ML = 1minus2 3ndash7 8ndash20 21ndash54 55ndash148 149ndash403 404ndash1096 1097ndash2980 (frombottom to top) The dotted lines correspond to the exponentsH = 075 (top) and H = 12 (bottom) (b) The DFA2 fluctua-tion exponent HLQX obtained from (a) is plotted as a functionof the activity level ML The exponents were obtained in therange of scales 10 le Δt le 63 days The activity along directedlinks comprise similar long-term correlations as the total ac-tivity of individual members to all of their acquaintances

Along directed links

In Figure 4 we study for QX the long-term correlationsin activity not on the sender or receiver (node) level buton the level of messages along directed links This meansthat we track when a message is sent directed between twomembers but separately for any pair of members such asararrb brarra ararrd Accordingly we determine the activ-ity records μab(t) μba(t) μad(t) etc expressing how manymessages have been sent each dayweek t between anypair of members Analogous there is also an activity levelfor the links M ab

L (we disregard those pairings with-out activity) Then we perform the analogous analysis forlong-term correlations by applying DFA2 and averagingamong pairings with similar overall activity (the distribu-tions of activity are discussed in Sect 341) The fluctua-tion functions in Figure 4a have asymptotic slopes close to12 for those links with few total number of messages Incontrary those links with many total number of messagesexhibit long-term correlations with exponents up to 074The fluctuation exponents as a function of the activitylevel ML are plotted in Figure 4b Apart from the factthat by definition the number of messages on the most ac-tive links is lower (or equal) than the number of messagesof the most active members the curve looks very similarto the one in Figure 2a in particular the maximum expo-nents are quite similar (HQX asymp 075 and HLQX asymp 074)This indicates that the persistence in the communicationmay be dominated by the communication activity withthe most liked partners

In [46] a different concept of persistence links has beeninvestigated The period of data acquisition is partitionedinto time slices in each of which a network is built Thenthe persistence is defined as the normalized number oftime slices in which a certain link appears However theapproach of our work is not compatible with the one in [46]and we cannot directly compare the results

D Rybski et al Communication activity in social networks growth and correlations 151

32 Growth process

321 Growth in the number of messages

As suggested in [12] we also analyze the growth prop-erties of the message activity This concept is borrowedfrom econophysics where the growth of companies hasbeen found to exhibit non-trivial scaling laws [1] that inparticular violate the original Gibratrsquos law [9ndash1147] andat the same time represents a generalized Gibratrsquos law(GGL) [12] In the present study each member is con-sidered as a unit and the number of messages sent or re-ceived since the beginning of data acquisition representsits size We analyze the growth in the number of mes-sages in analogy to other systems such as the growth ofcompanies [148] or the growth of cities [749] The anal-ogy is supported by some aspects (i) the members of acommunity represent a population similar to the popula-tion of a country (ii) the number of members fluctuatesand typically grows analogous to the number of cities of acountry (iii) the activity or number of links of individualsfluctuates and grows similar to the size of cities

The cumulative number mj(t) expresses how manymessages have been sent by a certain member j up to agiven time t (for a better readability we will not writethe index j explicitly m(t)) We consider the evolution ofm(t) between times t0 and t1 within the period of dataacquisition T (t0 lt t1 le T ) as a growth process whereeach member exhibits a specific growth rate rj (r for shortnotation)

r = lnm1

m0 (3)

where m0 equiv m(t0) and m1 equiv m(t1) are the number ofmessages sent until t0 and t1 respectively by every mem-ber To characterize the dynamics of the activity we con-sider two measures (i) The conditional average growthrate 〈r(m0)〉 quantifies the average growth of the num-ber of messages sent by the members between t0 and t1depending on the initial number of messages m0 In otherwords we consider the average growth rate of only thosemembers that have sent m0 messages until t0 (ii) Theconditional standard deviation of the growth rate for thosemembers that have sent m0 messages until t0

σ(m0) equivradic

〈(r(m0) minus 〈r(m0)〉)2〉 (4)

expresses the statistical spread or fluctuation of growthamong the members depending on m0 Both quanti-ties are relevant in the context of Gibratrsquos law in eco-nomics [9ndash1147] which proposes a proportionate growthprocess entailing the assumption that the average and thestandard deviation of the growth rate of a given economicindicator are constant and independent of the specific in-dicator value That is both 〈r(m0)〉 and σ(m0) are inde-pendent of m0

As shown in [12] for the message data the conditionalaverage growth rate is almost constant and only decreasesslightly

〈r(m0)〉 sim mminusα0 (5)

with an exponent α asymp 005 This means that memberswith many messages in average increase their number ofmessages almost with the same rate as members with fewmessages In contrast the conditional standard deviationclearly decreases with increasing m0

σ(m0) sim mminusβ0 (6)

where t0 = T2 is optimal in terms statistics In this casethe exponents βQX = 022 plusmn 001 and βPOK = 017 plusmn003 for sending messages were found [12] This meansalthough the average growth rate almost does not dependon m0 the conditional standard deviation of the growthof members with many messages is smaller than the one ofmembers with few messages Due to weaker fluctuationsactive members are relatively better predictable in theiractivity of sending messages

It has been shown that the fluctuation exponent H andthe growth fluctuation exponent β are related via [12]

β = 1 minus H (7)

Equation (7) is a scaling law formalizing the relation be-tween growth and long-term correlations in the activ-ity According to equation (7) the original Gibratrsquos law(βG = 0) corresponds to very strong long-term correla-tions with HG = 1 In contrast βrnd = 12 representscompletely random activity (Hrnd = 12) The observedmessage data comprises 12 gt β gt 0 and 12 lt H lt 1Surprisingly the values of β found here are very close tothe β values found for companies in the US economy [1]

In the case of companies also the distribution ofgrowth rates has been studied It was found that the dis-tribution density follows [1]

p(r|m0) =1

sσ(m0)exp

(

minuss|r minus 〈r(m0)〉|σ(m0)

)

(8)

whereas s =radic

2 Next we analyze how the growth rates rare distributed in the case of the message data

First we need to point out that in contrast to thegrowth of companies our entities can never shrink Themembers cannot loose messages the number m(t) eitherincreases or remains the same Accordingly in our caser ge 0 and therefore s = 1 as can be derived for thesingle-sided exponentially decaying distribution

Figure 5 shows p(r|m0) for QX where the values arescaled to collapse according to equation (8) with s = 1In order to have reasonable statistics we define the con-dition m0 in rather wide ranges namely according to thedecimal logarithm For sending (Fig 5a) and receiving(Fig 5b) messages the scaled probability densities col-lapse and are quite similar Nevertheless the growth ratesdo not exactly follow equation (8) with s = 1 While forthe less active members with small growth rates we finda good agreement for more active members and largegrowth rates the obtained curves deviate from the the-oretical one towards a steeper decay

The corresponding results for POK are shown in Fig-ure 6 Again sending and receiving are very similar The

152 The European Physical Journal B

minus2 0 2 4 6(rminusltr(m0)gt)σ(m0)

10minus5

10minus4

10minus3

10minus2

10minus1

100

101

p(r|

m0)

σ(m

0)

1minus910minus99100minus9991000minus9999

minus2 0 2 4 6(rminusltr(m0)gt)σ(m0)

~ eminusr

(a) QX messages send (b) QX messages receive

Fig 5 (Color online) Scaled probability density of growthrates r equation (3) in the number of messages by membersof QX (a) Sending and (b) receiving The times for m0 andm1 have been chosen as t0 = T2 and t1 = T The symbolscorrespond to different initial number of messages m0 The axisare scaled assuming a distribution according to equation (8)with s = 1 which then corresponds to the dotted lines

minus2 0 2 4 6(rminusltr(m0)gt)σ(m0)

10minus5

10minus4

10minus3

10minus2

10minus1

100

101

p(r|

m0)

σ(m

0)

1minus910minus99100minus9991000minus9999

minus2 0 2 4 6(rminusltr(m0)gt)σ(m0)

~ eminusr

(a) POK messages send (b) POK messages receive

Fig 6 (Color online) Scaled probability density of growthrates r equation (3) in the number of messages by membersof POK (a) Sending and (b) receiving Analogous to Figure 5

curves collapse reasonably but in contrast to QX here themeasured p(r|m0) overall deviate from the theoretical onecomprising less steep slopes

We argue that as for single time series distributionand correlation properties are in most cases independentthe same holds for the message data and the growth Thedistribution of growth rates p(r|m0) seems to be indepen-dent from the long-term correlations which are reflectedin σ(m0) with the exponent β

322 Mutual growth in the number of messages

Next we study a variation of growth Instead of consider-ing the absolute number of messages a member sends westudy the difference in the number of messages comparedto any other member the mutual difference mi(t)minusmj(t)Thus the growth rate is defined analogous to equation (3)

rtimes = lnmi

1 minus mj1

mi0 minus mj

0

(9)

where now there is a growth rate for every pair of mem-bers i and j The conditional average growth rate and thecorresponding standard deviation is then taken over allpossible pairs and the condition is the difference at t0

100

101

102

103

104

m0

iminusm0

j

10minus2

10minus1

100

101

100

101

102

103

104

m0

iminusm0

j

10minus2

10minus1

100

101

ltr x(

m0i minus

m0j )gt

σ(m

0i minusm

0j )

(a) QX messages send (b) QX messages send shuf

~ (m0

iminusm0

j)

minus03

~ (m0

iminusm0

j)

minus12

Fig 7 (Color online) Average mutual growth rate and stan-dard deviation versus foregoing difference in the number ofmessages for sending in QX The average (open squares) andstandard deviation (filled circles) of the mutual growth ratertimes equation (9) are plotted conditional to the initial differ-ence mi

0 minusmj0 whereas t0 = T2 and t1 = T (a) Original data

and (b) shuffled data The dotted line in (a) corresponds tothe exponent βtimes = 03 and in (b) to βtimes = 12

mi0 minus mj

0 = mi(t0) minus mj(t0) providing the quantities〈rtimes(mi

0 minus mj0)〉 and σ(mi

0 minus mj0) We disregard combina-

tions of i and j where mi0 minus mj

0 = 0 or mi1minusmj

1

mi0minusmj

0le 0

The results for sending in QX are shown in Figure 7Apart from a small decrease up to mi

0 minus mj0 asymp 50 the

average growth rate is constant (Fig 7a) The conditionalstandard deviation asymptotically follows a slope βtimes asymp03 with deviations to small exponents for small mi

0 minusmj0

In the case of the shuffled data (Fig 7b) as expected theaverage growth rate is constant while the standard devia-tion decreases steeper than for the original data namelywith βtimesrnd 12 although not with a nice straight lineNevertheless we conclude that the scaling of the standarddeviation in Figure 7a must be due to temporal correla-tions between the members The growth of the differencebetween their number of messages comprises similar scal-ing as the individual growth

We conjecture that σ(mi0 minus mj

0) reflects long-term cross-correlations in analogy to σ(m0) for auto-correlations However so far we are not able to providefurther evidence for this analogy and the correspondingrelation to β = 1 minus H equation (7) since an appropriatetechnique for the direct quantification of long-term cross-correlations is lacking

33 Modeling

In what follows we propose numerical simulations withthe purpose of testing the methods and empirical pat-terns we found We study three approaches adopted tothe modeling of human activity (a) peaks over thresh-olds (b) preferential attachment [50] and (c) cascadingPoisson process [27]

331 Peaks over threshold (POT) simulations

Our finding that the activity of sending messages ex-hibits long-term persistence asserts the existence of an

D Rybski et al Communication activity in social networks growth and correlations 153

0 5 10 15 20time [w]

0

2

4

6

0 50 100 150 200time

minus2

minus1

0

1

2

3(a)

(b)

(c)

(d)

q

Fig 8 (Color online) Illustration of the peaks over thresh-old simulations (a) An underlying and unknown long-termcorrelated process determines the instantaneous probabilityof sending messages Once this state passes certain thresh-old q (dashed orange line) a messages is sent (green diamonds)(b) Generated instants of messages (c) with windows for ag-gregation such as messages per day (d) Aggregated record ofmessages in windows of size w here w = 10

underlying long-term correlated process It can be under-stood as an unknown individual state driven by variousinternal and external stimuli [274351ndash54] increasing theprobability to send messages Generating such a hypothet-ical long-term correlated internal process (xi) simulatedmessage data can be defined by the instants at which thisinternal process exceeds a threshold q (peaks over thresh-old POT) see [55ndash57] and references therein

More precisely we consider a long-term correlated se-quence (xi) consisting of Nlowast random numbers that is nor-malized to zero average (〈x〉 = 0) and unit standard devi-ation (σx = 1) Choosing a threshold q at each instant ithe probability to send a messages is

psnd =

1 for xi gt q

0 for xi le q (10)

Thus the message events are given by the indices i ofthose random numbers xi exceeding q

Figure 8a illustrates the procedure The random num-bers are plotted as brown circles and the events exceedingthe threshold (orange dashed line) by the green diamondsThe resulting instants are depicted in Figure 8b represent-ing the simulated messages The threshold approximatelypredefines the total number of events and accordingly theaverage inter-event time Using normal-distributed num-bers (xi) the number of eventsmessages is approximatelygiven by the length Nlowast and the inverse cumulative dis-tribution function associated with the standard normaldistribution (probit-function) Additionally the randomnumbers we use are long-term correlated with variable

100

101

102

103

104

Δt [w]

10minus3

10minus2

10minus1

100

101

102

103

104

FD

FA

2(Δt

)

100

101

102

103

104

m0

10minus1

100

ltr(

m0)

gt

100

101

102

103

104

105

106

total number of events M

0

02

04

06

08

1

12

HD

FA

2 (

1000

lt=

s lt

= 1

0000

)q=10q=15q=40q=45

100

101

102

103

104

m0

10minus2

10minus1

100

σ(m

0)

H=09H=08H=07H=06H=05slope β=1minusH

~ (Δt)09

~ (Δt)05

(a) (b)

(c) (d)

Fig 9 (Color online) Results of numerical simulations(a) Mean growth rate conditional to the number of events un-til t0 = Nlowast2 as obtained from 100 000 long-term correlatedrecords of length Nlowast = 131 072 with variable imposed fluctua-tion exponent Himp between 12 and 09 and random thresh-old q between 10 and 60 (b) As before but standard devia-tion conditional to the number of events The solid lines repre-sent power-laws with exponents β expected from the imposedlong-term correlations according to equation (7) (c) Long-termcorrelations in the sequences of aggregated peaks over thresh-old For every threshold q between 10 (violet) and 45 (black)100 normalized records of length Nlowast = 4194 304 have beencreated with Himp = 09 The events are aggregated in win-dows of size w = 100 The panel shows the averaged DFA2fluctuation functions (d) Fluctuation exponents on the scales1 000 le s le 10 000 as a function of the total number of events

fluctuation exponent We impose these auto-correlationsusing Fourier filtering method [2358] Next we show thatthis process reproduces the scaling in the growth ieGGL as well as the variable long-term correlations in theactivity of the members (eg Figs 2 and 3)

For testing this process we create 100 000 independentlong-term correlated records (xi) of length Nlowast = 131 072impose the fluctuation exponent Himp and choose for eachone a random threshold q between 1 and 6 each represent-ing a sender Extracting the peaks over threshold we ob-tain the events and determine for each recordmember thegrowth in the number of eventsmessages between Nlowast2and Nlowast This is for each recordmember we count thenumbers of eventsmessages m0 until t0 = ti=Nlowast2 as wellas m1 until t1 = ti=Nlowast and calculate the growth rate ac-cording to equation (3) We then calculate the conditionalaverage 〈r(m0)〉 and the conditional standard deviationσ(m0) where the values of m0 are binned logarithmicallyThe quantities are plotted in Figures 9a and 9b while inpanel (b) we include slopes expected from β = 1 minus H equation (7) We find that the numerical results reason-ably agree with the prediction (solid lines) Except forsmall m0 these results are consistent with those found inthe original message data

154 The European Physical Journal B

The fluctuation functions can be studied in the sameway As described in Section 31 we find long-term corre-lations in the sequences of messages per day or per weekOn the basis of the above explained simulated messageswe analyze them in an analogous way For each thresh-old q = 10 15 40 45 we create 100 long-term cor-related records of length Nlowast = 4 194 304 with imposedfluctuation exponent Himp = 09 extract the simulatedmessage events and aggregate them in non-overlappingwindows of size w = 100 This is tiling Nlowast in segmentsof size w and counting the number of events occurring ineach segment (Figs 8c and 8d) The obtained aggregatedrecords represent the analogous of messages per day or perweek and are analyzed with DFA averaging the fluctua-tion functions among those configurations with the samethreshold and thus similar number of total events Thecorresponding results are shown in Figure 9c and 9d Weobtain very similar results as in the original data We findvanishing correlations for the sequences with few events(large q) and pronounced long-term correlations for thecases of many events (small q) while the maximum fluc-tuation exponent corresponds to the chosen Himp Thiscan be understood by the fact that for q close to zerothe sequence of number of events per window convergesto the aggregated sequence of 0 or 1 (for x le 0 or x gt 0)reflecting the same long-term correlation properties as theoriginal record [59] For a large threshold q too few eventsoccur to measure the correct long-term correlations egthe true scaling only turns out on larger unaccessible timescales requiring larger w and longer records

Although the simulations do not reveal the originof the long-term correlated patchy behavior they sup-port equation (7) and the concept of an underlying long-term correlated process Consistently an uncorrelatedcompletely random underlying process recovers Poissonstatistics and therefore βrnd = 12 for the growth fluctu-ations as well as uncorrelated message activity (Hrnd =12) For 12 lt H lt 1 it has been shown [5557] that theinter-event times follow a stretched exponential distribu-tion see also [131460]

332 Preferential attachment

Next we compare our findings with the growth proper-ties of a network model We investigate the Barabasi-Albert (BA) model which is based on preferential at-tachment and has been introduced to generate a kind ofscale-free networks [5061] with power-law degree distri-bution p(k) [6263] Essentially it consists of subsequentlyadding nodes to the network by linking them to existingnodes which are chosen randomly with a probability pro-portional to their degree

We obtain the undirected network and study the de-gree growth properties by calculating the conditional av-erage growth rate 〈rBA(k0)〉 and the conditional standarddeviation σBA(k0) obtained from the scale-free BA modelThe times t0 and t1 are defined by the number of nodesattached to the network

Figure 10 shows the results where an average degree〈k〉 = 20 50 000 nodes in t0 and 100 000 nodes in t1 were

101

102

103

degree k0

10minus2

10minus1

ltr B

A(k

0)gt

σB

A(k

0)

ltrBA(k0)gtσBA(k0)

~ k0

minus12

Fig 10 (Color online) Average degree growth rate and stan-dard deviation versus foregoing degree for the preferential at-tachment network model [50] The average (green open di-amonds) and standard deviation (blue filled circles) of thegrowth rate rBA are plotted conditional to k0 the degree ofthe corresponding nodes at the first stage We choose averagedegree 〈k〉 = 20 50 000 nodes in t0 and 100 000 nodes in t1The error-bars are taken from 10 configurations The dashedline in the bottom corresponds to βBA = 12

chosen We find constant average growth rate that doesnot depend on the initial degree k0 The conditional stan-dard deviation is a function of k0 and exhibits a power-lawdecay with βBA = 12 as expected for such an uncorre-lated growth process [12] Therefore a purely preferentialattachment type of growth is not sufficient to describe thetype of social network dynamics found in Section 321since additional temporal correlations are involved in thedynamics of establishing acquaintances in the community

The value βBA = 12 in equation (7) corresponds toH = 12 indicating complete randomness There is nomemory in the system Since each addition of a new nodeis completely independent from precedent ones there can-not be temporal correlations in the activity of addinglinks In contrast for the out-degree in QX and POK weobtained βkQX = 022plusmn002 and βkPOK = 017plusmn008 [12]which is supported by the (non-linear) correlations be-tween the number of messages and the out-degree as pre-sented in Section 34

Interestingly an extension of the standard BA modelhas been proposed [64] see also [6566] that takes intoaccount different fitnesses of the nodes to acquiring linksWe think that such fitness could be related to growth fluc-tuations thus providing a route to modify the BA modelto include the long-term correlated dynamics found here

333 Cascading Poisson process

In this section we elaborate the model proposed in [27]and examine it with respect to long-term correlations Themodel is based on a cascading Poisson process (CPP)according to which the probability that a member entersan active interval is ρ(t) = Nwpd(t)pw(t) where Nw is theaverage number of active intervals per week pd(t) is the

D Rybski et al Communication activity in social networks growth and correlations 155

100

101

102

103

104

105

Δt [days]

10minus1

100

101

102

FD

FA(Δ

t) [

arb

uni

ts]

DFA1DFA2~ (Δt)

12

100

101

102

Δt [days]

10minus1

100

101

FD

FA

2(Δt

)

M=1097minus2980M=404minus1096M=149minus403M=55minus148M=21minus54M=8minus20M=3minus7M=1minus2~ (Δt)

12

(a) User 2881Malmgren et al PNAS 2008

(b)

Fig 11 (Color online) DFA fluctuation functions of message data created with the model proposed in [27] (a) User 2881 Wevisually extract the model parameters from Figure 3 in [27] and generate message data for approx 800k days The panel showsthe fluctuation functions from DFA1 and DFA2 which asymptotically go as sim (Δt)12 ie no long-term correlations The humpon small scales is due to the model inherent oscillations [23] The dashed vertical line is placed at Δt = 83 days (b) Randomparameterization We randomly choose the model parameters create 20k records of 83 days and average the obtained DFA2fluctuation functions according to the final number of messages M While for those simulated members with few messages wefind F (Δt) sim (Δt)12 the F (Δt) of the most active members exhibit a hump due to oscillations similar to the one in panel (a)

probability of starting an active interval at a particulartime of the day and pw(t) is the probability of startingan active interval at a particular day of the week Once amember enters such an active interval heshe sends a set ofNa +1 messages where Na is drawn from the distributionp(Na) The messages sent in such an active interval aresent randomly ie a homogeneous Poisson process withrate ρa events per hour

First we study the example of User 2881 as analyzedin [27] (please note that in [27] a different data set is stud-ied and the user is neither in OC1 nor in OC2) We ex-tract from [27] the parameters Nw = 73 active intervalsper week ρa = 17 events per hour as well as (visually)the distributions pd(t) pw(t) and p(Na) The original pe-riod of 83 days is not sufficient to apply DFA and werun the model for this set of parameters over 800k daysThen we extract the record of number of messages per dayμ(t) and apply DFA The obtained fluctuation functionsare shown in Figure 11a On small scales below 100 daysa hump in the F (Δt) can be identified which is due tooscillations in μ(t) [23] While asymptotically the influ-ence of oscillations vanishes on scales below the wave-length the oscillations appear as correlations (increasedslope in F (Δt)) and on scales above the wavelength theoscillations appear as anti-correlations (decreased slope inF (Δt)) Asymptotically we find F (Δt) sim (Δt)12 ieHCPP 12 corresponding to a lack of long-term cor-relations Even on scales up to 83 days we rather findHCPP lt 12 (which we expect from the imposed weeklyoscillations)

Next we study 20 000 simulated e-mail senders withrandomly chosen parameters (i) We fill pw(t) with ran-dom numbers and set pw(t) = 0 for t = 0 6 ie Sundayand Saturday (ii) We fill pd(t) with random numbers andset pd(t) = 0 for t = 0 5 and t = 23 ie at night(iii) We set p(Na) starting with a random p(Na = 0) Thenp(Na) decays exponentially up to a random Na below 36pw(t) pw(t) = 0 and p(Na) are normalized (iv) We ran-

domly choose 0 lt Nw lt 40 (v) We randomly choose0 lt ρa lt 30 From [27] Supporting Information 2 (SI2)we estimated the typical maximum values of Na Nw andρa (36 40 and 30 respectively) We run the model for83 days and extract the μ(t) for each simulated e-mailsender Then we apply DFA2 and average the fluctuationfunctions according to the final number of messages M The fluctuation functions for the various activity levels aredepicted in Figure 11b We find that members with smallfinal number of messages exhibit uncorrelated behaviorThe more active the members the more pronounced be-come the oscillations which we already discussed in thecontext of Figure 11a Thus asymptotic HCPP lt 12 forlarge M is due to the weekly cycles [23] In our data os-cillations do not dominate the DFA fluctuation functionsMoreover for OC2 we also find long-term correlations inweekly resolution (Fig 1)

Based on periodic probabilities and Poisson statisticsthe CPP model represents a powerful concept to charac-terize inter-event times For this purpose the average Nw

seems to be sufficient However in order to recover long-term correlations time dependent Nw = Nw(t) seem tobe necessary In fact the number of active intervals perweek Nw fluctuates as can be seen in [27] SI2 (uppermost row of the panels) Thus we suggest to extend themodel by introducing a memory kernel see eg [54] or byusing long-term correlated Nw(t)

34 Other correlations

In this section we want to discuss other types of correla-tions Figure 12 shows for QX the final degree K = k(T )versus the final number of messages M = m(T ) We findthat for both sending and receiving the two quantitiesare correlated according to

K sim Mλ with λ asymp 34 (11)

156 The European Physical Journal B

100

101

102

103

104

105

number of messages M

10minus1

100

101

102

103

104

degr

ee K

send

100

101

102

103

104

105

number of messages M

receive

~ M075

~ M075

(a) QX (b) QX

Fig 12 (Color online) Correlations between the finaldegree K and the final number of messages M for QX (a) Out-degree and sending messages (b) in-degree and receiving mes-sages The dashed lines correspond to a power-law with expo-nent 075 Members sending many messages also tend to havehigh out-degree but not linearly rather following a power-law

100

101

102

103

104

105

sent

100

101

102

103

104

105

rece

ived

messages M

100

101

102

103

104

105

outminusdegree

100

101

102

103

104

105

inminus

degr

ee

degree K

(a) QX (b) QX

Fig 13 (Color online) Correlations between activity and pas-sivity for QX (a) Final number of messages M received andsent (b) final in- and out-degree The dashed lines correspondto a linear relation Those members who send many messagesalso receive many But those members who know many peopleto whom they send messages do not necessarily know as manypeople from whom they receive messages

Similar relations have also been found for other data [67]Since the correlations are positive those members thatsend many messages in average also have more acquain-tances to whom they send but they know less acquain-tances than they would in the case of linear correlationsFor receiving Figure 12b this correlation is very similar

The number of messages sent versus the number ofmessages received (for QX) is displayed in Figure 13aAsymptotically the activity and passivity are linearly re-lated and on average for every message sent there is areceived one or vice versa This of course does not meanthat every message is replied However the less activemembers in average tend to receive more messages thanthey send For example those members who send in av-erage one message receive about three Nevertheless themore active the members are the more the sending andreceiving behavior approaches the linear relation In con-trast for the degree Figure 13b the asymptotic linearrelation does not hold Those members with large out-degree and small in-degree are referred to as spammerssince they send to many different people but only receivefrom few

100

101

102

103

104

105

number of messages M

10minus1

100

101

102

103

104

degr

ee K

send

100

101

102

103

104

105

number of messages M

receive

~ M075

~ M075

(a) POK (b) POK

Fig 14 (Color online) Correlations between the final de-gree K and the final number of messages M for POK (a) Out-degree and sending messages (b) in-degree and receiving mes-sages Analogous to Figure 12

100

101

102

103

104

sent

100

101

102

103

104

rece

ived

messages M

100

101

102

103

104

outminusdegree

100

101

102

103

104

inminus

degr

ee

degree K

(a) POK (b) POK

Fig 15 (Color online) Correlations between activity and pas-sivity for POK (a) Final number of messages M received andsent (b) final in- and out-degree Analogous to Figure 13

For POK we find similar results in Figures 14 and 15The final degree and the final number of messages alsoscale with an exponent close to 075 although for send-ing messages there exist some deviations of the most ac-tive members (Fig 14a) Also the correlations of sendingand receiving are linear the same holds for in- and out-degree However the most active members again deviatewith low receiving part ie both low number of receivedmessages as well as low in-degree Figure 15 Neverthe-less the results for both data sets are mainly consistentand the power-law relation equation (11) is a remarkableregularity

341 Activity and degree distributions

Finally we want to briefly discuss the distributions of ac-tivities and degrees If we assume p(M) sim MminusγM andp(K) sim KminusγK then with equation (11) the exponentsshould be related according to

γK = 1 + (γM minus 1)λ (12)

Figures 16a and 16b displays the probability densitiesp(M) and p(K) for both online communities Althoughthe distributions are rather broad they do not exhibitstraight lines in double logarithmic representation Inpanel (b) we include some guides to the eye with slopesaccording to equation (12) and which roughly follow the

D Rybski et al Communication activity in social networks growth and correlations 157

Table 1 Overview of the obtained exponents QX and POK are the two data-sets For the BA model and the CP processsee [50] and [27] respectively HL is the fluctuation exponent along directed links βk is the growth fluctuation exponent whenthe degree is considered and βtimes is the mutual growth fluctuation exponent based on the growth between pairs

Sending H HL β βk βtimesSect or Ref [12] 31 333 31 [12] [12] 322

QX 075 plusmn 005 asymp074 022 plusmn 001 022 plusmn 002 asymp03

QX shuffled 12 12 12

POK 091 plusmn 004 017 plusmn 003 017 plusmn 008

POK shuffled 12 12

BA model 12

CP process rarr 12

100

101

102

103

104

total number of messages

10minus10

10minus9

10minus8

10minus7

10minus6

10minus5

10minus4

10minus3

10minus2

10minus1

100

p(M

L)

100

101

102

103

10410

minus1010

minus910

minus810

minus710

minus610

minus510

minus410

minus310

minus210

minus110

0

p(M)p(K)

100

101

102

103

104

105

total number of messages

100

101

102

103

104

105

(a) QX send (b) POK send

(c) QX directed link (d) POK directed link

~ ML

minus35

~ Mminus12

~ Mminus3

~ Kminus127

~ Kminus37

~ ML

minus35

~ Kminus073

~ Mminus08

Fig 16 (Color online) Probability densities of activities anddegrees The probabilities are plotted versus the total numberof messages M and the final degree K for (a) sending inQX and (b) sending in POK The panel (c) and (d) exhibitthe probability densities of the total number of messages alongdirected links ML for QX and POK respectively The dottedlines serve as guides to the eye and have the indicated slopes

obtained curves However λ asymp 075 is relatively close to 1so that the differences are minor

The probability densities of activity along direct linksare displayed in Figures 16c and 16d for QX and POKrespectively In both cases the frequency of large activitydecays approximately following a power-law with expo-nent around 35

4 Conclusions

Our work reviews and further supports previous empir-ical findings [12] extending them by some features Theobtained exponents are summarized in Table 1

In addition to [12] we find very similar characteristicsfor the passivity of receiving as for the activity of send-ing messages This is in line with the strong correlations

between individual sending and receiving ie most of themessages are somehow replied sooner or later Further-more already the communication between two individualscomprises long-term persistence

Investigating the probability densities of logarithmicgrowth rates (ie growth of the cumulative number of mes-sages between two time steps of any member) we are ableto collapse the curves by scaling them with conditional av-erage growth rates and conditional standard deviationsWhile less active members follow well the exponentiallydecaying probability density for the more active membersdeviations are found in the case of large growth rates

Moreover we introduce a new growth rate namely themutual growth in the number of messages This is the dif-ference in the number of messages sent between pairs ofmembers at two time steps The conditional standard de-viation of this mutual growth rate also decays as a power-law with increasing initial difference whereas the expo-nent is close to 03 and changes to 12 when the datais shuffled We conjecture that this growth reflects cross-correlations in the activity

Finally we propose simulations to reproduce the long-term correlations and growth properties Basically it con-sists of generating long-term correlated sequences anddefining a threshold All values of such sequences abovethe threshold (POT) represent a message event We showthat then the correlation and growth features being deter-mined by the imposed fluctuation exponent confirm therelation β = 1 minus H [12] Including further features thisapproach could be a starting point for more elaboratedmodeling of human dynamics

We would like to note that ndash except Section 322 aboutmutual growth in the number of messages (and Sect 34) ndashall analysis and results refer to auto-correlations As phe-nomena auto- and cross-correlations can occur indepen-dently However since most of the messages are replied itis very likely that there are also cross-correlations betweenthe members activity which to our knowledge has not yetbeen studied systematically

Thus our work opens perspectives for further researchactivities In particular the origin of the long-term persis-tence in the communication remains an important ques-tion In [14] we demonstrate the relation of β H withinter-event time scaling From a psychologicalsociological

158 The European Physical Journal B

point of view one may argue where the persistence isoriginated Is it purely due to a state of mind solipsis-tic emerging from moods or is it due to social effects iethat the dynamics in the social network induces persis-tent fluctuations One hypothesis could be that alreadythe social network is correlated [68]

We thank C Briscoe JF Eichner LK Gallos and HDRozenfeld for useful discussions This work was supportedby National Science Foundation Grant NSF-SES-0624116 andNSF-EF-0827508 FL acknowledges financial support fromThe Swedish Bank Tercentenary Foundation SH thanks theEuropean EPIWORK project the Israel Science Foundationthe ONR and the DTRA for financial support

References

1 MHR Stanley LAN Amaral SV Buldyrev S HavlinH Leschhorn P Maass MA Salinger HE StanleyNature 379 804 (1996)

2 D Canning LAN Amaral Y Lee M Meyer HEStanley Econ Lett 60 335 (1998)

3 V Plerou LAN Amaral P Gopikrishnan M MeyerHE Stanley Phys Rev E 60 6519 (1999)

4 F Liljeros LAN Amaral HE Stanley arXiv

nlin0310001v1[nlinAO] (2003) httparxivorg

absnlin0310001

5 K Matia LAN Amaral M Luwel HF Moed HEStanley J Am Soc Inf Sci Tec 56 893 (2005)

6 S Picoli RS Mendes Phys Rev E 77 036105 (2008)7 HD Rozenfeld D Rybski JS Andrade Jr M Batty

HE Stanley HA Makse Proc Natl Acad Sci USA 10518702 (2008)

8 F Wang K Yamasaki S Havlin HE Stanley arXiv09114258v1[q-finST] (2009) httparxivorgabs09114258

9 R Gibrat Les inegalites economiques (Libraire du RecueilSierey Paris 1931)

10 J Sutton J Econ Lit 35 40 (1997)11 M Mitzenmacher Internet Math 1 226 (2004)12 D Rybski SV Buldyrev S Havlin F Liljeros HA

Makse Proc Natl Acad Sci USA 106 12640 (2009)13 AL Barabasi Nature 435 207 (2005)14 D Rybski SV Buldyrev S Havlin F Liljeros HA

Makse preprint (2011)15 M Karsai M Kivela RK Pan K Kaski J Kertesz AL

Barabasi J Saramaki Phys Rev E 83 025102 (2011)16 J Brujic RI Hermans KA Walther JM Fernandez

Nature Phys 2 282 (2006)17 LK Gallos D Rybski F Liljeros S Havlin HA Makse

submitted (2011)18 P Holme Europhys Lett 64 427 (2003)19 P Holme F Liljeros CR Edling BJ Kim Phys Rev

E 68 056107 (2003)20 P Holme CR Edling F Liljeros Soc Networks 26 155

(2004)21 CK Peng SV Buldyrev S Havlin M Simons HE

Stanley AL Goldberger Phys Rev E 49 1685 (1994)22 A Bunde S Havlin JW Kantelhardt T Penzel JH

Peter K Voigt Phys Rev Lett 85 3736 (2000)

23 JW Kantelhardt E Koscielny-Bunde HHA RegoS Havlin A Bunde Physica A 295 441 (2001)

24 JW Kantelhardt Encyclopedia of Complexity and SystemScience (Springer 2009) Chap entry 00620 Fractal andMultifractal Time Series

25 S Golder DM Wilkinson BA Huberman Communitiesand Technologies 2007 (Springer London 2007) pp 41ndash66

26 J Leskovec E Horvitz arXiv08030939v1[physics

soc-ph] (2008) httparxivorgabs0803093927 RD Malmgren DB Stouffer AE Motter LAN

Amaral Proc Natl Acad Sci USA 105 18153 (2008)28 A Vazquez Physica A 373 747 (2007)29 Z Eisler J Kertesz Phys Rev E 73 046109 (2006)30 Z Eisler I Bartos J Kertesz Adv Phys 57 89 (2008)31 LEC Rocha F Liljeros P Holme Proc Natl Acad Sci

USA 107 5706 (2010)32 CK Peng J Mietus JM Hausdorff S Havlin HE

Stanley AL Goldberger Phys Rev Lett 70 1343 (1993)33 PC Ivanov A Bunde LAN Amaral S Havlin

J Fritsch-Yelle RM Baevsky HE Stanley ALGoldberger Europhys Lett 48 594 (1999)

34 K Kosmidis A Kalampokis P Argyrakis Physica A 370808 (2006)

35 Y Liu P Gopikrishnan P Cizeau M Meyer CK PengHE Stanley Phys Rev E 60 1390 (1999)

36 RN Mantegna HE Stanley An Introduction toEconophysics Correlations and Complexity in Finance(Cambridge University Press Cambridge 1999)

37 F Lux M Ausloos The Science of Disasters in MarketFluctuations I Scaling Multiscaling and Their PossibleOrigins (Springer-Verlag Berlin 2002) Chap 13 pp 373ndash409

38 WE Leland MS Taqqu W Willinger DV WilsonIEEEACM Trans Networking 2 1 (1994)

39 M Kampf S Tismer JW Kantelhardt L Muchnik sub-mitted (2011)

40 S Tadaki M Kikuchi A Nakayama K NishinariA Shibata Y Sugiyama S Yukawa J Phys Soc Jpn75 034002 (2006)

41 Z Xiao-Yan L Zong-Hua T Ming Chinese Phys Lett24 2142 (2007)

42 K Linkenkaer-Hansen VV Nikouline JM Palva RJIlmoniemi J Neurosci 21 1370 (2001)

43 P Allegrini D Menicucci R Bedini L FronzoniA Gemignani P Grigolini BJ West P Paradisi PhysRev E 80 061914 (2009)

44 PC Ivanov K Hu MF Hilton SA Shea HE StanleyProc Natl Acad Sci USA 104 20702 (2007)

45 RD Malmgren DB Stouffer ASLO CampanharoLAN Amaral Science 325 1696 (2009)

46 CA Hidalgo C Rodriguez-Sickert Physica A 387 3017(2008)

47 A Saichev Y Malevergne D Sornette Theory ofZipf rsquos Law and Beyond ndash Monograph Lecture Notes inEconomics and Mathematical Systems (Springer-VerlagBerlin 2009)

48 LAN Amaral SV Buldyrev S Havlin MA SalingerHE Stanley Phys Rev Lett 80 1385 (1998)

49 HD Rozenfeld D Rybski X Gabaix HA Makse AmEcon Rev 101 2205 (2011)

50 AL Barabasi R Albert Science 286 509 (1999)51 P Hedstrom Dissecting the Social On the Principles

of Analytical Sociology (Cambridge University PressCambridge 2005)

D Rybski et al Communication activity in social networks growth and correlations 159

52 A Kentsis Nature 441 E5 (2006)53 G Palla AL Barabasi T Vicsek Nature 446 664 (2007)54 R Crane D Sornette Proc Natl Acad Sci USA 105

15649 (2008)55 A Bunde JF Eichner JW Kantelhardt S Havlin

Phys Rev Lett 94 048701 (2005)56 EG Altmann H Kantz Phys Rev E 71 056106 (2005)57 JF Eichner JW Kantelhardt A Bunde S Havlin

Phys Rev E 75 011128 (2007)58 HA Makse S Havlin M Schwartz HE Stanley Phys

Rev E 53 5445 (1996)59 Y Xu QDY Ma DT Schmitt P Bernaola-Galvan

PC Ivanov Physica A 390 4057 (2011)60 J Stehle A Barrat G Bianconi Phys Rev E 81 035101

(2010)

61 R Albert AL Barabasi Rev Mod Phys 74 47(2002)

62 H Ebel LI Mielsch S Bornholdt Phys Rev E 66035103 (2002)

63 MEJ Newman S Forrest J Balthrop Phys Rev E 66035101 (2002)

64 G Bianconi AL Barabasi Europhys Lett 54 436 (2001)65 BF de Blasio A Svensson F Liljeros Proc Natl Acad

Sci USA 104 10762 (2007)66 R Cohen S Havlin Complex Networks Structure

Robustness and Function in Growing network models(Cambridge University Press Cambridge 2009)

67 L Muchnik HA Makse S Havlin preprint (2011)68 D Rybski HD Rozenfeld JP Kropp Europhys Lett

90 28002 (2010)

  • Introduction
  • Data
  • Analysis
  • Conclusions
  • References
Page 3: Communication activity in social networks: growth and ...hmakse/maksepapers/2011... · data of the second online community (. com, POK) covers 492 days (February 2001 until June 2002)

D Rybski et al Communication activity in social networks growth and correlations 149

101

102

103

Δt [days]

10minus1

100

101

102

FD

FA

2(Δt

) (

wee

kly

reso

lutio

n)

101

102

103

Δt [days]

10minus1

100

101

102

FD

FA

2(Δt

) (

daily

res

olut

ion)

~ (Δt)1

~ (Δt)12

~ (Δt)12

~ (Δt)1

(a) daily resolution (b) weekly resolutionPOK POK

Fig 1 (Color online) Comparison of fluctuation functions in(a) daily and (b) weekly resolution of members sending mes-sages in POK The different curves correspond to different ac-tivity levels M = 1minus2 3ndash7 8ndash20 21ndash54 55ndash148 149ndash403404ndash1096 1097ndash2980 total messages (from bottom to top) Thecurves in (b) have been shifted along the Δt axis to match dailyresolution In both cases the asymptotic scaling is the sameThe dotted lines correspond to the exponents H = 1 (top) andH = 12 (bottom)

where 〈μ(t)〉 is the average of the record μ(t) σμ is itsstandard deviation and ν is the correlation exponent(1 ge ν ge 0) The fluctuation function provided by DFAscales as

F (Δt) sim (Δt)H (1)

where the exponent H is similar to the Hurst exponent(12 le H le 1 larger exponents correspond to more pro-nounced long-term correlations) It is related to the cor-relation exponent via

ν = 2 minus 2H (2)

For uncorrelated or short-term correlated records theasymptotic fluctuation exponent is H = 12 (for a reviewwe refer to [24])

In order to study the activity with respect to long-termcorrelations we apply second order DFA (DFA2) [2223](linear detrending of μj(t)) and obtain the fluctuationfunctions F j

DFA2(Δt) (details can be found in [12]) Sincethe activity records of the individual members are tooshort we average the squared fluctuation functions amongmembers with similar overall activity (ie total num-ber of messages M) F (Δt) = [ 1

nM

sumj|M (F j(Δt))2]12

where nM is the number of members with M messagesTherefore we employ logarithmic bins in M The activitydistributions are discussed in Section 341

In Figure 1 we compare for sending in POK the fluc-tuation functions in daily resolution (Fig 1a) and weeklyresolution (Fig 1b) In order to match the scales we haveshifted the curves in Figure 1b along the Δt-axis Natu-rally in daily resolution the fluctuation functions covermore scales The asymptotic scaling is in both cases thesame namely no correlations in the case of least activemembers and strong long-term correlations with fluctua-tion exponents close to 1 for the most active membersMoreover for POK in daily resolution the fluctuationfunctions exhibit an increase from small slopes on shorttime scales to larger slopes on large scales This indicatesthat the long-term correlations do not vanish after certain

100

101

102

103

104

num of total messages sent M

04

06

08

1

12

HD

FA

2 (

10lt

=slt

=63

)

send

realshuffled individual sequences of messages per day

101

102

103

104

num of total messages received M

receive

(a) QX (b) QX

Fig 2 (Color online) Fluctuation exponents of the commu-nication activity (a) sending and (b) receiving messages bymembers of QX The exponents are plotted as a function of theactivity level M ie total number of messages for the originaldata (green circles) and individually shuffled sequences (or-ange diamonds) See also [12]

scale but the opposite the long-term correlations becomestronger Note that we use weekly resolution in order tocope with possible weekly oscillations [25ndash27]

We measure the fluctuation exponents by applyingleast squares fits to log F (Δt) vs log Δt on the scales10 lt Δt lt 63 days (QX) and 10 lt Δt lt 70 weeks (POK)For the former case the obtained fluctuation exponents areplotted in Figure 2 as a function of the members activitylevel ie their total number of messages M For sending(panel (a)) the less active members exhibit uncorrelatedbehavior The more messages the members send overallthe stronger correlated is their activity The fluctuationexponent HQX increases with M and reaches values up to075 plusmn 005 (sending) In contrast for the shuffled datathe fluctuation exponents are always very close to 12This confirms that the long-term correlations are due tothe temporal structure of the times each member sendshisher messages see also [14] For receiving messagesFigure 2b we find almost identical results The error barsin Figure 2 were calculated by subdividing the groups ofdifferent activity level The size of the error bars is simplythe standard deviation of the corresponding exponents

The estimated fluctuation exponents obtained forPOK are displayed in Figure 3 Qualitatively we obtaina similar picture as for QX However in contrast to QXhere the original records achieve larger fluctuation expo-nents up to 091 plusmn 004 (sending) disregarding the lastpoints which carry large error-bars A possible reason forthese different maximum exponents could be that in thecase of POK the data covers a much longer period of dataacquisition and possible non-stationarities [28] In QXthe members might not have had enough time to exhibitthe full extend of their persistence while in POK we followthe entire evolution of the online community

Indeed similar behavior of long-term correlations havebeen found in traded values of stocks and e-mail commu-nication [2930] where the fluctuation exponent increasesin an analogous way with the mean trading activity of the

150 The European Physical Journal B

100

101

102

103

104

num of total messages sent M

04

06

08

1

12

HD

FA

2 (

10lt

=slt

=70

)

send

realshuffled individual sequences of messages per day

101

102

103

104

num of total messages received M

receive

(a) POK (b) POK

Fig 3 (Color online) Fluctuation exponents of the commu-nication activity (a) sending and (b) receiving messages bymembers of POK (weekly resolution) The exponents are plot-ted as a function of the activity level M for the original data(green circles) and individually shuffled sequences (orange di-amonds) See also [12]

corresponding stock or with the average number of e-mails(see also [31])

Apart from these for human related data long-term persistence has been reported for physiologicalrecords [223233] written language [34] or for recordsgenerated by collective behavior such as finance and econ-omy [35ndash37] Ethernet traffic [38] Wikipedia access [39]as well as highway traffic [4041] There are also indicationsof long-term correlations in human brain activity [4243]and human motor activity [44]

A question that arises is why the fluctuation expo-nent (in Figs 2 and 3) depends on the activity levelof the members that is why the least active membersexhibit no persistence while the most active members ex-hibit strong persistence We argue that if only few mes-sages appear in the whole period of data acquisition long-term persistence cannot be reflected In these cases it isquite possible that much longer records and higher ag-gregation level such as months or years would be neededto reveal the persistence But doing so there would beother members with even less messages which then againwould probably appear with seemingly uncorrelated mes-sage signals Thus we propose that the exponents of thelargest activity reflect more accurately the scaling behav-ior of human communication activity In Section 331we propose statistical simulations to generate data usingpeaks over threshold (POT) and find that it supports thisperception

At this point we need to mention that long-term cor-relations can be related to broad inter-event time dis-tributions ie the times between successive messages ofindividual members Such distributions have been inves-tigated see eg [1345] but there is no consensus on thefunctional form We study the inter-event time distribu-tions in a different publication [14] where we demonstratethe connection with the long-term correlations found here

101

102

Δt [days]

10minus1

100

101

FD

FA

2(Δt

)

100

101

102

103

104

total number of messages ML

04

06

08

1

12

HD

FA

2 (

10 lt

= s

lt=

63)

(a) QX (b) QX

~ (Δt)075

~ Δt12

Fig 4 (Color online) Temporal correlations in the dailyamount of messages on directed links in QX (a) DFA2 fluctu-ation functions versus the time scale Δt averaged conditionalto the final number of messages of each link The differentcurves correspond to different activity levels ML = 1minus2 3ndash7 8ndash20 21ndash54 55ndash148 149ndash403 404ndash1096 1097ndash2980 (frombottom to top) The dotted lines correspond to the exponentsH = 075 (top) and H = 12 (bottom) (b) The DFA2 fluctua-tion exponent HLQX obtained from (a) is plotted as a functionof the activity level ML The exponents were obtained in therange of scales 10 le Δt le 63 days The activity along directedlinks comprise similar long-term correlations as the total ac-tivity of individual members to all of their acquaintances

Along directed links

In Figure 4 we study for QX the long-term correlationsin activity not on the sender or receiver (node) level buton the level of messages along directed links This meansthat we track when a message is sent directed between twomembers but separately for any pair of members such asararrb brarra ararrd Accordingly we determine the activ-ity records μab(t) μba(t) μad(t) etc expressing how manymessages have been sent each dayweek t between anypair of members Analogous there is also an activity levelfor the links M ab

L (we disregard those pairings with-out activity) Then we perform the analogous analysis forlong-term correlations by applying DFA2 and averagingamong pairings with similar overall activity (the distribu-tions of activity are discussed in Sect 341) The fluctua-tion functions in Figure 4a have asymptotic slopes close to12 for those links with few total number of messages Incontrary those links with many total number of messagesexhibit long-term correlations with exponents up to 074The fluctuation exponents as a function of the activitylevel ML are plotted in Figure 4b Apart from the factthat by definition the number of messages on the most ac-tive links is lower (or equal) than the number of messagesof the most active members the curve looks very similarto the one in Figure 2a in particular the maximum expo-nents are quite similar (HQX asymp 075 and HLQX asymp 074)This indicates that the persistence in the communicationmay be dominated by the communication activity withthe most liked partners

In [46] a different concept of persistence links has beeninvestigated The period of data acquisition is partitionedinto time slices in each of which a network is built Thenthe persistence is defined as the normalized number oftime slices in which a certain link appears However theapproach of our work is not compatible with the one in [46]and we cannot directly compare the results

D Rybski et al Communication activity in social networks growth and correlations 151

32 Growth process

321 Growth in the number of messages

As suggested in [12] we also analyze the growth prop-erties of the message activity This concept is borrowedfrom econophysics where the growth of companies hasbeen found to exhibit non-trivial scaling laws [1] that inparticular violate the original Gibratrsquos law [9ndash1147] andat the same time represents a generalized Gibratrsquos law(GGL) [12] In the present study each member is con-sidered as a unit and the number of messages sent or re-ceived since the beginning of data acquisition representsits size We analyze the growth in the number of mes-sages in analogy to other systems such as the growth ofcompanies [148] or the growth of cities [749] The anal-ogy is supported by some aspects (i) the members of acommunity represent a population similar to the popula-tion of a country (ii) the number of members fluctuatesand typically grows analogous to the number of cities of acountry (iii) the activity or number of links of individualsfluctuates and grows similar to the size of cities

The cumulative number mj(t) expresses how manymessages have been sent by a certain member j up to agiven time t (for a better readability we will not writethe index j explicitly m(t)) We consider the evolution ofm(t) between times t0 and t1 within the period of dataacquisition T (t0 lt t1 le T ) as a growth process whereeach member exhibits a specific growth rate rj (r for shortnotation)

r = lnm1

m0 (3)

where m0 equiv m(t0) and m1 equiv m(t1) are the number ofmessages sent until t0 and t1 respectively by every mem-ber To characterize the dynamics of the activity we con-sider two measures (i) The conditional average growthrate 〈r(m0)〉 quantifies the average growth of the num-ber of messages sent by the members between t0 and t1depending on the initial number of messages m0 In otherwords we consider the average growth rate of only thosemembers that have sent m0 messages until t0 (ii) Theconditional standard deviation of the growth rate for thosemembers that have sent m0 messages until t0

σ(m0) equivradic

〈(r(m0) minus 〈r(m0)〉)2〉 (4)

expresses the statistical spread or fluctuation of growthamong the members depending on m0 Both quanti-ties are relevant in the context of Gibratrsquos law in eco-nomics [9ndash1147] which proposes a proportionate growthprocess entailing the assumption that the average and thestandard deviation of the growth rate of a given economicindicator are constant and independent of the specific in-dicator value That is both 〈r(m0)〉 and σ(m0) are inde-pendent of m0

As shown in [12] for the message data the conditionalaverage growth rate is almost constant and only decreasesslightly

〈r(m0)〉 sim mminusα0 (5)

with an exponent α asymp 005 This means that memberswith many messages in average increase their number ofmessages almost with the same rate as members with fewmessages In contrast the conditional standard deviationclearly decreases with increasing m0

σ(m0) sim mminusβ0 (6)

where t0 = T2 is optimal in terms statistics In this casethe exponents βQX = 022 plusmn 001 and βPOK = 017 plusmn003 for sending messages were found [12] This meansalthough the average growth rate almost does not dependon m0 the conditional standard deviation of the growthof members with many messages is smaller than the one ofmembers with few messages Due to weaker fluctuationsactive members are relatively better predictable in theiractivity of sending messages

It has been shown that the fluctuation exponent H andthe growth fluctuation exponent β are related via [12]

β = 1 minus H (7)

Equation (7) is a scaling law formalizing the relation be-tween growth and long-term correlations in the activ-ity According to equation (7) the original Gibratrsquos law(βG = 0) corresponds to very strong long-term correla-tions with HG = 1 In contrast βrnd = 12 representscompletely random activity (Hrnd = 12) The observedmessage data comprises 12 gt β gt 0 and 12 lt H lt 1Surprisingly the values of β found here are very close tothe β values found for companies in the US economy [1]

In the case of companies also the distribution ofgrowth rates has been studied It was found that the dis-tribution density follows [1]

p(r|m0) =1

sσ(m0)exp

(

minuss|r minus 〈r(m0)〉|σ(m0)

)

(8)

whereas s =radic

2 Next we analyze how the growth rates rare distributed in the case of the message data

First we need to point out that in contrast to thegrowth of companies our entities can never shrink Themembers cannot loose messages the number m(t) eitherincreases or remains the same Accordingly in our caser ge 0 and therefore s = 1 as can be derived for thesingle-sided exponentially decaying distribution

Figure 5 shows p(r|m0) for QX where the values arescaled to collapse according to equation (8) with s = 1In order to have reasonable statistics we define the con-dition m0 in rather wide ranges namely according to thedecimal logarithm For sending (Fig 5a) and receiving(Fig 5b) messages the scaled probability densities col-lapse and are quite similar Nevertheless the growth ratesdo not exactly follow equation (8) with s = 1 While forthe less active members with small growth rates we finda good agreement for more active members and largegrowth rates the obtained curves deviate from the the-oretical one towards a steeper decay

The corresponding results for POK are shown in Fig-ure 6 Again sending and receiving are very similar The

152 The European Physical Journal B

minus2 0 2 4 6(rminusltr(m0)gt)σ(m0)

10minus5

10minus4

10minus3

10minus2

10minus1

100

101

p(r|

m0)

σ(m

0)

1minus910minus99100minus9991000minus9999

minus2 0 2 4 6(rminusltr(m0)gt)σ(m0)

~ eminusr

(a) QX messages send (b) QX messages receive

Fig 5 (Color online) Scaled probability density of growthrates r equation (3) in the number of messages by membersof QX (a) Sending and (b) receiving The times for m0 andm1 have been chosen as t0 = T2 and t1 = T The symbolscorrespond to different initial number of messages m0 The axisare scaled assuming a distribution according to equation (8)with s = 1 which then corresponds to the dotted lines

minus2 0 2 4 6(rminusltr(m0)gt)σ(m0)

10minus5

10minus4

10minus3

10minus2

10minus1

100

101

p(r|

m0)

σ(m

0)

1minus910minus99100minus9991000minus9999

minus2 0 2 4 6(rminusltr(m0)gt)σ(m0)

~ eminusr

(a) POK messages send (b) POK messages receive

Fig 6 (Color online) Scaled probability density of growthrates r equation (3) in the number of messages by membersof POK (a) Sending and (b) receiving Analogous to Figure 5

curves collapse reasonably but in contrast to QX here themeasured p(r|m0) overall deviate from the theoretical onecomprising less steep slopes

We argue that as for single time series distributionand correlation properties are in most cases independentthe same holds for the message data and the growth Thedistribution of growth rates p(r|m0) seems to be indepen-dent from the long-term correlations which are reflectedin σ(m0) with the exponent β

322 Mutual growth in the number of messages

Next we study a variation of growth Instead of consider-ing the absolute number of messages a member sends westudy the difference in the number of messages comparedto any other member the mutual difference mi(t)minusmj(t)Thus the growth rate is defined analogous to equation (3)

rtimes = lnmi

1 minus mj1

mi0 minus mj

0

(9)

where now there is a growth rate for every pair of mem-bers i and j The conditional average growth rate and thecorresponding standard deviation is then taken over allpossible pairs and the condition is the difference at t0

100

101

102

103

104

m0

iminusm0

j

10minus2

10minus1

100

101

100

101

102

103

104

m0

iminusm0

j

10minus2

10minus1

100

101

ltr x(

m0i minus

m0j )gt

σ(m

0i minusm

0j )

(a) QX messages send (b) QX messages send shuf

~ (m0

iminusm0

j)

minus03

~ (m0

iminusm0

j)

minus12

Fig 7 (Color online) Average mutual growth rate and stan-dard deviation versus foregoing difference in the number ofmessages for sending in QX The average (open squares) andstandard deviation (filled circles) of the mutual growth ratertimes equation (9) are plotted conditional to the initial differ-ence mi

0 minusmj0 whereas t0 = T2 and t1 = T (a) Original data

and (b) shuffled data The dotted line in (a) corresponds tothe exponent βtimes = 03 and in (b) to βtimes = 12

mi0 minus mj

0 = mi(t0) minus mj(t0) providing the quantities〈rtimes(mi

0 minus mj0)〉 and σ(mi

0 minus mj0) We disregard combina-

tions of i and j where mi0 minus mj

0 = 0 or mi1minusmj

1

mi0minusmj

0le 0

The results for sending in QX are shown in Figure 7Apart from a small decrease up to mi

0 minus mj0 asymp 50 the

average growth rate is constant (Fig 7a) The conditionalstandard deviation asymptotically follows a slope βtimes asymp03 with deviations to small exponents for small mi

0 minusmj0

In the case of the shuffled data (Fig 7b) as expected theaverage growth rate is constant while the standard devia-tion decreases steeper than for the original data namelywith βtimesrnd 12 although not with a nice straight lineNevertheless we conclude that the scaling of the standarddeviation in Figure 7a must be due to temporal correla-tions between the members The growth of the differencebetween their number of messages comprises similar scal-ing as the individual growth

We conjecture that σ(mi0 minus mj

0) reflects long-term cross-correlations in analogy to σ(m0) for auto-correlations However so far we are not able to providefurther evidence for this analogy and the correspondingrelation to β = 1 minus H equation (7) since an appropriatetechnique for the direct quantification of long-term cross-correlations is lacking

33 Modeling

In what follows we propose numerical simulations withthe purpose of testing the methods and empirical pat-terns we found We study three approaches adopted tothe modeling of human activity (a) peaks over thresh-olds (b) preferential attachment [50] and (c) cascadingPoisson process [27]

331 Peaks over threshold (POT) simulations

Our finding that the activity of sending messages ex-hibits long-term persistence asserts the existence of an

D Rybski et al Communication activity in social networks growth and correlations 153

0 5 10 15 20time [w]

0

2

4

6

0 50 100 150 200time

minus2

minus1

0

1

2

3(a)

(b)

(c)

(d)

q

Fig 8 (Color online) Illustration of the peaks over thresh-old simulations (a) An underlying and unknown long-termcorrelated process determines the instantaneous probabilityof sending messages Once this state passes certain thresh-old q (dashed orange line) a messages is sent (green diamonds)(b) Generated instants of messages (c) with windows for ag-gregation such as messages per day (d) Aggregated record ofmessages in windows of size w here w = 10

underlying long-term correlated process It can be under-stood as an unknown individual state driven by variousinternal and external stimuli [274351ndash54] increasing theprobability to send messages Generating such a hypothet-ical long-term correlated internal process (xi) simulatedmessage data can be defined by the instants at which thisinternal process exceeds a threshold q (peaks over thresh-old POT) see [55ndash57] and references therein

More precisely we consider a long-term correlated se-quence (xi) consisting of Nlowast random numbers that is nor-malized to zero average (〈x〉 = 0) and unit standard devi-ation (σx = 1) Choosing a threshold q at each instant ithe probability to send a messages is

psnd =

1 for xi gt q

0 for xi le q (10)

Thus the message events are given by the indices i ofthose random numbers xi exceeding q

Figure 8a illustrates the procedure The random num-bers are plotted as brown circles and the events exceedingthe threshold (orange dashed line) by the green diamondsThe resulting instants are depicted in Figure 8b represent-ing the simulated messages The threshold approximatelypredefines the total number of events and accordingly theaverage inter-event time Using normal-distributed num-bers (xi) the number of eventsmessages is approximatelygiven by the length Nlowast and the inverse cumulative dis-tribution function associated with the standard normaldistribution (probit-function) Additionally the randomnumbers we use are long-term correlated with variable

100

101

102

103

104

Δt [w]

10minus3

10minus2

10minus1

100

101

102

103

104

FD

FA

2(Δt

)

100

101

102

103

104

m0

10minus1

100

ltr(

m0)

gt

100

101

102

103

104

105

106

total number of events M

0

02

04

06

08

1

12

HD

FA

2 (

1000

lt=

s lt

= 1

0000

)q=10q=15q=40q=45

100

101

102

103

104

m0

10minus2

10minus1

100

σ(m

0)

H=09H=08H=07H=06H=05slope β=1minusH

~ (Δt)09

~ (Δt)05

(a) (b)

(c) (d)

Fig 9 (Color online) Results of numerical simulations(a) Mean growth rate conditional to the number of events un-til t0 = Nlowast2 as obtained from 100 000 long-term correlatedrecords of length Nlowast = 131 072 with variable imposed fluctua-tion exponent Himp between 12 and 09 and random thresh-old q between 10 and 60 (b) As before but standard devia-tion conditional to the number of events The solid lines repre-sent power-laws with exponents β expected from the imposedlong-term correlations according to equation (7) (c) Long-termcorrelations in the sequences of aggregated peaks over thresh-old For every threshold q between 10 (violet) and 45 (black)100 normalized records of length Nlowast = 4194 304 have beencreated with Himp = 09 The events are aggregated in win-dows of size w = 100 The panel shows the averaged DFA2fluctuation functions (d) Fluctuation exponents on the scales1 000 le s le 10 000 as a function of the total number of events

fluctuation exponent We impose these auto-correlationsusing Fourier filtering method [2358] Next we show thatthis process reproduces the scaling in the growth ieGGL as well as the variable long-term correlations in theactivity of the members (eg Figs 2 and 3)

For testing this process we create 100 000 independentlong-term correlated records (xi) of length Nlowast = 131 072impose the fluctuation exponent Himp and choose for eachone a random threshold q between 1 and 6 each represent-ing a sender Extracting the peaks over threshold we ob-tain the events and determine for each recordmember thegrowth in the number of eventsmessages between Nlowast2and Nlowast This is for each recordmember we count thenumbers of eventsmessages m0 until t0 = ti=Nlowast2 as wellas m1 until t1 = ti=Nlowast and calculate the growth rate ac-cording to equation (3) We then calculate the conditionalaverage 〈r(m0)〉 and the conditional standard deviationσ(m0) where the values of m0 are binned logarithmicallyThe quantities are plotted in Figures 9a and 9b while inpanel (b) we include slopes expected from β = 1 minus H equation (7) We find that the numerical results reason-ably agree with the prediction (solid lines) Except forsmall m0 these results are consistent with those found inthe original message data

154 The European Physical Journal B

The fluctuation functions can be studied in the sameway As described in Section 31 we find long-term corre-lations in the sequences of messages per day or per weekOn the basis of the above explained simulated messageswe analyze them in an analogous way For each thresh-old q = 10 15 40 45 we create 100 long-term cor-related records of length Nlowast = 4 194 304 with imposedfluctuation exponent Himp = 09 extract the simulatedmessage events and aggregate them in non-overlappingwindows of size w = 100 This is tiling Nlowast in segmentsof size w and counting the number of events occurring ineach segment (Figs 8c and 8d) The obtained aggregatedrecords represent the analogous of messages per day or perweek and are analyzed with DFA averaging the fluctua-tion functions among those configurations with the samethreshold and thus similar number of total events Thecorresponding results are shown in Figure 9c and 9d Weobtain very similar results as in the original data We findvanishing correlations for the sequences with few events(large q) and pronounced long-term correlations for thecases of many events (small q) while the maximum fluc-tuation exponent corresponds to the chosen Himp Thiscan be understood by the fact that for q close to zerothe sequence of number of events per window convergesto the aggregated sequence of 0 or 1 (for x le 0 or x gt 0)reflecting the same long-term correlation properties as theoriginal record [59] For a large threshold q too few eventsoccur to measure the correct long-term correlations egthe true scaling only turns out on larger unaccessible timescales requiring larger w and longer records

Although the simulations do not reveal the originof the long-term correlated patchy behavior they sup-port equation (7) and the concept of an underlying long-term correlated process Consistently an uncorrelatedcompletely random underlying process recovers Poissonstatistics and therefore βrnd = 12 for the growth fluctu-ations as well as uncorrelated message activity (Hrnd =12) For 12 lt H lt 1 it has been shown [5557] that theinter-event times follow a stretched exponential distribu-tion see also [131460]

332 Preferential attachment

Next we compare our findings with the growth proper-ties of a network model We investigate the Barabasi-Albert (BA) model which is based on preferential at-tachment and has been introduced to generate a kind ofscale-free networks [5061] with power-law degree distri-bution p(k) [6263] Essentially it consists of subsequentlyadding nodes to the network by linking them to existingnodes which are chosen randomly with a probability pro-portional to their degree

We obtain the undirected network and study the de-gree growth properties by calculating the conditional av-erage growth rate 〈rBA(k0)〉 and the conditional standarddeviation σBA(k0) obtained from the scale-free BA modelThe times t0 and t1 are defined by the number of nodesattached to the network

Figure 10 shows the results where an average degree〈k〉 = 20 50 000 nodes in t0 and 100 000 nodes in t1 were

101

102

103

degree k0

10minus2

10minus1

ltr B

A(k

0)gt

σB

A(k

0)

ltrBA(k0)gtσBA(k0)

~ k0

minus12

Fig 10 (Color online) Average degree growth rate and stan-dard deviation versus foregoing degree for the preferential at-tachment network model [50] The average (green open di-amonds) and standard deviation (blue filled circles) of thegrowth rate rBA are plotted conditional to k0 the degree ofthe corresponding nodes at the first stage We choose averagedegree 〈k〉 = 20 50 000 nodes in t0 and 100 000 nodes in t1The error-bars are taken from 10 configurations The dashedline in the bottom corresponds to βBA = 12

chosen We find constant average growth rate that doesnot depend on the initial degree k0 The conditional stan-dard deviation is a function of k0 and exhibits a power-lawdecay with βBA = 12 as expected for such an uncorre-lated growth process [12] Therefore a purely preferentialattachment type of growth is not sufficient to describe thetype of social network dynamics found in Section 321since additional temporal correlations are involved in thedynamics of establishing acquaintances in the community

The value βBA = 12 in equation (7) corresponds toH = 12 indicating complete randomness There is nomemory in the system Since each addition of a new nodeis completely independent from precedent ones there can-not be temporal correlations in the activity of addinglinks In contrast for the out-degree in QX and POK weobtained βkQX = 022plusmn002 and βkPOK = 017plusmn008 [12]which is supported by the (non-linear) correlations be-tween the number of messages and the out-degree as pre-sented in Section 34

Interestingly an extension of the standard BA modelhas been proposed [64] see also [6566] that takes intoaccount different fitnesses of the nodes to acquiring linksWe think that such fitness could be related to growth fluc-tuations thus providing a route to modify the BA modelto include the long-term correlated dynamics found here

333 Cascading Poisson process

In this section we elaborate the model proposed in [27]and examine it with respect to long-term correlations Themodel is based on a cascading Poisson process (CPP)according to which the probability that a member entersan active interval is ρ(t) = Nwpd(t)pw(t) where Nw is theaverage number of active intervals per week pd(t) is the

D Rybski et al Communication activity in social networks growth and correlations 155

100

101

102

103

104

105

Δt [days]

10minus1

100

101

102

FD

FA(Δ

t) [

arb

uni

ts]

DFA1DFA2~ (Δt)

12

100

101

102

Δt [days]

10minus1

100

101

FD

FA

2(Δt

)

M=1097minus2980M=404minus1096M=149minus403M=55minus148M=21minus54M=8minus20M=3minus7M=1minus2~ (Δt)

12

(a) User 2881Malmgren et al PNAS 2008

(b)

Fig 11 (Color online) DFA fluctuation functions of message data created with the model proposed in [27] (a) User 2881 Wevisually extract the model parameters from Figure 3 in [27] and generate message data for approx 800k days The panel showsthe fluctuation functions from DFA1 and DFA2 which asymptotically go as sim (Δt)12 ie no long-term correlations The humpon small scales is due to the model inherent oscillations [23] The dashed vertical line is placed at Δt = 83 days (b) Randomparameterization We randomly choose the model parameters create 20k records of 83 days and average the obtained DFA2fluctuation functions according to the final number of messages M While for those simulated members with few messages wefind F (Δt) sim (Δt)12 the F (Δt) of the most active members exhibit a hump due to oscillations similar to the one in panel (a)

probability of starting an active interval at a particulartime of the day and pw(t) is the probability of startingan active interval at a particular day of the week Once amember enters such an active interval heshe sends a set ofNa +1 messages where Na is drawn from the distributionp(Na) The messages sent in such an active interval aresent randomly ie a homogeneous Poisson process withrate ρa events per hour

First we study the example of User 2881 as analyzedin [27] (please note that in [27] a different data set is stud-ied and the user is neither in OC1 nor in OC2) We ex-tract from [27] the parameters Nw = 73 active intervalsper week ρa = 17 events per hour as well as (visually)the distributions pd(t) pw(t) and p(Na) The original pe-riod of 83 days is not sufficient to apply DFA and werun the model for this set of parameters over 800k daysThen we extract the record of number of messages per dayμ(t) and apply DFA The obtained fluctuation functionsare shown in Figure 11a On small scales below 100 daysa hump in the F (Δt) can be identified which is due tooscillations in μ(t) [23] While asymptotically the influ-ence of oscillations vanishes on scales below the wave-length the oscillations appear as correlations (increasedslope in F (Δt)) and on scales above the wavelength theoscillations appear as anti-correlations (decreased slope inF (Δt)) Asymptotically we find F (Δt) sim (Δt)12 ieHCPP 12 corresponding to a lack of long-term cor-relations Even on scales up to 83 days we rather findHCPP lt 12 (which we expect from the imposed weeklyoscillations)

Next we study 20 000 simulated e-mail senders withrandomly chosen parameters (i) We fill pw(t) with ran-dom numbers and set pw(t) = 0 for t = 0 6 ie Sundayand Saturday (ii) We fill pd(t) with random numbers andset pd(t) = 0 for t = 0 5 and t = 23 ie at night(iii) We set p(Na) starting with a random p(Na = 0) Thenp(Na) decays exponentially up to a random Na below 36pw(t) pw(t) = 0 and p(Na) are normalized (iv) We ran-

domly choose 0 lt Nw lt 40 (v) We randomly choose0 lt ρa lt 30 From [27] Supporting Information 2 (SI2)we estimated the typical maximum values of Na Nw andρa (36 40 and 30 respectively) We run the model for83 days and extract the μ(t) for each simulated e-mailsender Then we apply DFA2 and average the fluctuationfunctions according to the final number of messages M The fluctuation functions for the various activity levels aredepicted in Figure 11b We find that members with smallfinal number of messages exhibit uncorrelated behaviorThe more active the members the more pronounced be-come the oscillations which we already discussed in thecontext of Figure 11a Thus asymptotic HCPP lt 12 forlarge M is due to the weekly cycles [23] In our data os-cillations do not dominate the DFA fluctuation functionsMoreover for OC2 we also find long-term correlations inweekly resolution (Fig 1)

Based on periodic probabilities and Poisson statisticsthe CPP model represents a powerful concept to charac-terize inter-event times For this purpose the average Nw

seems to be sufficient However in order to recover long-term correlations time dependent Nw = Nw(t) seem tobe necessary In fact the number of active intervals perweek Nw fluctuates as can be seen in [27] SI2 (uppermost row of the panels) Thus we suggest to extend themodel by introducing a memory kernel see eg [54] or byusing long-term correlated Nw(t)

34 Other correlations

In this section we want to discuss other types of correla-tions Figure 12 shows for QX the final degree K = k(T )versus the final number of messages M = m(T ) We findthat for both sending and receiving the two quantitiesare correlated according to

K sim Mλ with λ asymp 34 (11)

156 The European Physical Journal B

100

101

102

103

104

105

number of messages M

10minus1

100

101

102

103

104

degr

ee K

send

100

101

102

103

104

105

number of messages M

receive

~ M075

~ M075

(a) QX (b) QX

Fig 12 (Color online) Correlations between the finaldegree K and the final number of messages M for QX (a) Out-degree and sending messages (b) in-degree and receiving mes-sages The dashed lines correspond to a power-law with expo-nent 075 Members sending many messages also tend to havehigh out-degree but not linearly rather following a power-law

100

101

102

103

104

105

sent

100

101

102

103

104

105

rece

ived

messages M

100

101

102

103

104

105

outminusdegree

100

101

102

103

104

105

inminus

degr

ee

degree K

(a) QX (b) QX

Fig 13 (Color online) Correlations between activity and pas-sivity for QX (a) Final number of messages M received andsent (b) final in- and out-degree The dashed lines correspondto a linear relation Those members who send many messagesalso receive many But those members who know many peopleto whom they send messages do not necessarily know as manypeople from whom they receive messages

Similar relations have also been found for other data [67]Since the correlations are positive those members thatsend many messages in average also have more acquain-tances to whom they send but they know less acquain-tances than they would in the case of linear correlationsFor receiving Figure 12b this correlation is very similar

The number of messages sent versus the number ofmessages received (for QX) is displayed in Figure 13aAsymptotically the activity and passivity are linearly re-lated and on average for every message sent there is areceived one or vice versa This of course does not meanthat every message is replied However the less activemembers in average tend to receive more messages thanthey send For example those members who send in av-erage one message receive about three Nevertheless themore active the members are the more the sending andreceiving behavior approaches the linear relation In con-trast for the degree Figure 13b the asymptotic linearrelation does not hold Those members with large out-degree and small in-degree are referred to as spammerssince they send to many different people but only receivefrom few

100

101

102

103

104

105

number of messages M

10minus1

100

101

102

103

104

degr

ee K

send

100

101

102

103

104

105

number of messages M

receive

~ M075

~ M075

(a) POK (b) POK

Fig 14 (Color online) Correlations between the final de-gree K and the final number of messages M for POK (a) Out-degree and sending messages (b) in-degree and receiving mes-sages Analogous to Figure 12

100

101

102

103

104

sent

100

101

102

103

104

rece

ived

messages M

100

101

102

103

104

outminusdegree

100

101

102

103

104

inminus

degr

ee

degree K

(a) POK (b) POK

Fig 15 (Color online) Correlations between activity and pas-sivity for POK (a) Final number of messages M received andsent (b) final in- and out-degree Analogous to Figure 13

For POK we find similar results in Figures 14 and 15The final degree and the final number of messages alsoscale with an exponent close to 075 although for send-ing messages there exist some deviations of the most ac-tive members (Fig 14a) Also the correlations of sendingand receiving are linear the same holds for in- and out-degree However the most active members again deviatewith low receiving part ie both low number of receivedmessages as well as low in-degree Figure 15 Neverthe-less the results for both data sets are mainly consistentand the power-law relation equation (11) is a remarkableregularity

341 Activity and degree distributions

Finally we want to briefly discuss the distributions of ac-tivities and degrees If we assume p(M) sim MminusγM andp(K) sim KminusγK then with equation (11) the exponentsshould be related according to

γK = 1 + (γM minus 1)λ (12)

Figures 16a and 16b displays the probability densitiesp(M) and p(K) for both online communities Althoughthe distributions are rather broad they do not exhibitstraight lines in double logarithmic representation Inpanel (b) we include some guides to the eye with slopesaccording to equation (12) and which roughly follow the

D Rybski et al Communication activity in social networks growth and correlations 157

Table 1 Overview of the obtained exponents QX and POK are the two data-sets For the BA model and the CP processsee [50] and [27] respectively HL is the fluctuation exponent along directed links βk is the growth fluctuation exponent whenthe degree is considered and βtimes is the mutual growth fluctuation exponent based on the growth between pairs

Sending H HL β βk βtimesSect or Ref [12] 31 333 31 [12] [12] 322

QX 075 plusmn 005 asymp074 022 plusmn 001 022 plusmn 002 asymp03

QX shuffled 12 12 12

POK 091 plusmn 004 017 plusmn 003 017 plusmn 008

POK shuffled 12 12

BA model 12

CP process rarr 12

100

101

102

103

104

total number of messages

10minus10

10minus9

10minus8

10minus7

10minus6

10minus5

10minus4

10minus3

10minus2

10minus1

100

p(M

L)

100

101

102

103

10410

minus1010

minus910

minus810

minus710

minus610

minus510

minus410

minus310

minus210

minus110

0

p(M)p(K)

100

101

102

103

104

105

total number of messages

100

101

102

103

104

105

(a) QX send (b) POK send

(c) QX directed link (d) POK directed link

~ ML

minus35

~ Mminus12

~ Mminus3

~ Kminus127

~ Kminus37

~ ML

minus35

~ Kminus073

~ Mminus08

Fig 16 (Color online) Probability densities of activities anddegrees The probabilities are plotted versus the total numberof messages M and the final degree K for (a) sending inQX and (b) sending in POK The panel (c) and (d) exhibitthe probability densities of the total number of messages alongdirected links ML for QX and POK respectively The dottedlines serve as guides to the eye and have the indicated slopes

obtained curves However λ asymp 075 is relatively close to 1so that the differences are minor

The probability densities of activity along direct linksare displayed in Figures 16c and 16d for QX and POKrespectively In both cases the frequency of large activitydecays approximately following a power-law with expo-nent around 35

4 Conclusions

Our work reviews and further supports previous empir-ical findings [12] extending them by some features Theobtained exponents are summarized in Table 1

In addition to [12] we find very similar characteristicsfor the passivity of receiving as for the activity of send-ing messages This is in line with the strong correlations

between individual sending and receiving ie most of themessages are somehow replied sooner or later Further-more already the communication between two individualscomprises long-term persistence

Investigating the probability densities of logarithmicgrowth rates (ie growth of the cumulative number of mes-sages between two time steps of any member) we are ableto collapse the curves by scaling them with conditional av-erage growth rates and conditional standard deviationsWhile less active members follow well the exponentiallydecaying probability density for the more active membersdeviations are found in the case of large growth rates

Moreover we introduce a new growth rate namely themutual growth in the number of messages This is the dif-ference in the number of messages sent between pairs ofmembers at two time steps The conditional standard de-viation of this mutual growth rate also decays as a power-law with increasing initial difference whereas the expo-nent is close to 03 and changes to 12 when the datais shuffled We conjecture that this growth reflects cross-correlations in the activity

Finally we propose simulations to reproduce the long-term correlations and growth properties Basically it con-sists of generating long-term correlated sequences anddefining a threshold All values of such sequences abovethe threshold (POT) represent a message event We showthat then the correlation and growth features being deter-mined by the imposed fluctuation exponent confirm therelation β = 1 minus H [12] Including further features thisapproach could be a starting point for more elaboratedmodeling of human dynamics

We would like to note that ndash except Section 322 aboutmutual growth in the number of messages (and Sect 34) ndashall analysis and results refer to auto-correlations As phe-nomena auto- and cross-correlations can occur indepen-dently However since most of the messages are replied itis very likely that there are also cross-correlations betweenthe members activity which to our knowledge has not yetbeen studied systematically

Thus our work opens perspectives for further researchactivities In particular the origin of the long-term persis-tence in the communication remains an important ques-tion In [14] we demonstrate the relation of β H withinter-event time scaling From a psychologicalsociological

158 The European Physical Journal B

point of view one may argue where the persistence isoriginated Is it purely due to a state of mind solipsis-tic emerging from moods or is it due to social effects iethat the dynamics in the social network induces persis-tent fluctuations One hypothesis could be that alreadythe social network is correlated [68]

We thank C Briscoe JF Eichner LK Gallos and HDRozenfeld for useful discussions This work was supportedby National Science Foundation Grant NSF-SES-0624116 andNSF-EF-0827508 FL acknowledges financial support fromThe Swedish Bank Tercentenary Foundation SH thanks theEuropean EPIWORK project the Israel Science Foundationthe ONR and the DTRA for financial support

References

1 MHR Stanley LAN Amaral SV Buldyrev S HavlinH Leschhorn P Maass MA Salinger HE StanleyNature 379 804 (1996)

2 D Canning LAN Amaral Y Lee M Meyer HEStanley Econ Lett 60 335 (1998)

3 V Plerou LAN Amaral P Gopikrishnan M MeyerHE Stanley Phys Rev E 60 6519 (1999)

4 F Liljeros LAN Amaral HE Stanley arXiv

nlin0310001v1[nlinAO] (2003) httparxivorg

absnlin0310001

5 K Matia LAN Amaral M Luwel HF Moed HEStanley J Am Soc Inf Sci Tec 56 893 (2005)

6 S Picoli RS Mendes Phys Rev E 77 036105 (2008)7 HD Rozenfeld D Rybski JS Andrade Jr M Batty

HE Stanley HA Makse Proc Natl Acad Sci USA 10518702 (2008)

8 F Wang K Yamasaki S Havlin HE Stanley arXiv09114258v1[q-finST] (2009) httparxivorgabs09114258

9 R Gibrat Les inegalites economiques (Libraire du RecueilSierey Paris 1931)

10 J Sutton J Econ Lit 35 40 (1997)11 M Mitzenmacher Internet Math 1 226 (2004)12 D Rybski SV Buldyrev S Havlin F Liljeros HA

Makse Proc Natl Acad Sci USA 106 12640 (2009)13 AL Barabasi Nature 435 207 (2005)14 D Rybski SV Buldyrev S Havlin F Liljeros HA

Makse preprint (2011)15 M Karsai M Kivela RK Pan K Kaski J Kertesz AL

Barabasi J Saramaki Phys Rev E 83 025102 (2011)16 J Brujic RI Hermans KA Walther JM Fernandez

Nature Phys 2 282 (2006)17 LK Gallos D Rybski F Liljeros S Havlin HA Makse

submitted (2011)18 P Holme Europhys Lett 64 427 (2003)19 P Holme F Liljeros CR Edling BJ Kim Phys Rev

E 68 056107 (2003)20 P Holme CR Edling F Liljeros Soc Networks 26 155

(2004)21 CK Peng SV Buldyrev S Havlin M Simons HE

Stanley AL Goldberger Phys Rev E 49 1685 (1994)22 A Bunde S Havlin JW Kantelhardt T Penzel JH

Peter K Voigt Phys Rev Lett 85 3736 (2000)

23 JW Kantelhardt E Koscielny-Bunde HHA RegoS Havlin A Bunde Physica A 295 441 (2001)

24 JW Kantelhardt Encyclopedia of Complexity and SystemScience (Springer 2009) Chap entry 00620 Fractal andMultifractal Time Series

25 S Golder DM Wilkinson BA Huberman Communitiesand Technologies 2007 (Springer London 2007) pp 41ndash66

26 J Leskovec E Horvitz arXiv08030939v1[physics

soc-ph] (2008) httparxivorgabs0803093927 RD Malmgren DB Stouffer AE Motter LAN

Amaral Proc Natl Acad Sci USA 105 18153 (2008)28 A Vazquez Physica A 373 747 (2007)29 Z Eisler J Kertesz Phys Rev E 73 046109 (2006)30 Z Eisler I Bartos J Kertesz Adv Phys 57 89 (2008)31 LEC Rocha F Liljeros P Holme Proc Natl Acad Sci

USA 107 5706 (2010)32 CK Peng J Mietus JM Hausdorff S Havlin HE

Stanley AL Goldberger Phys Rev Lett 70 1343 (1993)33 PC Ivanov A Bunde LAN Amaral S Havlin

J Fritsch-Yelle RM Baevsky HE Stanley ALGoldberger Europhys Lett 48 594 (1999)

34 K Kosmidis A Kalampokis P Argyrakis Physica A 370808 (2006)

35 Y Liu P Gopikrishnan P Cizeau M Meyer CK PengHE Stanley Phys Rev E 60 1390 (1999)

36 RN Mantegna HE Stanley An Introduction toEconophysics Correlations and Complexity in Finance(Cambridge University Press Cambridge 1999)

37 F Lux M Ausloos The Science of Disasters in MarketFluctuations I Scaling Multiscaling and Their PossibleOrigins (Springer-Verlag Berlin 2002) Chap 13 pp 373ndash409

38 WE Leland MS Taqqu W Willinger DV WilsonIEEEACM Trans Networking 2 1 (1994)

39 M Kampf S Tismer JW Kantelhardt L Muchnik sub-mitted (2011)

40 S Tadaki M Kikuchi A Nakayama K NishinariA Shibata Y Sugiyama S Yukawa J Phys Soc Jpn75 034002 (2006)

41 Z Xiao-Yan L Zong-Hua T Ming Chinese Phys Lett24 2142 (2007)

42 K Linkenkaer-Hansen VV Nikouline JM Palva RJIlmoniemi J Neurosci 21 1370 (2001)

43 P Allegrini D Menicucci R Bedini L FronzoniA Gemignani P Grigolini BJ West P Paradisi PhysRev E 80 061914 (2009)

44 PC Ivanov K Hu MF Hilton SA Shea HE StanleyProc Natl Acad Sci USA 104 20702 (2007)

45 RD Malmgren DB Stouffer ASLO CampanharoLAN Amaral Science 325 1696 (2009)

46 CA Hidalgo C Rodriguez-Sickert Physica A 387 3017(2008)

47 A Saichev Y Malevergne D Sornette Theory ofZipf rsquos Law and Beyond ndash Monograph Lecture Notes inEconomics and Mathematical Systems (Springer-VerlagBerlin 2009)

48 LAN Amaral SV Buldyrev S Havlin MA SalingerHE Stanley Phys Rev Lett 80 1385 (1998)

49 HD Rozenfeld D Rybski X Gabaix HA Makse AmEcon Rev 101 2205 (2011)

50 AL Barabasi R Albert Science 286 509 (1999)51 P Hedstrom Dissecting the Social On the Principles

of Analytical Sociology (Cambridge University PressCambridge 2005)

D Rybski et al Communication activity in social networks growth and correlations 159

52 A Kentsis Nature 441 E5 (2006)53 G Palla AL Barabasi T Vicsek Nature 446 664 (2007)54 R Crane D Sornette Proc Natl Acad Sci USA 105

15649 (2008)55 A Bunde JF Eichner JW Kantelhardt S Havlin

Phys Rev Lett 94 048701 (2005)56 EG Altmann H Kantz Phys Rev E 71 056106 (2005)57 JF Eichner JW Kantelhardt A Bunde S Havlin

Phys Rev E 75 011128 (2007)58 HA Makse S Havlin M Schwartz HE Stanley Phys

Rev E 53 5445 (1996)59 Y Xu QDY Ma DT Schmitt P Bernaola-Galvan

PC Ivanov Physica A 390 4057 (2011)60 J Stehle A Barrat G Bianconi Phys Rev E 81 035101

(2010)

61 R Albert AL Barabasi Rev Mod Phys 74 47(2002)

62 H Ebel LI Mielsch S Bornholdt Phys Rev E 66035103 (2002)

63 MEJ Newman S Forrest J Balthrop Phys Rev E 66035101 (2002)

64 G Bianconi AL Barabasi Europhys Lett 54 436 (2001)65 BF de Blasio A Svensson F Liljeros Proc Natl Acad

Sci USA 104 10762 (2007)66 R Cohen S Havlin Complex Networks Structure

Robustness and Function in Growing network models(Cambridge University Press Cambridge 2009)

67 L Muchnik HA Makse S Havlin preprint (2011)68 D Rybski HD Rozenfeld JP Kropp Europhys Lett

90 28002 (2010)

  • Introduction
  • Data
  • Analysis
  • Conclusions
  • References
Page 4: Communication activity in social networks: growth and ...hmakse/maksepapers/2011... · data of the second online community (. com, POK) covers 492 days (February 2001 until June 2002)

150 The European Physical Journal B

100

101

102

103

104

num of total messages sent M

04

06

08

1

12

HD

FA

2 (

10lt

=slt

=70

)

send

realshuffled individual sequences of messages per day

101

102

103

104

num of total messages received M

receive

(a) POK (b) POK

Fig 3 (Color online) Fluctuation exponents of the commu-nication activity (a) sending and (b) receiving messages bymembers of POK (weekly resolution) The exponents are plot-ted as a function of the activity level M for the original data(green circles) and individually shuffled sequences (orange di-amonds) See also [12]

corresponding stock or with the average number of e-mails(see also [31])

Apart from these for human related data long-term persistence has been reported for physiologicalrecords [223233] written language [34] or for recordsgenerated by collective behavior such as finance and econ-omy [35ndash37] Ethernet traffic [38] Wikipedia access [39]as well as highway traffic [4041] There are also indicationsof long-term correlations in human brain activity [4243]and human motor activity [44]

A question that arises is why the fluctuation expo-nent (in Figs 2 and 3) depends on the activity levelof the members that is why the least active membersexhibit no persistence while the most active members ex-hibit strong persistence We argue that if only few mes-sages appear in the whole period of data acquisition long-term persistence cannot be reflected In these cases it isquite possible that much longer records and higher ag-gregation level such as months or years would be neededto reveal the persistence But doing so there would beother members with even less messages which then againwould probably appear with seemingly uncorrelated mes-sage signals Thus we propose that the exponents of thelargest activity reflect more accurately the scaling behav-ior of human communication activity In Section 331we propose statistical simulations to generate data usingpeaks over threshold (POT) and find that it supports thisperception

At this point we need to mention that long-term cor-relations can be related to broad inter-event time dis-tributions ie the times between successive messages ofindividual members Such distributions have been inves-tigated see eg [1345] but there is no consensus on thefunctional form We study the inter-event time distribu-tions in a different publication [14] where we demonstratethe connection with the long-term correlations found here

101

102

Δt [days]

10minus1

100

101

FD

FA

2(Δt

)

100

101

102

103

104

total number of messages ML

04

06

08

1

12

HD

FA

2 (

10 lt

= s

lt=

63)

(a) QX (b) QX

~ (Δt)075

~ Δt12

Fig 4 (Color online) Temporal correlations in the dailyamount of messages on directed links in QX (a) DFA2 fluctu-ation functions versus the time scale Δt averaged conditionalto the final number of messages of each link The differentcurves correspond to different activity levels ML = 1minus2 3ndash7 8ndash20 21ndash54 55ndash148 149ndash403 404ndash1096 1097ndash2980 (frombottom to top) The dotted lines correspond to the exponentsH = 075 (top) and H = 12 (bottom) (b) The DFA2 fluctua-tion exponent HLQX obtained from (a) is plotted as a functionof the activity level ML The exponents were obtained in therange of scales 10 le Δt le 63 days The activity along directedlinks comprise similar long-term correlations as the total ac-tivity of individual members to all of their acquaintances

Along directed links

In Figure 4 we study for QX the long-term correlationsin activity not on the sender or receiver (node) level buton the level of messages along directed links This meansthat we track when a message is sent directed between twomembers but separately for any pair of members such asararrb brarra ararrd Accordingly we determine the activ-ity records μab(t) μba(t) μad(t) etc expressing how manymessages have been sent each dayweek t between anypair of members Analogous there is also an activity levelfor the links M ab

L (we disregard those pairings with-out activity) Then we perform the analogous analysis forlong-term correlations by applying DFA2 and averagingamong pairings with similar overall activity (the distribu-tions of activity are discussed in Sect 341) The fluctua-tion functions in Figure 4a have asymptotic slopes close to12 for those links with few total number of messages Incontrary those links with many total number of messagesexhibit long-term correlations with exponents up to 074The fluctuation exponents as a function of the activitylevel ML are plotted in Figure 4b Apart from the factthat by definition the number of messages on the most ac-tive links is lower (or equal) than the number of messagesof the most active members the curve looks very similarto the one in Figure 2a in particular the maximum expo-nents are quite similar (HQX asymp 075 and HLQX asymp 074)This indicates that the persistence in the communicationmay be dominated by the communication activity withthe most liked partners

In [46] a different concept of persistence links has beeninvestigated The period of data acquisition is partitionedinto time slices in each of which a network is built Thenthe persistence is defined as the normalized number oftime slices in which a certain link appears However theapproach of our work is not compatible with the one in [46]and we cannot directly compare the results

D Rybski et al Communication activity in social networks growth and correlations 151

32 Growth process

321 Growth in the number of messages

As suggested in [12] we also analyze the growth prop-erties of the message activity This concept is borrowedfrom econophysics where the growth of companies hasbeen found to exhibit non-trivial scaling laws [1] that inparticular violate the original Gibratrsquos law [9ndash1147] andat the same time represents a generalized Gibratrsquos law(GGL) [12] In the present study each member is con-sidered as a unit and the number of messages sent or re-ceived since the beginning of data acquisition representsits size We analyze the growth in the number of mes-sages in analogy to other systems such as the growth ofcompanies [148] or the growth of cities [749] The anal-ogy is supported by some aspects (i) the members of acommunity represent a population similar to the popula-tion of a country (ii) the number of members fluctuatesand typically grows analogous to the number of cities of acountry (iii) the activity or number of links of individualsfluctuates and grows similar to the size of cities

The cumulative number mj(t) expresses how manymessages have been sent by a certain member j up to agiven time t (for a better readability we will not writethe index j explicitly m(t)) We consider the evolution ofm(t) between times t0 and t1 within the period of dataacquisition T (t0 lt t1 le T ) as a growth process whereeach member exhibits a specific growth rate rj (r for shortnotation)

r = lnm1

m0 (3)

where m0 equiv m(t0) and m1 equiv m(t1) are the number ofmessages sent until t0 and t1 respectively by every mem-ber To characterize the dynamics of the activity we con-sider two measures (i) The conditional average growthrate 〈r(m0)〉 quantifies the average growth of the num-ber of messages sent by the members between t0 and t1depending on the initial number of messages m0 In otherwords we consider the average growth rate of only thosemembers that have sent m0 messages until t0 (ii) Theconditional standard deviation of the growth rate for thosemembers that have sent m0 messages until t0

σ(m0) equivradic

〈(r(m0) minus 〈r(m0)〉)2〉 (4)

expresses the statistical spread or fluctuation of growthamong the members depending on m0 Both quanti-ties are relevant in the context of Gibratrsquos law in eco-nomics [9ndash1147] which proposes a proportionate growthprocess entailing the assumption that the average and thestandard deviation of the growth rate of a given economicindicator are constant and independent of the specific in-dicator value That is both 〈r(m0)〉 and σ(m0) are inde-pendent of m0

As shown in [12] for the message data the conditionalaverage growth rate is almost constant and only decreasesslightly

〈r(m0)〉 sim mminusα0 (5)

with an exponent α asymp 005 This means that memberswith many messages in average increase their number ofmessages almost with the same rate as members with fewmessages In contrast the conditional standard deviationclearly decreases with increasing m0

σ(m0) sim mminusβ0 (6)

where t0 = T2 is optimal in terms statistics In this casethe exponents βQX = 022 plusmn 001 and βPOK = 017 plusmn003 for sending messages were found [12] This meansalthough the average growth rate almost does not dependon m0 the conditional standard deviation of the growthof members with many messages is smaller than the one ofmembers with few messages Due to weaker fluctuationsactive members are relatively better predictable in theiractivity of sending messages

It has been shown that the fluctuation exponent H andthe growth fluctuation exponent β are related via [12]

β = 1 minus H (7)

Equation (7) is a scaling law formalizing the relation be-tween growth and long-term correlations in the activ-ity According to equation (7) the original Gibratrsquos law(βG = 0) corresponds to very strong long-term correla-tions with HG = 1 In contrast βrnd = 12 representscompletely random activity (Hrnd = 12) The observedmessage data comprises 12 gt β gt 0 and 12 lt H lt 1Surprisingly the values of β found here are very close tothe β values found for companies in the US economy [1]

In the case of companies also the distribution ofgrowth rates has been studied It was found that the dis-tribution density follows [1]

p(r|m0) =1

sσ(m0)exp

(

minuss|r minus 〈r(m0)〉|σ(m0)

)

(8)

whereas s =radic

2 Next we analyze how the growth rates rare distributed in the case of the message data

First we need to point out that in contrast to thegrowth of companies our entities can never shrink Themembers cannot loose messages the number m(t) eitherincreases or remains the same Accordingly in our caser ge 0 and therefore s = 1 as can be derived for thesingle-sided exponentially decaying distribution

Figure 5 shows p(r|m0) for QX where the values arescaled to collapse according to equation (8) with s = 1In order to have reasonable statistics we define the con-dition m0 in rather wide ranges namely according to thedecimal logarithm For sending (Fig 5a) and receiving(Fig 5b) messages the scaled probability densities col-lapse and are quite similar Nevertheless the growth ratesdo not exactly follow equation (8) with s = 1 While forthe less active members with small growth rates we finda good agreement for more active members and largegrowth rates the obtained curves deviate from the the-oretical one towards a steeper decay

The corresponding results for POK are shown in Fig-ure 6 Again sending and receiving are very similar The

152 The European Physical Journal B

minus2 0 2 4 6(rminusltr(m0)gt)σ(m0)

10minus5

10minus4

10minus3

10minus2

10minus1

100

101

p(r|

m0)

σ(m

0)

1minus910minus99100minus9991000minus9999

minus2 0 2 4 6(rminusltr(m0)gt)σ(m0)

~ eminusr

(a) QX messages send (b) QX messages receive

Fig 5 (Color online) Scaled probability density of growthrates r equation (3) in the number of messages by membersof QX (a) Sending and (b) receiving The times for m0 andm1 have been chosen as t0 = T2 and t1 = T The symbolscorrespond to different initial number of messages m0 The axisare scaled assuming a distribution according to equation (8)with s = 1 which then corresponds to the dotted lines

minus2 0 2 4 6(rminusltr(m0)gt)σ(m0)

10minus5

10minus4

10minus3

10minus2

10minus1

100

101

p(r|

m0)

σ(m

0)

1minus910minus99100minus9991000minus9999

minus2 0 2 4 6(rminusltr(m0)gt)σ(m0)

~ eminusr

(a) POK messages send (b) POK messages receive

Fig 6 (Color online) Scaled probability density of growthrates r equation (3) in the number of messages by membersof POK (a) Sending and (b) receiving Analogous to Figure 5

curves collapse reasonably but in contrast to QX here themeasured p(r|m0) overall deviate from the theoretical onecomprising less steep slopes

We argue that as for single time series distributionand correlation properties are in most cases independentthe same holds for the message data and the growth Thedistribution of growth rates p(r|m0) seems to be indepen-dent from the long-term correlations which are reflectedin σ(m0) with the exponent β

322 Mutual growth in the number of messages

Next we study a variation of growth Instead of consider-ing the absolute number of messages a member sends westudy the difference in the number of messages comparedto any other member the mutual difference mi(t)minusmj(t)Thus the growth rate is defined analogous to equation (3)

rtimes = lnmi

1 minus mj1

mi0 minus mj

0

(9)

where now there is a growth rate for every pair of mem-bers i and j The conditional average growth rate and thecorresponding standard deviation is then taken over allpossible pairs and the condition is the difference at t0

100

101

102

103

104

m0

iminusm0

j

10minus2

10minus1

100

101

100

101

102

103

104

m0

iminusm0

j

10minus2

10minus1

100

101

ltr x(

m0i minus

m0j )gt

σ(m

0i minusm

0j )

(a) QX messages send (b) QX messages send shuf

~ (m0

iminusm0

j)

minus03

~ (m0

iminusm0

j)

minus12

Fig 7 (Color online) Average mutual growth rate and stan-dard deviation versus foregoing difference in the number ofmessages for sending in QX The average (open squares) andstandard deviation (filled circles) of the mutual growth ratertimes equation (9) are plotted conditional to the initial differ-ence mi

0 minusmj0 whereas t0 = T2 and t1 = T (a) Original data

and (b) shuffled data The dotted line in (a) corresponds tothe exponent βtimes = 03 and in (b) to βtimes = 12

mi0 minus mj

0 = mi(t0) minus mj(t0) providing the quantities〈rtimes(mi

0 minus mj0)〉 and σ(mi

0 minus mj0) We disregard combina-

tions of i and j where mi0 minus mj

0 = 0 or mi1minusmj

1

mi0minusmj

0le 0

The results for sending in QX are shown in Figure 7Apart from a small decrease up to mi

0 minus mj0 asymp 50 the

average growth rate is constant (Fig 7a) The conditionalstandard deviation asymptotically follows a slope βtimes asymp03 with deviations to small exponents for small mi

0 minusmj0

In the case of the shuffled data (Fig 7b) as expected theaverage growth rate is constant while the standard devia-tion decreases steeper than for the original data namelywith βtimesrnd 12 although not with a nice straight lineNevertheless we conclude that the scaling of the standarddeviation in Figure 7a must be due to temporal correla-tions between the members The growth of the differencebetween their number of messages comprises similar scal-ing as the individual growth

We conjecture that σ(mi0 minus mj

0) reflects long-term cross-correlations in analogy to σ(m0) for auto-correlations However so far we are not able to providefurther evidence for this analogy and the correspondingrelation to β = 1 minus H equation (7) since an appropriatetechnique for the direct quantification of long-term cross-correlations is lacking

33 Modeling

In what follows we propose numerical simulations withthe purpose of testing the methods and empirical pat-terns we found We study three approaches adopted tothe modeling of human activity (a) peaks over thresh-olds (b) preferential attachment [50] and (c) cascadingPoisson process [27]

331 Peaks over threshold (POT) simulations

Our finding that the activity of sending messages ex-hibits long-term persistence asserts the existence of an

D Rybski et al Communication activity in social networks growth and correlations 153

0 5 10 15 20time [w]

0

2

4

6

0 50 100 150 200time

minus2

minus1

0

1

2

3(a)

(b)

(c)

(d)

q

Fig 8 (Color online) Illustration of the peaks over thresh-old simulations (a) An underlying and unknown long-termcorrelated process determines the instantaneous probabilityof sending messages Once this state passes certain thresh-old q (dashed orange line) a messages is sent (green diamonds)(b) Generated instants of messages (c) with windows for ag-gregation such as messages per day (d) Aggregated record ofmessages in windows of size w here w = 10

underlying long-term correlated process It can be under-stood as an unknown individual state driven by variousinternal and external stimuli [274351ndash54] increasing theprobability to send messages Generating such a hypothet-ical long-term correlated internal process (xi) simulatedmessage data can be defined by the instants at which thisinternal process exceeds a threshold q (peaks over thresh-old POT) see [55ndash57] and references therein

More precisely we consider a long-term correlated se-quence (xi) consisting of Nlowast random numbers that is nor-malized to zero average (〈x〉 = 0) and unit standard devi-ation (σx = 1) Choosing a threshold q at each instant ithe probability to send a messages is

psnd =

1 for xi gt q

0 for xi le q (10)

Thus the message events are given by the indices i ofthose random numbers xi exceeding q

Figure 8a illustrates the procedure The random num-bers are plotted as brown circles and the events exceedingthe threshold (orange dashed line) by the green diamondsThe resulting instants are depicted in Figure 8b represent-ing the simulated messages The threshold approximatelypredefines the total number of events and accordingly theaverage inter-event time Using normal-distributed num-bers (xi) the number of eventsmessages is approximatelygiven by the length Nlowast and the inverse cumulative dis-tribution function associated with the standard normaldistribution (probit-function) Additionally the randomnumbers we use are long-term correlated with variable

100

101

102

103

104

Δt [w]

10minus3

10minus2

10minus1

100

101

102

103

104

FD

FA

2(Δt

)

100

101

102

103

104

m0

10minus1

100

ltr(

m0)

gt

100

101

102

103

104

105

106

total number of events M

0

02

04

06

08

1

12

HD

FA

2 (

1000

lt=

s lt

= 1

0000

)q=10q=15q=40q=45

100

101

102

103

104

m0

10minus2

10minus1

100

σ(m

0)

H=09H=08H=07H=06H=05slope β=1minusH

~ (Δt)09

~ (Δt)05

(a) (b)

(c) (d)

Fig 9 (Color online) Results of numerical simulations(a) Mean growth rate conditional to the number of events un-til t0 = Nlowast2 as obtained from 100 000 long-term correlatedrecords of length Nlowast = 131 072 with variable imposed fluctua-tion exponent Himp between 12 and 09 and random thresh-old q between 10 and 60 (b) As before but standard devia-tion conditional to the number of events The solid lines repre-sent power-laws with exponents β expected from the imposedlong-term correlations according to equation (7) (c) Long-termcorrelations in the sequences of aggregated peaks over thresh-old For every threshold q between 10 (violet) and 45 (black)100 normalized records of length Nlowast = 4194 304 have beencreated with Himp = 09 The events are aggregated in win-dows of size w = 100 The panel shows the averaged DFA2fluctuation functions (d) Fluctuation exponents on the scales1 000 le s le 10 000 as a function of the total number of events

fluctuation exponent We impose these auto-correlationsusing Fourier filtering method [2358] Next we show thatthis process reproduces the scaling in the growth ieGGL as well as the variable long-term correlations in theactivity of the members (eg Figs 2 and 3)

For testing this process we create 100 000 independentlong-term correlated records (xi) of length Nlowast = 131 072impose the fluctuation exponent Himp and choose for eachone a random threshold q between 1 and 6 each represent-ing a sender Extracting the peaks over threshold we ob-tain the events and determine for each recordmember thegrowth in the number of eventsmessages between Nlowast2and Nlowast This is for each recordmember we count thenumbers of eventsmessages m0 until t0 = ti=Nlowast2 as wellas m1 until t1 = ti=Nlowast and calculate the growth rate ac-cording to equation (3) We then calculate the conditionalaverage 〈r(m0)〉 and the conditional standard deviationσ(m0) where the values of m0 are binned logarithmicallyThe quantities are plotted in Figures 9a and 9b while inpanel (b) we include slopes expected from β = 1 minus H equation (7) We find that the numerical results reason-ably agree with the prediction (solid lines) Except forsmall m0 these results are consistent with those found inthe original message data

154 The European Physical Journal B

The fluctuation functions can be studied in the sameway As described in Section 31 we find long-term corre-lations in the sequences of messages per day or per weekOn the basis of the above explained simulated messageswe analyze them in an analogous way For each thresh-old q = 10 15 40 45 we create 100 long-term cor-related records of length Nlowast = 4 194 304 with imposedfluctuation exponent Himp = 09 extract the simulatedmessage events and aggregate them in non-overlappingwindows of size w = 100 This is tiling Nlowast in segmentsof size w and counting the number of events occurring ineach segment (Figs 8c and 8d) The obtained aggregatedrecords represent the analogous of messages per day or perweek and are analyzed with DFA averaging the fluctua-tion functions among those configurations with the samethreshold and thus similar number of total events Thecorresponding results are shown in Figure 9c and 9d Weobtain very similar results as in the original data We findvanishing correlations for the sequences with few events(large q) and pronounced long-term correlations for thecases of many events (small q) while the maximum fluc-tuation exponent corresponds to the chosen Himp Thiscan be understood by the fact that for q close to zerothe sequence of number of events per window convergesto the aggregated sequence of 0 or 1 (for x le 0 or x gt 0)reflecting the same long-term correlation properties as theoriginal record [59] For a large threshold q too few eventsoccur to measure the correct long-term correlations egthe true scaling only turns out on larger unaccessible timescales requiring larger w and longer records

Although the simulations do not reveal the originof the long-term correlated patchy behavior they sup-port equation (7) and the concept of an underlying long-term correlated process Consistently an uncorrelatedcompletely random underlying process recovers Poissonstatistics and therefore βrnd = 12 for the growth fluctu-ations as well as uncorrelated message activity (Hrnd =12) For 12 lt H lt 1 it has been shown [5557] that theinter-event times follow a stretched exponential distribu-tion see also [131460]

332 Preferential attachment

Next we compare our findings with the growth proper-ties of a network model We investigate the Barabasi-Albert (BA) model which is based on preferential at-tachment and has been introduced to generate a kind ofscale-free networks [5061] with power-law degree distri-bution p(k) [6263] Essentially it consists of subsequentlyadding nodes to the network by linking them to existingnodes which are chosen randomly with a probability pro-portional to their degree

We obtain the undirected network and study the de-gree growth properties by calculating the conditional av-erage growth rate 〈rBA(k0)〉 and the conditional standarddeviation σBA(k0) obtained from the scale-free BA modelThe times t0 and t1 are defined by the number of nodesattached to the network

Figure 10 shows the results where an average degree〈k〉 = 20 50 000 nodes in t0 and 100 000 nodes in t1 were

101

102

103

degree k0

10minus2

10minus1

ltr B

A(k

0)gt

σB

A(k

0)

ltrBA(k0)gtσBA(k0)

~ k0

minus12

Fig 10 (Color online) Average degree growth rate and stan-dard deviation versus foregoing degree for the preferential at-tachment network model [50] The average (green open di-amonds) and standard deviation (blue filled circles) of thegrowth rate rBA are plotted conditional to k0 the degree ofthe corresponding nodes at the first stage We choose averagedegree 〈k〉 = 20 50 000 nodes in t0 and 100 000 nodes in t1The error-bars are taken from 10 configurations The dashedline in the bottom corresponds to βBA = 12

chosen We find constant average growth rate that doesnot depend on the initial degree k0 The conditional stan-dard deviation is a function of k0 and exhibits a power-lawdecay with βBA = 12 as expected for such an uncorre-lated growth process [12] Therefore a purely preferentialattachment type of growth is not sufficient to describe thetype of social network dynamics found in Section 321since additional temporal correlations are involved in thedynamics of establishing acquaintances in the community

The value βBA = 12 in equation (7) corresponds toH = 12 indicating complete randomness There is nomemory in the system Since each addition of a new nodeis completely independent from precedent ones there can-not be temporal correlations in the activity of addinglinks In contrast for the out-degree in QX and POK weobtained βkQX = 022plusmn002 and βkPOK = 017plusmn008 [12]which is supported by the (non-linear) correlations be-tween the number of messages and the out-degree as pre-sented in Section 34

Interestingly an extension of the standard BA modelhas been proposed [64] see also [6566] that takes intoaccount different fitnesses of the nodes to acquiring linksWe think that such fitness could be related to growth fluc-tuations thus providing a route to modify the BA modelto include the long-term correlated dynamics found here

333 Cascading Poisson process

In this section we elaborate the model proposed in [27]and examine it with respect to long-term correlations Themodel is based on a cascading Poisson process (CPP)according to which the probability that a member entersan active interval is ρ(t) = Nwpd(t)pw(t) where Nw is theaverage number of active intervals per week pd(t) is the

D Rybski et al Communication activity in social networks growth and correlations 155

100

101

102

103

104

105

Δt [days]

10minus1

100

101

102

FD

FA(Δ

t) [

arb

uni

ts]

DFA1DFA2~ (Δt)

12

100

101

102

Δt [days]

10minus1

100

101

FD

FA

2(Δt

)

M=1097minus2980M=404minus1096M=149minus403M=55minus148M=21minus54M=8minus20M=3minus7M=1minus2~ (Δt)

12

(a) User 2881Malmgren et al PNAS 2008

(b)

Fig 11 (Color online) DFA fluctuation functions of message data created with the model proposed in [27] (a) User 2881 Wevisually extract the model parameters from Figure 3 in [27] and generate message data for approx 800k days The panel showsthe fluctuation functions from DFA1 and DFA2 which asymptotically go as sim (Δt)12 ie no long-term correlations The humpon small scales is due to the model inherent oscillations [23] The dashed vertical line is placed at Δt = 83 days (b) Randomparameterization We randomly choose the model parameters create 20k records of 83 days and average the obtained DFA2fluctuation functions according to the final number of messages M While for those simulated members with few messages wefind F (Δt) sim (Δt)12 the F (Δt) of the most active members exhibit a hump due to oscillations similar to the one in panel (a)

probability of starting an active interval at a particulartime of the day and pw(t) is the probability of startingan active interval at a particular day of the week Once amember enters such an active interval heshe sends a set ofNa +1 messages where Na is drawn from the distributionp(Na) The messages sent in such an active interval aresent randomly ie a homogeneous Poisson process withrate ρa events per hour

First we study the example of User 2881 as analyzedin [27] (please note that in [27] a different data set is stud-ied and the user is neither in OC1 nor in OC2) We ex-tract from [27] the parameters Nw = 73 active intervalsper week ρa = 17 events per hour as well as (visually)the distributions pd(t) pw(t) and p(Na) The original pe-riod of 83 days is not sufficient to apply DFA and werun the model for this set of parameters over 800k daysThen we extract the record of number of messages per dayμ(t) and apply DFA The obtained fluctuation functionsare shown in Figure 11a On small scales below 100 daysa hump in the F (Δt) can be identified which is due tooscillations in μ(t) [23] While asymptotically the influ-ence of oscillations vanishes on scales below the wave-length the oscillations appear as correlations (increasedslope in F (Δt)) and on scales above the wavelength theoscillations appear as anti-correlations (decreased slope inF (Δt)) Asymptotically we find F (Δt) sim (Δt)12 ieHCPP 12 corresponding to a lack of long-term cor-relations Even on scales up to 83 days we rather findHCPP lt 12 (which we expect from the imposed weeklyoscillations)

Next we study 20 000 simulated e-mail senders withrandomly chosen parameters (i) We fill pw(t) with ran-dom numbers and set pw(t) = 0 for t = 0 6 ie Sundayand Saturday (ii) We fill pd(t) with random numbers andset pd(t) = 0 for t = 0 5 and t = 23 ie at night(iii) We set p(Na) starting with a random p(Na = 0) Thenp(Na) decays exponentially up to a random Na below 36pw(t) pw(t) = 0 and p(Na) are normalized (iv) We ran-

domly choose 0 lt Nw lt 40 (v) We randomly choose0 lt ρa lt 30 From [27] Supporting Information 2 (SI2)we estimated the typical maximum values of Na Nw andρa (36 40 and 30 respectively) We run the model for83 days and extract the μ(t) for each simulated e-mailsender Then we apply DFA2 and average the fluctuationfunctions according to the final number of messages M The fluctuation functions for the various activity levels aredepicted in Figure 11b We find that members with smallfinal number of messages exhibit uncorrelated behaviorThe more active the members the more pronounced be-come the oscillations which we already discussed in thecontext of Figure 11a Thus asymptotic HCPP lt 12 forlarge M is due to the weekly cycles [23] In our data os-cillations do not dominate the DFA fluctuation functionsMoreover for OC2 we also find long-term correlations inweekly resolution (Fig 1)

Based on periodic probabilities and Poisson statisticsthe CPP model represents a powerful concept to charac-terize inter-event times For this purpose the average Nw

seems to be sufficient However in order to recover long-term correlations time dependent Nw = Nw(t) seem tobe necessary In fact the number of active intervals perweek Nw fluctuates as can be seen in [27] SI2 (uppermost row of the panels) Thus we suggest to extend themodel by introducing a memory kernel see eg [54] or byusing long-term correlated Nw(t)

34 Other correlations

In this section we want to discuss other types of correla-tions Figure 12 shows for QX the final degree K = k(T )versus the final number of messages M = m(T ) We findthat for both sending and receiving the two quantitiesare correlated according to

K sim Mλ with λ asymp 34 (11)

156 The European Physical Journal B

100

101

102

103

104

105

number of messages M

10minus1

100

101

102

103

104

degr

ee K

send

100

101

102

103

104

105

number of messages M

receive

~ M075

~ M075

(a) QX (b) QX

Fig 12 (Color online) Correlations between the finaldegree K and the final number of messages M for QX (a) Out-degree and sending messages (b) in-degree and receiving mes-sages The dashed lines correspond to a power-law with expo-nent 075 Members sending many messages also tend to havehigh out-degree but not linearly rather following a power-law

100

101

102

103

104

105

sent

100

101

102

103

104

105

rece

ived

messages M

100

101

102

103

104

105

outminusdegree

100

101

102

103

104

105

inminus

degr

ee

degree K

(a) QX (b) QX

Fig 13 (Color online) Correlations between activity and pas-sivity for QX (a) Final number of messages M received andsent (b) final in- and out-degree The dashed lines correspondto a linear relation Those members who send many messagesalso receive many But those members who know many peopleto whom they send messages do not necessarily know as manypeople from whom they receive messages

Similar relations have also been found for other data [67]Since the correlations are positive those members thatsend many messages in average also have more acquain-tances to whom they send but they know less acquain-tances than they would in the case of linear correlationsFor receiving Figure 12b this correlation is very similar

The number of messages sent versus the number ofmessages received (for QX) is displayed in Figure 13aAsymptotically the activity and passivity are linearly re-lated and on average for every message sent there is areceived one or vice versa This of course does not meanthat every message is replied However the less activemembers in average tend to receive more messages thanthey send For example those members who send in av-erage one message receive about three Nevertheless themore active the members are the more the sending andreceiving behavior approaches the linear relation In con-trast for the degree Figure 13b the asymptotic linearrelation does not hold Those members with large out-degree and small in-degree are referred to as spammerssince they send to many different people but only receivefrom few

100

101

102

103

104

105

number of messages M

10minus1

100

101

102

103

104

degr

ee K

send

100

101

102

103

104

105

number of messages M

receive

~ M075

~ M075

(a) POK (b) POK

Fig 14 (Color online) Correlations between the final de-gree K and the final number of messages M for POK (a) Out-degree and sending messages (b) in-degree and receiving mes-sages Analogous to Figure 12

100

101

102

103

104

sent

100

101

102

103

104

rece

ived

messages M

100

101

102

103

104

outminusdegree

100

101

102

103

104

inminus

degr

ee

degree K

(a) POK (b) POK

Fig 15 (Color online) Correlations between activity and pas-sivity for POK (a) Final number of messages M received andsent (b) final in- and out-degree Analogous to Figure 13

For POK we find similar results in Figures 14 and 15The final degree and the final number of messages alsoscale with an exponent close to 075 although for send-ing messages there exist some deviations of the most ac-tive members (Fig 14a) Also the correlations of sendingand receiving are linear the same holds for in- and out-degree However the most active members again deviatewith low receiving part ie both low number of receivedmessages as well as low in-degree Figure 15 Neverthe-less the results for both data sets are mainly consistentand the power-law relation equation (11) is a remarkableregularity

341 Activity and degree distributions

Finally we want to briefly discuss the distributions of ac-tivities and degrees If we assume p(M) sim MminusγM andp(K) sim KminusγK then with equation (11) the exponentsshould be related according to

γK = 1 + (γM minus 1)λ (12)

Figures 16a and 16b displays the probability densitiesp(M) and p(K) for both online communities Althoughthe distributions are rather broad they do not exhibitstraight lines in double logarithmic representation Inpanel (b) we include some guides to the eye with slopesaccording to equation (12) and which roughly follow the

D Rybski et al Communication activity in social networks growth and correlations 157

Table 1 Overview of the obtained exponents QX and POK are the two data-sets For the BA model and the CP processsee [50] and [27] respectively HL is the fluctuation exponent along directed links βk is the growth fluctuation exponent whenthe degree is considered and βtimes is the mutual growth fluctuation exponent based on the growth between pairs

Sending H HL β βk βtimesSect or Ref [12] 31 333 31 [12] [12] 322

QX 075 plusmn 005 asymp074 022 plusmn 001 022 plusmn 002 asymp03

QX shuffled 12 12 12

POK 091 plusmn 004 017 plusmn 003 017 plusmn 008

POK shuffled 12 12

BA model 12

CP process rarr 12

100

101

102

103

104

total number of messages

10minus10

10minus9

10minus8

10minus7

10minus6

10minus5

10minus4

10minus3

10minus2

10minus1

100

p(M

L)

100

101

102

103

10410

minus1010

minus910

minus810

minus710

minus610

minus510

minus410

minus310

minus210

minus110

0

p(M)p(K)

100

101

102

103

104

105

total number of messages

100

101

102

103

104

105

(a) QX send (b) POK send

(c) QX directed link (d) POK directed link

~ ML

minus35

~ Mminus12

~ Mminus3

~ Kminus127

~ Kminus37

~ ML

minus35

~ Kminus073

~ Mminus08

Fig 16 (Color online) Probability densities of activities anddegrees The probabilities are plotted versus the total numberof messages M and the final degree K for (a) sending inQX and (b) sending in POK The panel (c) and (d) exhibitthe probability densities of the total number of messages alongdirected links ML for QX and POK respectively The dottedlines serve as guides to the eye and have the indicated slopes

obtained curves However λ asymp 075 is relatively close to 1so that the differences are minor

The probability densities of activity along direct linksare displayed in Figures 16c and 16d for QX and POKrespectively In both cases the frequency of large activitydecays approximately following a power-law with expo-nent around 35

4 Conclusions

Our work reviews and further supports previous empir-ical findings [12] extending them by some features Theobtained exponents are summarized in Table 1

In addition to [12] we find very similar characteristicsfor the passivity of receiving as for the activity of send-ing messages This is in line with the strong correlations

between individual sending and receiving ie most of themessages are somehow replied sooner or later Further-more already the communication between two individualscomprises long-term persistence

Investigating the probability densities of logarithmicgrowth rates (ie growth of the cumulative number of mes-sages between two time steps of any member) we are ableto collapse the curves by scaling them with conditional av-erage growth rates and conditional standard deviationsWhile less active members follow well the exponentiallydecaying probability density for the more active membersdeviations are found in the case of large growth rates

Moreover we introduce a new growth rate namely themutual growth in the number of messages This is the dif-ference in the number of messages sent between pairs ofmembers at two time steps The conditional standard de-viation of this mutual growth rate also decays as a power-law with increasing initial difference whereas the expo-nent is close to 03 and changes to 12 when the datais shuffled We conjecture that this growth reflects cross-correlations in the activity

Finally we propose simulations to reproduce the long-term correlations and growth properties Basically it con-sists of generating long-term correlated sequences anddefining a threshold All values of such sequences abovethe threshold (POT) represent a message event We showthat then the correlation and growth features being deter-mined by the imposed fluctuation exponent confirm therelation β = 1 minus H [12] Including further features thisapproach could be a starting point for more elaboratedmodeling of human dynamics

We would like to note that ndash except Section 322 aboutmutual growth in the number of messages (and Sect 34) ndashall analysis and results refer to auto-correlations As phe-nomena auto- and cross-correlations can occur indepen-dently However since most of the messages are replied itis very likely that there are also cross-correlations betweenthe members activity which to our knowledge has not yetbeen studied systematically

Thus our work opens perspectives for further researchactivities In particular the origin of the long-term persis-tence in the communication remains an important ques-tion In [14] we demonstrate the relation of β H withinter-event time scaling From a psychologicalsociological

158 The European Physical Journal B

point of view one may argue where the persistence isoriginated Is it purely due to a state of mind solipsis-tic emerging from moods or is it due to social effects iethat the dynamics in the social network induces persis-tent fluctuations One hypothesis could be that alreadythe social network is correlated [68]

We thank C Briscoe JF Eichner LK Gallos and HDRozenfeld for useful discussions This work was supportedby National Science Foundation Grant NSF-SES-0624116 andNSF-EF-0827508 FL acknowledges financial support fromThe Swedish Bank Tercentenary Foundation SH thanks theEuropean EPIWORK project the Israel Science Foundationthe ONR and the DTRA for financial support

References

1 MHR Stanley LAN Amaral SV Buldyrev S HavlinH Leschhorn P Maass MA Salinger HE StanleyNature 379 804 (1996)

2 D Canning LAN Amaral Y Lee M Meyer HEStanley Econ Lett 60 335 (1998)

3 V Plerou LAN Amaral P Gopikrishnan M MeyerHE Stanley Phys Rev E 60 6519 (1999)

4 F Liljeros LAN Amaral HE Stanley arXiv

nlin0310001v1[nlinAO] (2003) httparxivorg

absnlin0310001

5 K Matia LAN Amaral M Luwel HF Moed HEStanley J Am Soc Inf Sci Tec 56 893 (2005)

6 S Picoli RS Mendes Phys Rev E 77 036105 (2008)7 HD Rozenfeld D Rybski JS Andrade Jr M Batty

HE Stanley HA Makse Proc Natl Acad Sci USA 10518702 (2008)

8 F Wang K Yamasaki S Havlin HE Stanley arXiv09114258v1[q-finST] (2009) httparxivorgabs09114258

9 R Gibrat Les inegalites economiques (Libraire du RecueilSierey Paris 1931)

10 J Sutton J Econ Lit 35 40 (1997)11 M Mitzenmacher Internet Math 1 226 (2004)12 D Rybski SV Buldyrev S Havlin F Liljeros HA

Makse Proc Natl Acad Sci USA 106 12640 (2009)13 AL Barabasi Nature 435 207 (2005)14 D Rybski SV Buldyrev S Havlin F Liljeros HA

Makse preprint (2011)15 M Karsai M Kivela RK Pan K Kaski J Kertesz AL

Barabasi J Saramaki Phys Rev E 83 025102 (2011)16 J Brujic RI Hermans KA Walther JM Fernandez

Nature Phys 2 282 (2006)17 LK Gallos D Rybski F Liljeros S Havlin HA Makse

submitted (2011)18 P Holme Europhys Lett 64 427 (2003)19 P Holme F Liljeros CR Edling BJ Kim Phys Rev

E 68 056107 (2003)20 P Holme CR Edling F Liljeros Soc Networks 26 155

(2004)21 CK Peng SV Buldyrev S Havlin M Simons HE

Stanley AL Goldberger Phys Rev E 49 1685 (1994)22 A Bunde S Havlin JW Kantelhardt T Penzel JH

Peter K Voigt Phys Rev Lett 85 3736 (2000)

23 JW Kantelhardt E Koscielny-Bunde HHA RegoS Havlin A Bunde Physica A 295 441 (2001)

24 JW Kantelhardt Encyclopedia of Complexity and SystemScience (Springer 2009) Chap entry 00620 Fractal andMultifractal Time Series

25 S Golder DM Wilkinson BA Huberman Communitiesand Technologies 2007 (Springer London 2007) pp 41ndash66

26 J Leskovec E Horvitz arXiv08030939v1[physics

soc-ph] (2008) httparxivorgabs0803093927 RD Malmgren DB Stouffer AE Motter LAN

Amaral Proc Natl Acad Sci USA 105 18153 (2008)28 A Vazquez Physica A 373 747 (2007)29 Z Eisler J Kertesz Phys Rev E 73 046109 (2006)30 Z Eisler I Bartos J Kertesz Adv Phys 57 89 (2008)31 LEC Rocha F Liljeros P Holme Proc Natl Acad Sci

USA 107 5706 (2010)32 CK Peng J Mietus JM Hausdorff S Havlin HE

Stanley AL Goldberger Phys Rev Lett 70 1343 (1993)33 PC Ivanov A Bunde LAN Amaral S Havlin

J Fritsch-Yelle RM Baevsky HE Stanley ALGoldberger Europhys Lett 48 594 (1999)

34 K Kosmidis A Kalampokis P Argyrakis Physica A 370808 (2006)

35 Y Liu P Gopikrishnan P Cizeau M Meyer CK PengHE Stanley Phys Rev E 60 1390 (1999)

36 RN Mantegna HE Stanley An Introduction toEconophysics Correlations and Complexity in Finance(Cambridge University Press Cambridge 1999)

37 F Lux M Ausloos The Science of Disasters in MarketFluctuations I Scaling Multiscaling and Their PossibleOrigins (Springer-Verlag Berlin 2002) Chap 13 pp 373ndash409

38 WE Leland MS Taqqu W Willinger DV WilsonIEEEACM Trans Networking 2 1 (1994)

39 M Kampf S Tismer JW Kantelhardt L Muchnik sub-mitted (2011)

40 S Tadaki M Kikuchi A Nakayama K NishinariA Shibata Y Sugiyama S Yukawa J Phys Soc Jpn75 034002 (2006)

41 Z Xiao-Yan L Zong-Hua T Ming Chinese Phys Lett24 2142 (2007)

42 K Linkenkaer-Hansen VV Nikouline JM Palva RJIlmoniemi J Neurosci 21 1370 (2001)

43 P Allegrini D Menicucci R Bedini L FronzoniA Gemignani P Grigolini BJ West P Paradisi PhysRev E 80 061914 (2009)

44 PC Ivanov K Hu MF Hilton SA Shea HE StanleyProc Natl Acad Sci USA 104 20702 (2007)

45 RD Malmgren DB Stouffer ASLO CampanharoLAN Amaral Science 325 1696 (2009)

46 CA Hidalgo C Rodriguez-Sickert Physica A 387 3017(2008)

47 A Saichev Y Malevergne D Sornette Theory ofZipf rsquos Law and Beyond ndash Monograph Lecture Notes inEconomics and Mathematical Systems (Springer-VerlagBerlin 2009)

48 LAN Amaral SV Buldyrev S Havlin MA SalingerHE Stanley Phys Rev Lett 80 1385 (1998)

49 HD Rozenfeld D Rybski X Gabaix HA Makse AmEcon Rev 101 2205 (2011)

50 AL Barabasi R Albert Science 286 509 (1999)51 P Hedstrom Dissecting the Social On the Principles

of Analytical Sociology (Cambridge University PressCambridge 2005)

D Rybski et al Communication activity in social networks growth and correlations 159

52 A Kentsis Nature 441 E5 (2006)53 G Palla AL Barabasi T Vicsek Nature 446 664 (2007)54 R Crane D Sornette Proc Natl Acad Sci USA 105

15649 (2008)55 A Bunde JF Eichner JW Kantelhardt S Havlin

Phys Rev Lett 94 048701 (2005)56 EG Altmann H Kantz Phys Rev E 71 056106 (2005)57 JF Eichner JW Kantelhardt A Bunde S Havlin

Phys Rev E 75 011128 (2007)58 HA Makse S Havlin M Schwartz HE Stanley Phys

Rev E 53 5445 (1996)59 Y Xu QDY Ma DT Schmitt P Bernaola-Galvan

PC Ivanov Physica A 390 4057 (2011)60 J Stehle A Barrat G Bianconi Phys Rev E 81 035101

(2010)

61 R Albert AL Barabasi Rev Mod Phys 74 47(2002)

62 H Ebel LI Mielsch S Bornholdt Phys Rev E 66035103 (2002)

63 MEJ Newman S Forrest J Balthrop Phys Rev E 66035101 (2002)

64 G Bianconi AL Barabasi Europhys Lett 54 436 (2001)65 BF de Blasio A Svensson F Liljeros Proc Natl Acad

Sci USA 104 10762 (2007)66 R Cohen S Havlin Complex Networks Structure

Robustness and Function in Growing network models(Cambridge University Press Cambridge 2009)

67 L Muchnik HA Makse S Havlin preprint (2011)68 D Rybski HD Rozenfeld JP Kropp Europhys Lett

90 28002 (2010)

  • Introduction
  • Data
  • Analysis
  • Conclusions
  • References
Page 5: Communication activity in social networks: growth and ...hmakse/maksepapers/2011... · data of the second online community (. com, POK) covers 492 days (February 2001 until June 2002)

D Rybski et al Communication activity in social networks growth and correlations 151

32 Growth process

321 Growth in the number of messages

As suggested in [12] we also analyze the growth prop-erties of the message activity This concept is borrowedfrom econophysics where the growth of companies hasbeen found to exhibit non-trivial scaling laws [1] that inparticular violate the original Gibratrsquos law [9ndash1147] andat the same time represents a generalized Gibratrsquos law(GGL) [12] In the present study each member is con-sidered as a unit and the number of messages sent or re-ceived since the beginning of data acquisition representsits size We analyze the growth in the number of mes-sages in analogy to other systems such as the growth ofcompanies [148] or the growth of cities [749] The anal-ogy is supported by some aspects (i) the members of acommunity represent a population similar to the popula-tion of a country (ii) the number of members fluctuatesand typically grows analogous to the number of cities of acountry (iii) the activity or number of links of individualsfluctuates and grows similar to the size of cities

The cumulative number mj(t) expresses how manymessages have been sent by a certain member j up to agiven time t (for a better readability we will not writethe index j explicitly m(t)) We consider the evolution ofm(t) between times t0 and t1 within the period of dataacquisition T (t0 lt t1 le T ) as a growth process whereeach member exhibits a specific growth rate rj (r for shortnotation)

r = lnm1

m0 (3)

where m0 equiv m(t0) and m1 equiv m(t1) are the number ofmessages sent until t0 and t1 respectively by every mem-ber To characterize the dynamics of the activity we con-sider two measures (i) The conditional average growthrate 〈r(m0)〉 quantifies the average growth of the num-ber of messages sent by the members between t0 and t1depending on the initial number of messages m0 In otherwords we consider the average growth rate of only thosemembers that have sent m0 messages until t0 (ii) Theconditional standard deviation of the growth rate for thosemembers that have sent m0 messages until t0

σ(m0) equivradic

〈(r(m0) minus 〈r(m0)〉)2〉 (4)

expresses the statistical spread or fluctuation of growthamong the members depending on m0 Both quanti-ties are relevant in the context of Gibratrsquos law in eco-nomics [9ndash1147] which proposes a proportionate growthprocess entailing the assumption that the average and thestandard deviation of the growth rate of a given economicindicator are constant and independent of the specific in-dicator value That is both 〈r(m0)〉 and σ(m0) are inde-pendent of m0

As shown in [12] for the message data the conditionalaverage growth rate is almost constant and only decreasesslightly

〈r(m0)〉 sim mminusα0 (5)

with an exponent α asymp 005 This means that memberswith many messages in average increase their number ofmessages almost with the same rate as members with fewmessages In contrast the conditional standard deviationclearly decreases with increasing m0

σ(m0) sim mminusβ0 (6)

where t0 = T2 is optimal in terms statistics In this casethe exponents βQX = 022 plusmn 001 and βPOK = 017 plusmn003 for sending messages were found [12] This meansalthough the average growth rate almost does not dependon m0 the conditional standard deviation of the growthof members with many messages is smaller than the one ofmembers with few messages Due to weaker fluctuationsactive members are relatively better predictable in theiractivity of sending messages

It has been shown that the fluctuation exponent H andthe growth fluctuation exponent β are related via [12]

β = 1 minus H (7)

Equation (7) is a scaling law formalizing the relation be-tween growth and long-term correlations in the activ-ity According to equation (7) the original Gibratrsquos law(βG = 0) corresponds to very strong long-term correla-tions with HG = 1 In contrast βrnd = 12 representscompletely random activity (Hrnd = 12) The observedmessage data comprises 12 gt β gt 0 and 12 lt H lt 1Surprisingly the values of β found here are very close tothe β values found for companies in the US economy [1]

In the case of companies also the distribution ofgrowth rates has been studied It was found that the dis-tribution density follows [1]

p(r|m0) =1

sσ(m0)exp

(

minuss|r minus 〈r(m0)〉|σ(m0)

)

(8)

whereas s =radic

2 Next we analyze how the growth rates rare distributed in the case of the message data

First we need to point out that in contrast to thegrowth of companies our entities can never shrink Themembers cannot loose messages the number m(t) eitherincreases or remains the same Accordingly in our caser ge 0 and therefore s = 1 as can be derived for thesingle-sided exponentially decaying distribution

Figure 5 shows p(r|m0) for QX where the values arescaled to collapse according to equation (8) with s = 1In order to have reasonable statistics we define the con-dition m0 in rather wide ranges namely according to thedecimal logarithm For sending (Fig 5a) and receiving(Fig 5b) messages the scaled probability densities col-lapse and are quite similar Nevertheless the growth ratesdo not exactly follow equation (8) with s = 1 While forthe less active members with small growth rates we finda good agreement for more active members and largegrowth rates the obtained curves deviate from the the-oretical one towards a steeper decay

The corresponding results for POK are shown in Fig-ure 6 Again sending and receiving are very similar The

152 The European Physical Journal B

minus2 0 2 4 6(rminusltr(m0)gt)σ(m0)

10minus5

10minus4

10minus3

10minus2

10minus1

100

101

p(r|

m0)

σ(m

0)

1minus910minus99100minus9991000minus9999

minus2 0 2 4 6(rminusltr(m0)gt)σ(m0)

~ eminusr

(a) QX messages send (b) QX messages receive

Fig 5 (Color online) Scaled probability density of growthrates r equation (3) in the number of messages by membersof QX (a) Sending and (b) receiving The times for m0 andm1 have been chosen as t0 = T2 and t1 = T The symbolscorrespond to different initial number of messages m0 The axisare scaled assuming a distribution according to equation (8)with s = 1 which then corresponds to the dotted lines

minus2 0 2 4 6(rminusltr(m0)gt)σ(m0)

10minus5

10minus4

10minus3

10minus2

10minus1

100

101

p(r|

m0)

σ(m

0)

1minus910minus99100minus9991000minus9999

minus2 0 2 4 6(rminusltr(m0)gt)σ(m0)

~ eminusr

(a) POK messages send (b) POK messages receive

Fig 6 (Color online) Scaled probability density of growthrates r equation (3) in the number of messages by membersof POK (a) Sending and (b) receiving Analogous to Figure 5

curves collapse reasonably but in contrast to QX here themeasured p(r|m0) overall deviate from the theoretical onecomprising less steep slopes

We argue that as for single time series distributionand correlation properties are in most cases independentthe same holds for the message data and the growth Thedistribution of growth rates p(r|m0) seems to be indepen-dent from the long-term correlations which are reflectedin σ(m0) with the exponent β

322 Mutual growth in the number of messages

Next we study a variation of growth Instead of consider-ing the absolute number of messages a member sends westudy the difference in the number of messages comparedto any other member the mutual difference mi(t)minusmj(t)Thus the growth rate is defined analogous to equation (3)

rtimes = lnmi

1 minus mj1

mi0 minus mj

0

(9)

where now there is a growth rate for every pair of mem-bers i and j The conditional average growth rate and thecorresponding standard deviation is then taken over allpossible pairs and the condition is the difference at t0

100

101

102

103

104

m0

iminusm0

j

10minus2

10minus1

100

101

100

101

102

103

104

m0

iminusm0

j

10minus2

10minus1

100

101

ltr x(

m0i minus

m0j )gt

σ(m

0i minusm

0j )

(a) QX messages send (b) QX messages send shuf

~ (m0

iminusm0

j)

minus03

~ (m0

iminusm0

j)

minus12

Fig 7 (Color online) Average mutual growth rate and stan-dard deviation versus foregoing difference in the number ofmessages for sending in QX The average (open squares) andstandard deviation (filled circles) of the mutual growth ratertimes equation (9) are plotted conditional to the initial differ-ence mi

0 minusmj0 whereas t0 = T2 and t1 = T (a) Original data

and (b) shuffled data The dotted line in (a) corresponds tothe exponent βtimes = 03 and in (b) to βtimes = 12

mi0 minus mj

0 = mi(t0) minus mj(t0) providing the quantities〈rtimes(mi

0 minus mj0)〉 and σ(mi

0 minus mj0) We disregard combina-

tions of i and j where mi0 minus mj

0 = 0 or mi1minusmj

1

mi0minusmj

0le 0

The results for sending in QX are shown in Figure 7Apart from a small decrease up to mi

0 minus mj0 asymp 50 the

average growth rate is constant (Fig 7a) The conditionalstandard deviation asymptotically follows a slope βtimes asymp03 with deviations to small exponents for small mi

0 minusmj0

In the case of the shuffled data (Fig 7b) as expected theaverage growth rate is constant while the standard devia-tion decreases steeper than for the original data namelywith βtimesrnd 12 although not with a nice straight lineNevertheless we conclude that the scaling of the standarddeviation in Figure 7a must be due to temporal correla-tions between the members The growth of the differencebetween their number of messages comprises similar scal-ing as the individual growth

We conjecture that σ(mi0 minus mj

0) reflects long-term cross-correlations in analogy to σ(m0) for auto-correlations However so far we are not able to providefurther evidence for this analogy and the correspondingrelation to β = 1 minus H equation (7) since an appropriatetechnique for the direct quantification of long-term cross-correlations is lacking

33 Modeling

In what follows we propose numerical simulations withthe purpose of testing the methods and empirical pat-terns we found We study three approaches adopted tothe modeling of human activity (a) peaks over thresh-olds (b) preferential attachment [50] and (c) cascadingPoisson process [27]

331 Peaks over threshold (POT) simulations

Our finding that the activity of sending messages ex-hibits long-term persistence asserts the existence of an

D Rybski et al Communication activity in social networks growth and correlations 153

0 5 10 15 20time [w]

0

2

4

6

0 50 100 150 200time

minus2

minus1

0

1

2

3(a)

(b)

(c)

(d)

q

Fig 8 (Color online) Illustration of the peaks over thresh-old simulations (a) An underlying and unknown long-termcorrelated process determines the instantaneous probabilityof sending messages Once this state passes certain thresh-old q (dashed orange line) a messages is sent (green diamonds)(b) Generated instants of messages (c) with windows for ag-gregation such as messages per day (d) Aggregated record ofmessages in windows of size w here w = 10

underlying long-term correlated process It can be under-stood as an unknown individual state driven by variousinternal and external stimuli [274351ndash54] increasing theprobability to send messages Generating such a hypothet-ical long-term correlated internal process (xi) simulatedmessage data can be defined by the instants at which thisinternal process exceeds a threshold q (peaks over thresh-old POT) see [55ndash57] and references therein

More precisely we consider a long-term correlated se-quence (xi) consisting of Nlowast random numbers that is nor-malized to zero average (〈x〉 = 0) and unit standard devi-ation (σx = 1) Choosing a threshold q at each instant ithe probability to send a messages is

psnd =

1 for xi gt q

0 for xi le q (10)

Thus the message events are given by the indices i ofthose random numbers xi exceeding q

Figure 8a illustrates the procedure The random num-bers are plotted as brown circles and the events exceedingthe threshold (orange dashed line) by the green diamondsThe resulting instants are depicted in Figure 8b represent-ing the simulated messages The threshold approximatelypredefines the total number of events and accordingly theaverage inter-event time Using normal-distributed num-bers (xi) the number of eventsmessages is approximatelygiven by the length Nlowast and the inverse cumulative dis-tribution function associated with the standard normaldistribution (probit-function) Additionally the randomnumbers we use are long-term correlated with variable

100

101

102

103

104

Δt [w]

10minus3

10minus2

10minus1

100

101

102

103

104

FD

FA

2(Δt

)

100

101

102

103

104

m0

10minus1

100

ltr(

m0)

gt

100

101

102

103

104

105

106

total number of events M

0

02

04

06

08

1

12

HD

FA

2 (

1000

lt=

s lt

= 1

0000

)q=10q=15q=40q=45

100

101

102

103

104

m0

10minus2

10minus1

100

σ(m

0)

H=09H=08H=07H=06H=05slope β=1minusH

~ (Δt)09

~ (Δt)05

(a) (b)

(c) (d)

Fig 9 (Color online) Results of numerical simulations(a) Mean growth rate conditional to the number of events un-til t0 = Nlowast2 as obtained from 100 000 long-term correlatedrecords of length Nlowast = 131 072 with variable imposed fluctua-tion exponent Himp between 12 and 09 and random thresh-old q between 10 and 60 (b) As before but standard devia-tion conditional to the number of events The solid lines repre-sent power-laws with exponents β expected from the imposedlong-term correlations according to equation (7) (c) Long-termcorrelations in the sequences of aggregated peaks over thresh-old For every threshold q between 10 (violet) and 45 (black)100 normalized records of length Nlowast = 4194 304 have beencreated with Himp = 09 The events are aggregated in win-dows of size w = 100 The panel shows the averaged DFA2fluctuation functions (d) Fluctuation exponents on the scales1 000 le s le 10 000 as a function of the total number of events

fluctuation exponent We impose these auto-correlationsusing Fourier filtering method [2358] Next we show thatthis process reproduces the scaling in the growth ieGGL as well as the variable long-term correlations in theactivity of the members (eg Figs 2 and 3)

For testing this process we create 100 000 independentlong-term correlated records (xi) of length Nlowast = 131 072impose the fluctuation exponent Himp and choose for eachone a random threshold q between 1 and 6 each represent-ing a sender Extracting the peaks over threshold we ob-tain the events and determine for each recordmember thegrowth in the number of eventsmessages between Nlowast2and Nlowast This is for each recordmember we count thenumbers of eventsmessages m0 until t0 = ti=Nlowast2 as wellas m1 until t1 = ti=Nlowast and calculate the growth rate ac-cording to equation (3) We then calculate the conditionalaverage 〈r(m0)〉 and the conditional standard deviationσ(m0) where the values of m0 are binned logarithmicallyThe quantities are plotted in Figures 9a and 9b while inpanel (b) we include slopes expected from β = 1 minus H equation (7) We find that the numerical results reason-ably agree with the prediction (solid lines) Except forsmall m0 these results are consistent with those found inthe original message data

154 The European Physical Journal B

The fluctuation functions can be studied in the sameway As described in Section 31 we find long-term corre-lations in the sequences of messages per day or per weekOn the basis of the above explained simulated messageswe analyze them in an analogous way For each thresh-old q = 10 15 40 45 we create 100 long-term cor-related records of length Nlowast = 4 194 304 with imposedfluctuation exponent Himp = 09 extract the simulatedmessage events and aggregate them in non-overlappingwindows of size w = 100 This is tiling Nlowast in segmentsof size w and counting the number of events occurring ineach segment (Figs 8c and 8d) The obtained aggregatedrecords represent the analogous of messages per day or perweek and are analyzed with DFA averaging the fluctua-tion functions among those configurations with the samethreshold and thus similar number of total events Thecorresponding results are shown in Figure 9c and 9d Weobtain very similar results as in the original data We findvanishing correlations for the sequences with few events(large q) and pronounced long-term correlations for thecases of many events (small q) while the maximum fluc-tuation exponent corresponds to the chosen Himp Thiscan be understood by the fact that for q close to zerothe sequence of number of events per window convergesto the aggregated sequence of 0 or 1 (for x le 0 or x gt 0)reflecting the same long-term correlation properties as theoriginal record [59] For a large threshold q too few eventsoccur to measure the correct long-term correlations egthe true scaling only turns out on larger unaccessible timescales requiring larger w and longer records

Although the simulations do not reveal the originof the long-term correlated patchy behavior they sup-port equation (7) and the concept of an underlying long-term correlated process Consistently an uncorrelatedcompletely random underlying process recovers Poissonstatistics and therefore βrnd = 12 for the growth fluctu-ations as well as uncorrelated message activity (Hrnd =12) For 12 lt H lt 1 it has been shown [5557] that theinter-event times follow a stretched exponential distribu-tion see also [131460]

332 Preferential attachment

Next we compare our findings with the growth proper-ties of a network model We investigate the Barabasi-Albert (BA) model which is based on preferential at-tachment and has been introduced to generate a kind ofscale-free networks [5061] with power-law degree distri-bution p(k) [6263] Essentially it consists of subsequentlyadding nodes to the network by linking them to existingnodes which are chosen randomly with a probability pro-portional to their degree

We obtain the undirected network and study the de-gree growth properties by calculating the conditional av-erage growth rate 〈rBA(k0)〉 and the conditional standarddeviation σBA(k0) obtained from the scale-free BA modelThe times t0 and t1 are defined by the number of nodesattached to the network

Figure 10 shows the results where an average degree〈k〉 = 20 50 000 nodes in t0 and 100 000 nodes in t1 were

101

102

103

degree k0

10minus2

10minus1

ltr B

A(k

0)gt

σB

A(k

0)

ltrBA(k0)gtσBA(k0)

~ k0

minus12

Fig 10 (Color online) Average degree growth rate and stan-dard deviation versus foregoing degree for the preferential at-tachment network model [50] The average (green open di-amonds) and standard deviation (blue filled circles) of thegrowth rate rBA are plotted conditional to k0 the degree ofthe corresponding nodes at the first stage We choose averagedegree 〈k〉 = 20 50 000 nodes in t0 and 100 000 nodes in t1The error-bars are taken from 10 configurations The dashedline in the bottom corresponds to βBA = 12

chosen We find constant average growth rate that doesnot depend on the initial degree k0 The conditional stan-dard deviation is a function of k0 and exhibits a power-lawdecay with βBA = 12 as expected for such an uncorre-lated growth process [12] Therefore a purely preferentialattachment type of growth is not sufficient to describe thetype of social network dynamics found in Section 321since additional temporal correlations are involved in thedynamics of establishing acquaintances in the community

The value βBA = 12 in equation (7) corresponds toH = 12 indicating complete randomness There is nomemory in the system Since each addition of a new nodeis completely independent from precedent ones there can-not be temporal correlations in the activity of addinglinks In contrast for the out-degree in QX and POK weobtained βkQX = 022plusmn002 and βkPOK = 017plusmn008 [12]which is supported by the (non-linear) correlations be-tween the number of messages and the out-degree as pre-sented in Section 34

Interestingly an extension of the standard BA modelhas been proposed [64] see also [6566] that takes intoaccount different fitnesses of the nodes to acquiring linksWe think that such fitness could be related to growth fluc-tuations thus providing a route to modify the BA modelto include the long-term correlated dynamics found here

333 Cascading Poisson process

In this section we elaborate the model proposed in [27]and examine it with respect to long-term correlations Themodel is based on a cascading Poisson process (CPP)according to which the probability that a member entersan active interval is ρ(t) = Nwpd(t)pw(t) where Nw is theaverage number of active intervals per week pd(t) is the

D Rybski et al Communication activity in social networks growth and correlations 155

100

101

102

103

104

105

Δt [days]

10minus1

100

101

102

FD

FA(Δ

t) [

arb

uni

ts]

DFA1DFA2~ (Δt)

12

100

101

102

Δt [days]

10minus1

100

101

FD

FA

2(Δt

)

M=1097minus2980M=404minus1096M=149minus403M=55minus148M=21minus54M=8minus20M=3minus7M=1minus2~ (Δt)

12

(a) User 2881Malmgren et al PNAS 2008

(b)

Fig 11 (Color online) DFA fluctuation functions of message data created with the model proposed in [27] (a) User 2881 Wevisually extract the model parameters from Figure 3 in [27] and generate message data for approx 800k days The panel showsthe fluctuation functions from DFA1 and DFA2 which asymptotically go as sim (Δt)12 ie no long-term correlations The humpon small scales is due to the model inherent oscillations [23] The dashed vertical line is placed at Δt = 83 days (b) Randomparameterization We randomly choose the model parameters create 20k records of 83 days and average the obtained DFA2fluctuation functions according to the final number of messages M While for those simulated members with few messages wefind F (Δt) sim (Δt)12 the F (Δt) of the most active members exhibit a hump due to oscillations similar to the one in panel (a)

probability of starting an active interval at a particulartime of the day and pw(t) is the probability of startingan active interval at a particular day of the week Once amember enters such an active interval heshe sends a set ofNa +1 messages where Na is drawn from the distributionp(Na) The messages sent in such an active interval aresent randomly ie a homogeneous Poisson process withrate ρa events per hour

First we study the example of User 2881 as analyzedin [27] (please note that in [27] a different data set is stud-ied and the user is neither in OC1 nor in OC2) We ex-tract from [27] the parameters Nw = 73 active intervalsper week ρa = 17 events per hour as well as (visually)the distributions pd(t) pw(t) and p(Na) The original pe-riod of 83 days is not sufficient to apply DFA and werun the model for this set of parameters over 800k daysThen we extract the record of number of messages per dayμ(t) and apply DFA The obtained fluctuation functionsare shown in Figure 11a On small scales below 100 daysa hump in the F (Δt) can be identified which is due tooscillations in μ(t) [23] While asymptotically the influ-ence of oscillations vanishes on scales below the wave-length the oscillations appear as correlations (increasedslope in F (Δt)) and on scales above the wavelength theoscillations appear as anti-correlations (decreased slope inF (Δt)) Asymptotically we find F (Δt) sim (Δt)12 ieHCPP 12 corresponding to a lack of long-term cor-relations Even on scales up to 83 days we rather findHCPP lt 12 (which we expect from the imposed weeklyoscillations)

Next we study 20 000 simulated e-mail senders withrandomly chosen parameters (i) We fill pw(t) with ran-dom numbers and set pw(t) = 0 for t = 0 6 ie Sundayand Saturday (ii) We fill pd(t) with random numbers andset pd(t) = 0 for t = 0 5 and t = 23 ie at night(iii) We set p(Na) starting with a random p(Na = 0) Thenp(Na) decays exponentially up to a random Na below 36pw(t) pw(t) = 0 and p(Na) are normalized (iv) We ran-

domly choose 0 lt Nw lt 40 (v) We randomly choose0 lt ρa lt 30 From [27] Supporting Information 2 (SI2)we estimated the typical maximum values of Na Nw andρa (36 40 and 30 respectively) We run the model for83 days and extract the μ(t) for each simulated e-mailsender Then we apply DFA2 and average the fluctuationfunctions according to the final number of messages M The fluctuation functions for the various activity levels aredepicted in Figure 11b We find that members with smallfinal number of messages exhibit uncorrelated behaviorThe more active the members the more pronounced be-come the oscillations which we already discussed in thecontext of Figure 11a Thus asymptotic HCPP lt 12 forlarge M is due to the weekly cycles [23] In our data os-cillations do not dominate the DFA fluctuation functionsMoreover for OC2 we also find long-term correlations inweekly resolution (Fig 1)

Based on periodic probabilities and Poisson statisticsthe CPP model represents a powerful concept to charac-terize inter-event times For this purpose the average Nw

seems to be sufficient However in order to recover long-term correlations time dependent Nw = Nw(t) seem tobe necessary In fact the number of active intervals perweek Nw fluctuates as can be seen in [27] SI2 (uppermost row of the panels) Thus we suggest to extend themodel by introducing a memory kernel see eg [54] or byusing long-term correlated Nw(t)

34 Other correlations

In this section we want to discuss other types of correla-tions Figure 12 shows for QX the final degree K = k(T )versus the final number of messages M = m(T ) We findthat for both sending and receiving the two quantitiesare correlated according to

K sim Mλ with λ asymp 34 (11)

156 The European Physical Journal B

100

101

102

103

104

105

number of messages M

10minus1

100

101

102

103

104

degr

ee K

send

100

101

102

103

104

105

number of messages M

receive

~ M075

~ M075

(a) QX (b) QX

Fig 12 (Color online) Correlations between the finaldegree K and the final number of messages M for QX (a) Out-degree and sending messages (b) in-degree and receiving mes-sages The dashed lines correspond to a power-law with expo-nent 075 Members sending many messages also tend to havehigh out-degree but not linearly rather following a power-law

100

101

102

103

104

105

sent

100

101

102

103

104

105

rece

ived

messages M

100

101

102

103

104

105

outminusdegree

100

101

102

103

104

105

inminus

degr

ee

degree K

(a) QX (b) QX

Fig 13 (Color online) Correlations between activity and pas-sivity for QX (a) Final number of messages M received andsent (b) final in- and out-degree The dashed lines correspondto a linear relation Those members who send many messagesalso receive many But those members who know many peopleto whom they send messages do not necessarily know as manypeople from whom they receive messages

Similar relations have also been found for other data [67]Since the correlations are positive those members thatsend many messages in average also have more acquain-tances to whom they send but they know less acquain-tances than they would in the case of linear correlationsFor receiving Figure 12b this correlation is very similar

The number of messages sent versus the number ofmessages received (for QX) is displayed in Figure 13aAsymptotically the activity and passivity are linearly re-lated and on average for every message sent there is areceived one or vice versa This of course does not meanthat every message is replied However the less activemembers in average tend to receive more messages thanthey send For example those members who send in av-erage one message receive about three Nevertheless themore active the members are the more the sending andreceiving behavior approaches the linear relation In con-trast for the degree Figure 13b the asymptotic linearrelation does not hold Those members with large out-degree and small in-degree are referred to as spammerssince they send to many different people but only receivefrom few

100

101

102

103

104

105

number of messages M

10minus1

100

101

102

103

104

degr

ee K

send

100

101

102

103

104

105

number of messages M

receive

~ M075

~ M075

(a) POK (b) POK

Fig 14 (Color online) Correlations between the final de-gree K and the final number of messages M for POK (a) Out-degree and sending messages (b) in-degree and receiving mes-sages Analogous to Figure 12

100

101

102

103

104

sent

100

101

102

103

104

rece

ived

messages M

100

101

102

103

104

outminusdegree

100

101

102

103

104

inminus

degr

ee

degree K

(a) POK (b) POK

Fig 15 (Color online) Correlations between activity and pas-sivity for POK (a) Final number of messages M received andsent (b) final in- and out-degree Analogous to Figure 13

For POK we find similar results in Figures 14 and 15The final degree and the final number of messages alsoscale with an exponent close to 075 although for send-ing messages there exist some deviations of the most ac-tive members (Fig 14a) Also the correlations of sendingand receiving are linear the same holds for in- and out-degree However the most active members again deviatewith low receiving part ie both low number of receivedmessages as well as low in-degree Figure 15 Neverthe-less the results for both data sets are mainly consistentand the power-law relation equation (11) is a remarkableregularity

341 Activity and degree distributions

Finally we want to briefly discuss the distributions of ac-tivities and degrees If we assume p(M) sim MminusγM andp(K) sim KminusγK then with equation (11) the exponentsshould be related according to

γK = 1 + (γM minus 1)λ (12)

Figures 16a and 16b displays the probability densitiesp(M) and p(K) for both online communities Althoughthe distributions are rather broad they do not exhibitstraight lines in double logarithmic representation Inpanel (b) we include some guides to the eye with slopesaccording to equation (12) and which roughly follow the

D Rybski et al Communication activity in social networks growth and correlations 157

Table 1 Overview of the obtained exponents QX and POK are the two data-sets For the BA model and the CP processsee [50] and [27] respectively HL is the fluctuation exponent along directed links βk is the growth fluctuation exponent whenthe degree is considered and βtimes is the mutual growth fluctuation exponent based on the growth between pairs

Sending H HL β βk βtimesSect or Ref [12] 31 333 31 [12] [12] 322

QX 075 plusmn 005 asymp074 022 plusmn 001 022 plusmn 002 asymp03

QX shuffled 12 12 12

POK 091 plusmn 004 017 plusmn 003 017 plusmn 008

POK shuffled 12 12

BA model 12

CP process rarr 12

100

101

102

103

104

total number of messages

10minus10

10minus9

10minus8

10minus7

10minus6

10minus5

10minus4

10minus3

10minus2

10minus1

100

p(M

L)

100

101

102

103

10410

minus1010

minus910

minus810

minus710

minus610

minus510

minus410

minus310

minus210

minus110

0

p(M)p(K)

100

101

102

103

104

105

total number of messages

100

101

102

103

104

105

(a) QX send (b) POK send

(c) QX directed link (d) POK directed link

~ ML

minus35

~ Mminus12

~ Mminus3

~ Kminus127

~ Kminus37

~ ML

minus35

~ Kminus073

~ Mminus08

Fig 16 (Color online) Probability densities of activities anddegrees The probabilities are plotted versus the total numberof messages M and the final degree K for (a) sending inQX and (b) sending in POK The panel (c) and (d) exhibitthe probability densities of the total number of messages alongdirected links ML for QX and POK respectively The dottedlines serve as guides to the eye and have the indicated slopes

obtained curves However λ asymp 075 is relatively close to 1so that the differences are minor

The probability densities of activity along direct linksare displayed in Figures 16c and 16d for QX and POKrespectively In both cases the frequency of large activitydecays approximately following a power-law with expo-nent around 35

4 Conclusions

Our work reviews and further supports previous empir-ical findings [12] extending them by some features Theobtained exponents are summarized in Table 1

In addition to [12] we find very similar characteristicsfor the passivity of receiving as for the activity of send-ing messages This is in line with the strong correlations

between individual sending and receiving ie most of themessages are somehow replied sooner or later Further-more already the communication between two individualscomprises long-term persistence

Investigating the probability densities of logarithmicgrowth rates (ie growth of the cumulative number of mes-sages between two time steps of any member) we are ableto collapse the curves by scaling them with conditional av-erage growth rates and conditional standard deviationsWhile less active members follow well the exponentiallydecaying probability density for the more active membersdeviations are found in the case of large growth rates

Moreover we introduce a new growth rate namely themutual growth in the number of messages This is the dif-ference in the number of messages sent between pairs ofmembers at two time steps The conditional standard de-viation of this mutual growth rate also decays as a power-law with increasing initial difference whereas the expo-nent is close to 03 and changes to 12 when the datais shuffled We conjecture that this growth reflects cross-correlations in the activity

Finally we propose simulations to reproduce the long-term correlations and growth properties Basically it con-sists of generating long-term correlated sequences anddefining a threshold All values of such sequences abovethe threshold (POT) represent a message event We showthat then the correlation and growth features being deter-mined by the imposed fluctuation exponent confirm therelation β = 1 minus H [12] Including further features thisapproach could be a starting point for more elaboratedmodeling of human dynamics

We would like to note that ndash except Section 322 aboutmutual growth in the number of messages (and Sect 34) ndashall analysis and results refer to auto-correlations As phe-nomena auto- and cross-correlations can occur indepen-dently However since most of the messages are replied itis very likely that there are also cross-correlations betweenthe members activity which to our knowledge has not yetbeen studied systematically

Thus our work opens perspectives for further researchactivities In particular the origin of the long-term persis-tence in the communication remains an important ques-tion In [14] we demonstrate the relation of β H withinter-event time scaling From a psychologicalsociological

158 The European Physical Journal B

point of view one may argue where the persistence isoriginated Is it purely due to a state of mind solipsis-tic emerging from moods or is it due to social effects iethat the dynamics in the social network induces persis-tent fluctuations One hypothesis could be that alreadythe social network is correlated [68]

We thank C Briscoe JF Eichner LK Gallos and HDRozenfeld for useful discussions This work was supportedby National Science Foundation Grant NSF-SES-0624116 andNSF-EF-0827508 FL acknowledges financial support fromThe Swedish Bank Tercentenary Foundation SH thanks theEuropean EPIWORK project the Israel Science Foundationthe ONR and the DTRA for financial support

References

1 MHR Stanley LAN Amaral SV Buldyrev S HavlinH Leschhorn P Maass MA Salinger HE StanleyNature 379 804 (1996)

2 D Canning LAN Amaral Y Lee M Meyer HEStanley Econ Lett 60 335 (1998)

3 V Plerou LAN Amaral P Gopikrishnan M MeyerHE Stanley Phys Rev E 60 6519 (1999)

4 F Liljeros LAN Amaral HE Stanley arXiv

nlin0310001v1[nlinAO] (2003) httparxivorg

absnlin0310001

5 K Matia LAN Amaral M Luwel HF Moed HEStanley J Am Soc Inf Sci Tec 56 893 (2005)

6 S Picoli RS Mendes Phys Rev E 77 036105 (2008)7 HD Rozenfeld D Rybski JS Andrade Jr M Batty

HE Stanley HA Makse Proc Natl Acad Sci USA 10518702 (2008)

8 F Wang K Yamasaki S Havlin HE Stanley arXiv09114258v1[q-finST] (2009) httparxivorgabs09114258

9 R Gibrat Les inegalites economiques (Libraire du RecueilSierey Paris 1931)

10 J Sutton J Econ Lit 35 40 (1997)11 M Mitzenmacher Internet Math 1 226 (2004)12 D Rybski SV Buldyrev S Havlin F Liljeros HA

Makse Proc Natl Acad Sci USA 106 12640 (2009)13 AL Barabasi Nature 435 207 (2005)14 D Rybski SV Buldyrev S Havlin F Liljeros HA

Makse preprint (2011)15 M Karsai M Kivela RK Pan K Kaski J Kertesz AL

Barabasi J Saramaki Phys Rev E 83 025102 (2011)16 J Brujic RI Hermans KA Walther JM Fernandez

Nature Phys 2 282 (2006)17 LK Gallos D Rybski F Liljeros S Havlin HA Makse

submitted (2011)18 P Holme Europhys Lett 64 427 (2003)19 P Holme F Liljeros CR Edling BJ Kim Phys Rev

E 68 056107 (2003)20 P Holme CR Edling F Liljeros Soc Networks 26 155

(2004)21 CK Peng SV Buldyrev S Havlin M Simons HE

Stanley AL Goldberger Phys Rev E 49 1685 (1994)22 A Bunde S Havlin JW Kantelhardt T Penzel JH

Peter K Voigt Phys Rev Lett 85 3736 (2000)

23 JW Kantelhardt E Koscielny-Bunde HHA RegoS Havlin A Bunde Physica A 295 441 (2001)

24 JW Kantelhardt Encyclopedia of Complexity and SystemScience (Springer 2009) Chap entry 00620 Fractal andMultifractal Time Series

25 S Golder DM Wilkinson BA Huberman Communitiesand Technologies 2007 (Springer London 2007) pp 41ndash66

26 J Leskovec E Horvitz arXiv08030939v1[physics

soc-ph] (2008) httparxivorgabs0803093927 RD Malmgren DB Stouffer AE Motter LAN

Amaral Proc Natl Acad Sci USA 105 18153 (2008)28 A Vazquez Physica A 373 747 (2007)29 Z Eisler J Kertesz Phys Rev E 73 046109 (2006)30 Z Eisler I Bartos J Kertesz Adv Phys 57 89 (2008)31 LEC Rocha F Liljeros P Holme Proc Natl Acad Sci

USA 107 5706 (2010)32 CK Peng J Mietus JM Hausdorff S Havlin HE

Stanley AL Goldberger Phys Rev Lett 70 1343 (1993)33 PC Ivanov A Bunde LAN Amaral S Havlin

J Fritsch-Yelle RM Baevsky HE Stanley ALGoldberger Europhys Lett 48 594 (1999)

34 K Kosmidis A Kalampokis P Argyrakis Physica A 370808 (2006)

35 Y Liu P Gopikrishnan P Cizeau M Meyer CK PengHE Stanley Phys Rev E 60 1390 (1999)

36 RN Mantegna HE Stanley An Introduction toEconophysics Correlations and Complexity in Finance(Cambridge University Press Cambridge 1999)

37 F Lux M Ausloos The Science of Disasters in MarketFluctuations I Scaling Multiscaling and Their PossibleOrigins (Springer-Verlag Berlin 2002) Chap 13 pp 373ndash409

38 WE Leland MS Taqqu W Willinger DV WilsonIEEEACM Trans Networking 2 1 (1994)

39 M Kampf S Tismer JW Kantelhardt L Muchnik sub-mitted (2011)

40 S Tadaki M Kikuchi A Nakayama K NishinariA Shibata Y Sugiyama S Yukawa J Phys Soc Jpn75 034002 (2006)

41 Z Xiao-Yan L Zong-Hua T Ming Chinese Phys Lett24 2142 (2007)

42 K Linkenkaer-Hansen VV Nikouline JM Palva RJIlmoniemi J Neurosci 21 1370 (2001)

43 P Allegrini D Menicucci R Bedini L FronzoniA Gemignani P Grigolini BJ West P Paradisi PhysRev E 80 061914 (2009)

44 PC Ivanov K Hu MF Hilton SA Shea HE StanleyProc Natl Acad Sci USA 104 20702 (2007)

45 RD Malmgren DB Stouffer ASLO CampanharoLAN Amaral Science 325 1696 (2009)

46 CA Hidalgo C Rodriguez-Sickert Physica A 387 3017(2008)

47 A Saichev Y Malevergne D Sornette Theory ofZipf rsquos Law and Beyond ndash Monograph Lecture Notes inEconomics and Mathematical Systems (Springer-VerlagBerlin 2009)

48 LAN Amaral SV Buldyrev S Havlin MA SalingerHE Stanley Phys Rev Lett 80 1385 (1998)

49 HD Rozenfeld D Rybski X Gabaix HA Makse AmEcon Rev 101 2205 (2011)

50 AL Barabasi R Albert Science 286 509 (1999)51 P Hedstrom Dissecting the Social On the Principles

of Analytical Sociology (Cambridge University PressCambridge 2005)

D Rybski et al Communication activity in social networks growth and correlations 159

52 A Kentsis Nature 441 E5 (2006)53 G Palla AL Barabasi T Vicsek Nature 446 664 (2007)54 R Crane D Sornette Proc Natl Acad Sci USA 105

15649 (2008)55 A Bunde JF Eichner JW Kantelhardt S Havlin

Phys Rev Lett 94 048701 (2005)56 EG Altmann H Kantz Phys Rev E 71 056106 (2005)57 JF Eichner JW Kantelhardt A Bunde S Havlin

Phys Rev E 75 011128 (2007)58 HA Makse S Havlin M Schwartz HE Stanley Phys

Rev E 53 5445 (1996)59 Y Xu QDY Ma DT Schmitt P Bernaola-Galvan

PC Ivanov Physica A 390 4057 (2011)60 J Stehle A Barrat G Bianconi Phys Rev E 81 035101

(2010)

61 R Albert AL Barabasi Rev Mod Phys 74 47(2002)

62 H Ebel LI Mielsch S Bornholdt Phys Rev E 66035103 (2002)

63 MEJ Newman S Forrest J Balthrop Phys Rev E 66035101 (2002)

64 G Bianconi AL Barabasi Europhys Lett 54 436 (2001)65 BF de Blasio A Svensson F Liljeros Proc Natl Acad

Sci USA 104 10762 (2007)66 R Cohen S Havlin Complex Networks Structure

Robustness and Function in Growing network models(Cambridge University Press Cambridge 2009)

67 L Muchnik HA Makse S Havlin preprint (2011)68 D Rybski HD Rozenfeld JP Kropp Europhys Lett

90 28002 (2010)

  • Introduction
  • Data
  • Analysis
  • Conclusions
  • References
Page 6: Communication activity in social networks: growth and ...hmakse/maksepapers/2011... · data of the second online community (. com, POK) covers 492 days (February 2001 until June 2002)

152 The European Physical Journal B

minus2 0 2 4 6(rminusltr(m0)gt)σ(m0)

10minus5

10minus4

10minus3

10minus2

10minus1

100

101

p(r|

m0)

σ(m

0)

1minus910minus99100minus9991000minus9999

minus2 0 2 4 6(rminusltr(m0)gt)σ(m0)

~ eminusr

(a) QX messages send (b) QX messages receive

Fig 5 (Color online) Scaled probability density of growthrates r equation (3) in the number of messages by membersof QX (a) Sending and (b) receiving The times for m0 andm1 have been chosen as t0 = T2 and t1 = T The symbolscorrespond to different initial number of messages m0 The axisare scaled assuming a distribution according to equation (8)with s = 1 which then corresponds to the dotted lines

minus2 0 2 4 6(rminusltr(m0)gt)σ(m0)

10minus5

10minus4

10minus3

10minus2

10minus1

100

101

p(r|

m0)

σ(m

0)

1minus910minus99100minus9991000minus9999

minus2 0 2 4 6(rminusltr(m0)gt)σ(m0)

~ eminusr

(a) POK messages send (b) POK messages receive

Fig 6 (Color online) Scaled probability density of growthrates r equation (3) in the number of messages by membersof POK (a) Sending and (b) receiving Analogous to Figure 5

curves collapse reasonably but in contrast to QX here themeasured p(r|m0) overall deviate from the theoretical onecomprising less steep slopes

We argue that as for single time series distributionand correlation properties are in most cases independentthe same holds for the message data and the growth Thedistribution of growth rates p(r|m0) seems to be indepen-dent from the long-term correlations which are reflectedin σ(m0) with the exponent β

322 Mutual growth in the number of messages

Next we study a variation of growth Instead of consider-ing the absolute number of messages a member sends westudy the difference in the number of messages comparedto any other member the mutual difference mi(t)minusmj(t)Thus the growth rate is defined analogous to equation (3)

rtimes = lnmi

1 minus mj1

mi0 minus mj

0

(9)

where now there is a growth rate for every pair of mem-bers i and j The conditional average growth rate and thecorresponding standard deviation is then taken over allpossible pairs and the condition is the difference at t0

100

101

102

103

104

m0

iminusm0

j

10minus2

10minus1

100

101

100

101

102

103

104

m0

iminusm0

j

10minus2

10minus1

100

101

ltr x(

m0i minus

m0j )gt

σ(m

0i minusm

0j )

(a) QX messages send (b) QX messages send shuf

~ (m0

iminusm0

j)

minus03

~ (m0

iminusm0

j)

minus12

Fig 7 (Color online) Average mutual growth rate and stan-dard deviation versus foregoing difference in the number ofmessages for sending in QX The average (open squares) andstandard deviation (filled circles) of the mutual growth ratertimes equation (9) are plotted conditional to the initial differ-ence mi

0 minusmj0 whereas t0 = T2 and t1 = T (a) Original data

and (b) shuffled data The dotted line in (a) corresponds tothe exponent βtimes = 03 and in (b) to βtimes = 12

mi0 minus mj

0 = mi(t0) minus mj(t0) providing the quantities〈rtimes(mi

0 minus mj0)〉 and σ(mi

0 minus mj0) We disregard combina-

tions of i and j where mi0 minus mj

0 = 0 or mi1minusmj

1

mi0minusmj

0le 0

The results for sending in QX are shown in Figure 7Apart from a small decrease up to mi

0 minus mj0 asymp 50 the

average growth rate is constant (Fig 7a) The conditionalstandard deviation asymptotically follows a slope βtimes asymp03 with deviations to small exponents for small mi

0 minusmj0

In the case of the shuffled data (Fig 7b) as expected theaverage growth rate is constant while the standard devia-tion decreases steeper than for the original data namelywith βtimesrnd 12 although not with a nice straight lineNevertheless we conclude that the scaling of the standarddeviation in Figure 7a must be due to temporal correla-tions between the members The growth of the differencebetween their number of messages comprises similar scal-ing as the individual growth

We conjecture that σ(mi0 minus mj

0) reflects long-term cross-correlations in analogy to σ(m0) for auto-correlations However so far we are not able to providefurther evidence for this analogy and the correspondingrelation to β = 1 minus H equation (7) since an appropriatetechnique for the direct quantification of long-term cross-correlations is lacking

33 Modeling

In what follows we propose numerical simulations withthe purpose of testing the methods and empirical pat-terns we found We study three approaches adopted tothe modeling of human activity (a) peaks over thresh-olds (b) preferential attachment [50] and (c) cascadingPoisson process [27]

331 Peaks over threshold (POT) simulations

Our finding that the activity of sending messages ex-hibits long-term persistence asserts the existence of an

D Rybski et al Communication activity in social networks growth and correlations 153

0 5 10 15 20time [w]

0

2

4

6

0 50 100 150 200time

minus2

minus1

0

1

2

3(a)

(b)

(c)

(d)

q

Fig 8 (Color online) Illustration of the peaks over thresh-old simulations (a) An underlying and unknown long-termcorrelated process determines the instantaneous probabilityof sending messages Once this state passes certain thresh-old q (dashed orange line) a messages is sent (green diamonds)(b) Generated instants of messages (c) with windows for ag-gregation such as messages per day (d) Aggregated record ofmessages in windows of size w here w = 10

underlying long-term correlated process It can be under-stood as an unknown individual state driven by variousinternal and external stimuli [274351ndash54] increasing theprobability to send messages Generating such a hypothet-ical long-term correlated internal process (xi) simulatedmessage data can be defined by the instants at which thisinternal process exceeds a threshold q (peaks over thresh-old POT) see [55ndash57] and references therein

More precisely we consider a long-term correlated se-quence (xi) consisting of Nlowast random numbers that is nor-malized to zero average (〈x〉 = 0) and unit standard devi-ation (σx = 1) Choosing a threshold q at each instant ithe probability to send a messages is

psnd =

1 for xi gt q

0 for xi le q (10)

Thus the message events are given by the indices i ofthose random numbers xi exceeding q

Figure 8a illustrates the procedure The random num-bers are plotted as brown circles and the events exceedingthe threshold (orange dashed line) by the green diamondsThe resulting instants are depicted in Figure 8b represent-ing the simulated messages The threshold approximatelypredefines the total number of events and accordingly theaverage inter-event time Using normal-distributed num-bers (xi) the number of eventsmessages is approximatelygiven by the length Nlowast and the inverse cumulative dis-tribution function associated with the standard normaldistribution (probit-function) Additionally the randomnumbers we use are long-term correlated with variable

100

101

102

103

104

Δt [w]

10minus3

10minus2

10minus1

100

101

102

103

104

FD

FA

2(Δt

)

100

101

102

103

104

m0

10minus1

100

ltr(

m0)

gt

100

101

102

103

104

105

106

total number of events M

0

02

04

06

08

1

12

HD

FA

2 (

1000

lt=

s lt

= 1

0000

)q=10q=15q=40q=45

100

101

102

103

104

m0

10minus2

10minus1

100

σ(m

0)

H=09H=08H=07H=06H=05slope β=1minusH

~ (Δt)09

~ (Δt)05

(a) (b)

(c) (d)

Fig 9 (Color online) Results of numerical simulations(a) Mean growth rate conditional to the number of events un-til t0 = Nlowast2 as obtained from 100 000 long-term correlatedrecords of length Nlowast = 131 072 with variable imposed fluctua-tion exponent Himp between 12 and 09 and random thresh-old q between 10 and 60 (b) As before but standard devia-tion conditional to the number of events The solid lines repre-sent power-laws with exponents β expected from the imposedlong-term correlations according to equation (7) (c) Long-termcorrelations in the sequences of aggregated peaks over thresh-old For every threshold q between 10 (violet) and 45 (black)100 normalized records of length Nlowast = 4194 304 have beencreated with Himp = 09 The events are aggregated in win-dows of size w = 100 The panel shows the averaged DFA2fluctuation functions (d) Fluctuation exponents on the scales1 000 le s le 10 000 as a function of the total number of events

fluctuation exponent We impose these auto-correlationsusing Fourier filtering method [2358] Next we show thatthis process reproduces the scaling in the growth ieGGL as well as the variable long-term correlations in theactivity of the members (eg Figs 2 and 3)

For testing this process we create 100 000 independentlong-term correlated records (xi) of length Nlowast = 131 072impose the fluctuation exponent Himp and choose for eachone a random threshold q between 1 and 6 each represent-ing a sender Extracting the peaks over threshold we ob-tain the events and determine for each recordmember thegrowth in the number of eventsmessages between Nlowast2and Nlowast This is for each recordmember we count thenumbers of eventsmessages m0 until t0 = ti=Nlowast2 as wellas m1 until t1 = ti=Nlowast and calculate the growth rate ac-cording to equation (3) We then calculate the conditionalaverage 〈r(m0)〉 and the conditional standard deviationσ(m0) where the values of m0 are binned logarithmicallyThe quantities are plotted in Figures 9a and 9b while inpanel (b) we include slopes expected from β = 1 minus H equation (7) We find that the numerical results reason-ably agree with the prediction (solid lines) Except forsmall m0 these results are consistent with those found inthe original message data

154 The European Physical Journal B

The fluctuation functions can be studied in the sameway As described in Section 31 we find long-term corre-lations in the sequences of messages per day or per weekOn the basis of the above explained simulated messageswe analyze them in an analogous way For each thresh-old q = 10 15 40 45 we create 100 long-term cor-related records of length Nlowast = 4 194 304 with imposedfluctuation exponent Himp = 09 extract the simulatedmessage events and aggregate them in non-overlappingwindows of size w = 100 This is tiling Nlowast in segmentsof size w and counting the number of events occurring ineach segment (Figs 8c and 8d) The obtained aggregatedrecords represent the analogous of messages per day or perweek and are analyzed with DFA averaging the fluctua-tion functions among those configurations with the samethreshold and thus similar number of total events Thecorresponding results are shown in Figure 9c and 9d Weobtain very similar results as in the original data We findvanishing correlations for the sequences with few events(large q) and pronounced long-term correlations for thecases of many events (small q) while the maximum fluc-tuation exponent corresponds to the chosen Himp Thiscan be understood by the fact that for q close to zerothe sequence of number of events per window convergesto the aggregated sequence of 0 or 1 (for x le 0 or x gt 0)reflecting the same long-term correlation properties as theoriginal record [59] For a large threshold q too few eventsoccur to measure the correct long-term correlations egthe true scaling only turns out on larger unaccessible timescales requiring larger w and longer records

Although the simulations do not reveal the originof the long-term correlated patchy behavior they sup-port equation (7) and the concept of an underlying long-term correlated process Consistently an uncorrelatedcompletely random underlying process recovers Poissonstatistics and therefore βrnd = 12 for the growth fluctu-ations as well as uncorrelated message activity (Hrnd =12) For 12 lt H lt 1 it has been shown [5557] that theinter-event times follow a stretched exponential distribu-tion see also [131460]

332 Preferential attachment

Next we compare our findings with the growth proper-ties of a network model We investigate the Barabasi-Albert (BA) model which is based on preferential at-tachment and has been introduced to generate a kind ofscale-free networks [5061] with power-law degree distri-bution p(k) [6263] Essentially it consists of subsequentlyadding nodes to the network by linking them to existingnodes which are chosen randomly with a probability pro-portional to their degree

We obtain the undirected network and study the de-gree growth properties by calculating the conditional av-erage growth rate 〈rBA(k0)〉 and the conditional standarddeviation σBA(k0) obtained from the scale-free BA modelThe times t0 and t1 are defined by the number of nodesattached to the network

Figure 10 shows the results where an average degree〈k〉 = 20 50 000 nodes in t0 and 100 000 nodes in t1 were

101

102

103

degree k0

10minus2

10minus1

ltr B

A(k

0)gt

σB

A(k

0)

ltrBA(k0)gtσBA(k0)

~ k0

minus12

Fig 10 (Color online) Average degree growth rate and stan-dard deviation versus foregoing degree for the preferential at-tachment network model [50] The average (green open di-amonds) and standard deviation (blue filled circles) of thegrowth rate rBA are plotted conditional to k0 the degree ofthe corresponding nodes at the first stage We choose averagedegree 〈k〉 = 20 50 000 nodes in t0 and 100 000 nodes in t1The error-bars are taken from 10 configurations The dashedline in the bottom corresponds to βBA = 12

chosen We find constant average growth rate that doesnot depend on the initial degree k0 The conditional stan-dard deviation is a function of k0 and exhibits a power-lawdecay with βBA = 12 as expected for such an uncorre-lated growth process [12] Therefore a purely preferentialattachment type of growth is not sufficient to describe thetype of social network dynamics found in Section 321since additional temporal correlations are involved in thedynamics of establishing acquaintances in the community

The value βBA = 12 in equation (7) corresponds toH = 12 indicating complete randomness There is nomemory in the system Since each addition of a new nodeis completely independent from precedent ones there can-not be temporal correlations in the activity of addinglinks In contrast for the out-degree in QX and POK weobtained βkQX = 022plusmn002 and βkPOK = 017plusmn008 [12]which is supported by the (non-linear) correlations be-tween the number of messages and the out-degree as pre-sented in Section 34

Interestingly an extension of the standard BA modelhas been proposed [64] see also [6566] that takes intoaccount different fitnesses of the nodes to acquiring linksWe think that such fitness could be related to growth fluc-tuations thus providing a route to modify the BA modelto include the long-term correlated dynamics found here

333 Cascading Poisson process

In this section we elaborate the model proposed in [27]and examine it with respect to long-term correlations Themodel is based on a cascading Poisson process (CPP)according to which the probability that a member entersan active interval is ρ(t) = Nwpd(t)pw(t) where Nw is theaverage number of active intervals per week pd(t) is the

D Rybski et al Communication activity in social networks growth and correlations 155

100

101

102

103

104

105

Δt [days]

10minus1

100

101

102

FD

FA(Δ

t) [

arb

uni

ts]

DFA1DFA2~ (Δt)

12

100

101

102

Δt [days]

10minus1

100

101

FD

FA

2(Δt

)

M=1097minus2980M=404minus1096M=149minus403M=55minus148M=21minus54M=8minus20M=3minus7M=1minus2~ (Δt)

12

(a) User 2881Malmgren et al PNAS 2008

(b)

Fig 11 (Color online) DFA fluctuation functions of message data created with the model proposed in [27] (a) User 2881 Wevisually extract the model parameters from Figure 3 in [27] and generate message data for approx 800k days The panel showsthe fluctuation functions from DFA1 and DFA2 which asymptotically go as sim (Δt)12 ie no long-term correlations The humpon small scales is due to the model inherent oscillations [23] The dashed vertical line is placed at Δt = 83 days (b) Randomparameterization We randomly choose the model parameters create 20k records of 83 days and average the obtained DFA2fluctuation functions according to the final number of messages M While for those simulated members with few messages wefind F (Δt) sim (Δt)12 the F (Δt) of the most active members exhibit a hump due to oscillations similar to the one in panel (a)

probability of starting an active interval at a particulartime of the day and pw(t) is the probability of startingan active interval at a particular day of the week Once amember enters such an active interval heshe sends a set ofNa +1 messages where Na is drawn from the distributionp(Na) The messages sent in such an active interval aresent randomly ie a homogeneous Poisson process withrate ρa events per hour

First we study the example of User 2881 as analyzedin [27] (please note that in [27] a different data set is stud-ied and the user is neither in OC1 nor in OC2) We ex-tract from [27] the parameters Nw = 73 active intervalsper week ρa = 17 events per hour as well as (visually)the distributions pd(t) pw(t) and p(Na) The original pe-riod of 83 days is not sufficient to apply DFA and werun the model for this set of parameters over 800k daysThen we extract the record of number of messages per dayμ(t) and apply DFA The obtained fluctuation functionsare shown in Figure 11a On small scales below 100 daysa hump in the F (Δt) can be identified which is due tooscillations in μ(t) [23] While asymptotically the influ-ence of oscillations vanishes on scales below the wave-length the oscillations appear as correlations (increasedslope in F (Δt)) and on scales above the wavelength theoscillations appear as anti-correlations (decreased slope inF (Δt)) Asymptotically we find F (Δt) sim (Δt)12 ieHCPP 12 corresponding to a lack of long-term cor-relations Even on scales up to 83 days we rather findHCPP lt 12 (which we expect from the imposed weeklyoscillations)

Next we study 20 000 simulated e-mail senders withrandomly chosen parameters (i) We fill pw(t) with ran-dom numbers and set pw(t) = 0 for t = 0 6 ie Sundayand Saturday (ii) We fill pd(t) with random numbers andset pd(t) = 0 for t = 0 5 and t = 23 ie at night(iii) We set p(Na) starting with a random p(Na = 0) Thenp(Na) decays exponentially up to a random Na below 36pw(t) pw(t) = 0 and p(Na) are normalized (iv) We ran-

domly choose 0 lt Nw lt 40 (v) We randomly choose0 lt ρa lt 30 From [27] Supporting Information 2 (SI2)we estimated the typical maximum values of Na Nw andρa (36 40 and 30 respectively) We run the model for83 days and extract the μ(t) for each simulated e-mailsender Then we apply DFA2 and average the fluctuationfunctions according to the final number of messages M The fluctuation functions for the various activity levels aredepicted in Figure 11b We find that members with smallfinal number of messages exhibit uncorrelated behaviorThe more active the members the more pronounced be-come the oscillations which we already discussed in thecontext of Figure 11a Thus asymptotic HCPP lt 12 forlarge M is due to the weekly cycles [23] In our data os-cillations do not dominate the DFA fluctuation functionsMoreover for OC2 we also find long-term correlations inweekly resolution (Fig 1)

Based on periodic probabilities and Poisson statisticsthe CPP model represents a powerful concept to charac-terize inter-event times For this purpose the average Nw

seems to be sufficient However in order to recover long-term correlations time dependent Nw = Nw(t) seem tobe necessary In fact the number of active intervals perweek Nw fluctuates as can be seen in [27] SI2 (uppermost row of the panels) Thus we suggest to extend themodel by introducing a memory kernel see eg [54] or byusing long-term correlated Nw(t)

34 Other correlations

In this section we want to discuss other types of correla-tions Figure 12 shows for QX the final degree K = k(T )versus the final number of messages M = m(T ) We findthat for both sending and receiving the two quantitiesare correlated according to

K sim Mλ with λ asymp 34 (11)

156 The European Physical Journal B

100

101

102

103

104

105

number of messages M

10minus1

100

101

102

103

104

degr

ee K

send

100

101

102

103

104

105

number of messages M

receive

~ M075

~ M075

(a) QX (b) QX

Fig 12 (Color online) Correlations between the finaldegree K and the final number of messages M for QX (a) Out-degree and sending messages (b) in-degree and receiving mes-sages The dashed lines correspond to a power-law with expo-nent 075 Members sending many messages also tend to havehigh out-degree but not linearly rather following a power-law

100

101

102

103

104

105

sent

100

101

102

103

104

105

rece

ived

messages M

100

101

102

103

104

105

outminusdegree

100

101

102

103

104

105

inminus

degr

ee

degree K

(a) QX (b) QX

Fig 13 (Color online) Correlations between activity and pas-sivity for QX (a) Final number of messages M received andsent (b) final in- and out-degree The dashed lines correspondto a linear relation Those members who send many messagesalso receive many But those members who know many peopleto whom they send messages do not necessarily know as manypeople from whom they receive messages

Similar relations have also been found for other data [67]Since the correlations are positive those members thatsend many messages in average also have more acquain-tances to whom they send but they know less acquain-tances than they would in the case of linear correlationsFor receiving Figure 12b this correlation is very similar

The number of messages sent versus the number ofmessages received (for QX) is displayed in Figure 13aAsymptotically the activity and passivity are linearly re-lated and on average for every message sent there is areceived one or vice versa This of course does not meanthat every message is replied However the less activemembers in average tend to receive more messages thanthey send For example those members who send in av-erage one message receive about three Nevertheless themore active the members are the more the sending andreceiving behavior approaches the linear relation In con-trast for the degree Figure 13b the asymptotic linearrelation does not hold Those members with large out-degree and small in-degree are referred to as spammerssince they send to many different people but only receivefrom few

100

101

102

103

104

105

number of messages M

10minus1

100

101

102

103

104

degr

ee K

send

100

101

102

103

104

105

number of messages M

receive

~ M075

~ M075

(a) POK (b) POK

Fig 14 (Color online) Correlations between the final de-gree K and the final number of messages M for POK (a) Out-degree and sending messages (b) in-degree and receiving mes-sages Analogous to Figure 12

100

101

102

103

104

sent

100

101

102

103

104

rece

ived

messages M

100

101

102

103

104

outminusdegree

100

101

102

103

104

inminus

degr

ee

degree K

(a) POK (b) POK

Fig 15 (Color online) Correlations between activity and pas-sivity for POK (a) Final number of messages M received andsent (b) final in- and out-degree Analogous to Figure 13

For POK we find similar results in Figures 14 and 15The final degree and the final number of messages alsoscale with an exponent close to 075 although for send-ing messages there exist some deviations of the most ac-tive members (Fig 14a) Also the correlations of sendingand receiving are linear the same holds for in- and out-degree However the most active members again deviatewith low receiving part ie both low number of receivedmessages as well as low in-degree Figure 15 Neverthe-less the results for both data sets are mainly consistentand the power-law relation equation (11) is a remarkableregularity

341 Activity and degree distributions

Finally we want to briefly discuss the distributions of ac-tivities and degrees If we assume p(M) sim MminusγM andp(K) sim KminusγK then with equation (11) the exponentsshould be related according to

γK = 1 + (γM minus 1)λ (12)

Figures 16a and 16b displays the probability densitiesp(M) and p(K) for both online communities Althoughthe distributions are rather broad they do not exhibitstraight lines in double logarithmic representation Inpanel (b) we include some guides to the eye with slopesaccording to equation (12) and which roughly follow the

D Rybski et al Communication activity in social networks growth and correlations 157

Table 1 Overview of the obtained exponents QX and POK are the two data-sets For the BA model and the CP processsee [50] and [27] respectively HL is the fluctuation exponent along directed links βk is the growth fluctuation exponent whenthe degree is considered and βtimes is the mutual growth fluctuation exponent based on the growth between pairs

Sending H HL β βk βtimesSect or Ref [12] 31 333 31 [12] [12] 322

QX 075 plusmn 005 asymp074 022 plusmn 001 022 plusmn 002 asymp03

QX shuffled 12 12 12

POK 091 plusmn 004 017 plusmn 003 017 plusmn 008

POK shuffled 12 12

BA model 12

CP process rarr 12

100

101

102

103

104

total number of messages

10minus10

10minus9

10minus8

10minus7

10minus6

10minus5

10minus4

10minus3

10minus2

10minus1

100

p(M

L)

100

101

102

103

10410

minus1010

minus910

minus810

minus710

minus610

minus510

minus410

minus310

minus210

minus110

0

p(M)p(K)

100

101

102

103

104

105

total number of messages

100

101

102

103

104

105

(a) QX send (b) POK send

(c) QX directed link (d) POK directed link

~ ML

minus35

~ Mminus12

~ Mminus3

~ Kminus127

~ Kminus37

~ ML

minus35

~ Kminus073

~ Mminus08

Fig 16 (Color online) Probability densities of activities anddegrees The probabilities are plotted versus the total numberof messages M and the final degree K for (a) sending inQX and (b) sending in POK The panel (c) and (d) exhibitthe probability densities of the total number of messages alongdirected links ML for QX and POK respectively The dottedlines serve as guides to the eye and have the indicated slopes

obtained curves However λ asymp 075 is relatively close to 1so that the differences are minor

The probability densities of activity along direct linksare displayed in Figures 16c and 16d for QX and POKrespectively In both cases the frequency of large activitydecays approximately following a power-law with expo-nent around 35

4 Conclusions

Our work reviews and further supports previous empir-ical findings [12] extending them by some features Theobtained exponents are summarized in Table 1

In addition to [12] we find very similar characteristicsfor the passivity of receiving as for the activity of send-ing messages This is in line with the strong correlations

between individual sending and receiving ie most of themessages are somehow replied sooner or later Further-more already the communication between two individualscomprises long-term persistence

Investigating the probability densities of logarithmicgrowth rates (ie growth of the cumulative number of mes-sages between two time steps of any member) we are ableto collapse the curves by scaling them with conditional av-erage growth rates and conditional standard deviationsWhile less active members follow well the exponentiallydecaying probability density for the more active membersdeviations are found in the case of large growth rates

Moreover we introduce a new growth rate namely themutual growth in the number of messages This is the dif-ference in the number of messages sent between pairs ofmembers at two time steps The conditional standard de-viation of this mutual growth rate also decays as a power-law with increasing initial difference whereas the expo-nent is close to 03 and changes to 12 when the datais shuffled We conjecture that this growth reflects cross-correlations in the activity

Finally we propose simulations to reproduce the long-term correlations and growth properties Basically it con-sists of generating long-term correlated sequences anddefining a threshold All values of such sequences abovethe threshold (POT) represent a message event We showthat then the correlation and growth features being deter-mined by the imposed fluctuation exponent confirm therelation β = 1 minus H [12] Including further features thisapproach could be a starting point for more elaboratedmodeling of human dynamics

We would like to note that ndash except Section 322 aboutmutual growth in the number of messages (and Sect 34) ndashall analysis and results refer to auto-correlations As phe-nomena auto- and cross-correlations can occur indepen-dently However since most of the messages are replied itis very likely that there are also cross-correlations betweenthe members activity which to our knowledge has not yetbeen studied systematically

Thus our work opens perspectives for further researchactivities In particular the origin of the long-term persis-tence in the communication remains an important ques-tion In [14] we demonstrate the relation of β H withinter-event time scaling From a psychologicalsociological

158 The European Physical Journal B

point of view one may argue where the persistence isoriginated Is it purely due to a state of mind solipsis-tic emerging from moods or is it due to social effects iethat the dynamics in the social network induces persis-tent fluctuations One hypothesis could be that alreadythe social network is correlated [68]

We thank C Briscoe JF Eichner LK Gallos and HDRozenfeld for useful discussions This work was supportedby National Science Foundation Grant NSF-SES-0624116 andNSF-EF-0827508 FL acknowledges financial support fromThe Swedish Bank Tercentenary Foundation SH thanks theEuropean EPIWORK project the Israel Science Foundationthe ONR and the DTRA for financial support

References

1 MHR Stanley LAN Amaral SV Buldyrev S HavlinH Leschhorn P Maass MA Salinger HE StanleyNature 379 804 (1996)

2 D Canning LAN Amaral Y Lee M Meyer HEStanley Econ Lett 60 335 (1998)

3 V Plerou LAN Amaral P Gopikrishnan M MeyerHE Stanley Phys Rev E 60 6519 (1999)

4 F Liljeros LAN Amaral HE Stanley arXiv

nlin0310001v1[nlinAO] (2003) httparxivorg

absnlin0310001

5 K Matia LAN Amaral M Luwel HF Moed HEStanley J Am Soc Inf Sci Tec 56 893 (2005)

6 S Picoli RS Mendes Phys Rev E 77 036105 (2008)7 HD Rozenfeld D Rybski JS Andrade Jr M Batty

HE Stanley HA Makse Proc Natl Acad Sci USA 10518702 (2008)

8 F Wang K Yamasaki S Havlin HE Stanley arXiv09114258v1[q-finST] (2009) httparxivorgabs09114258

9 R Gibrat Les inegalites economiques (Libraire du RecueilSierey Paris 1931)

10 J Sutton J Econ Lit 35 40 (1997)11 M Mitzenmacher Internet Math 1 226 (2004)12 D Rybski SV Buldyrev S Havlin F Liljeros HA

Makse Proc Natl Acad Sci USA 106 12640 (2009)13 AL Barabasi Nature 435 207 (2005)14 D Rybski SV Buldyrev S Havlin F Liljeros HA

Makse preprint (2011)15 M Karsai M Kivela RK Pan K Kaski J Kertesz AL

Barabasi J Saramaki Phys Rev E 83 025102 (2011)16 J Brujic RI Hermans KA Walther JM Fernandez

Nature Phys 2 282 (2006)17 LK Gallos D Rybski F Liljeros S Havlin HA Makse

submitted (2011)18 P Holme Europhys Lett 64 427 (2003)19 P Holme F Liljeros CR Edling BJ Kim Phys Rev

E 68 056107 (2003)20 P Holme CR Edling F Liljeros Soc Networks 26 155

(2004)21 CK Peng SV Buldyrev S Havlin M Simons HE

Stanley AL Goldberger Phys Rev E 49 1685 (1994)22 A Bunde S Havlin JW Kantelhardt T Penzel JH

Peter K Voigt Phys Rev Lett 85 3736 (2000)

23 JW Kantelhardt E Koscielny-Bunde HHA RegoS Havlin A Bunde Physica A 295 441 (2001)

24 JW Kantelhardt Encyclopedia of Complexity and SystemScience (Springer 2009) Chap entry 00620 Fractal andMultifractal Time Series

25 S Golder DM Wilkinson BA Huberman Communitiesand Technologies 2007 (Springer London 2007) pp 41ndash66

26 J Leskovec E Horvitz arXiv08030939v1[physics

soc-ph] (2008) httparxivorgabs0803093927 RD Malmgren DB Stouffer AE Motter LAN

Amaral Proc Natl Acad Sci USA 105 18153 (2008)28 A Vazquez Physica A 373 747 (2007)29 Z Eisler J Kertesz Phys Rev E 73 046109 (2006)30 Z Eisler I Bartos J Kertesz Adv Phys 57 89 (2008)31 LEC Rocha F Liljeros P Holme Proc Natl Acad Sci

USA 107 5706 (2010)32 CK Peng J Mietus JM Hausdorff S Havlin HE

Stanley AL Goldberger Phys Rev Lett 70 1343 (1993)33 PC Ivanov A Bunde LAN Amaral S Havlin

J Fritsch-Yelle RM Baevsky HE Stanley ALGoldberger Europhys Lett 48 594 (1999)

34 K Kosmidis A Kalampokis P Argyrakis Physica A 370808 (2006)

35 Y Liu P Gopikrishnan P Cizeau M Meyer CK PengHE Stanley Phys Rev E 60 1390 (1999)

36 RN Mantegna HE Stanley An Introduction toEconophysics Correlations and Complexity in Finance(Cambridge University Press Cambridge 1999)

37 F Lux M Ausloos The Science of Disasters in MarketFluctuations I Scaling Multiscaling and Their PossibleOrigins (Springer-Verlag Berlin 2002) Chap 13 pp 373ndash409

38 WE Leland MS Taqqu W Willinger DV WilsonIEEEACM Trans Networking 2 1 (1994)

39 M Kampf S Tismer JW Kantelhardt L Muchnik sub-mitted (2011)

40 S Tadaki M Kikuchi A Nakayama K NishinariA Shibata Y Sugiyama S Yukawa J Phys Soc Jpn75 034002 (2006)

41 Z Xiao-Yan L Zong-Hua T Ming Chinese Phys Lett24 2142 (2007)

42 K Linkenkaer-Hansen VV Nikouline JM Palva RJIlmoniemi J Neurosci 21 1370 (2001)

43 P Allegrini D Menicucci R Bedini L FronzoniA Gemignani P Grigolini BJ West P Paradisi PhysRev E 80 061914 (2009)

44 PC Ivanov K Hu MF Hilton SA Shea HE StanleyProc Natl Acad Sci USA 104 20702 (2007)

45 RD Malmgren DB Stouffer ASLO CampanharoLAN Amaral Science 325 1696 (2009)

46 CA Hidalgo C Rodriguez-Sickert Physica A 387 3017(2008)

47 A Saichev Y Malevergne D Sornette Theory ofZipf rsquos Law and Beyond ndash Monograph Lecture Notes inEconomics and Mathematical Systems (Springer-VerlagBerlin 2009)

48 LAN Amaral SV Buldyrev S Havlin MA SalingerHE Stanley Phys Rev Lett 80 1385 (1998)

49 HD Rozenfeld D Rybski X Gabaix HA Makse AmEcon Rev 101 2205 (2011)

50 AL Barabasi R Albert Science 286 509 (1999)51 P Hedstrom Dissecting the Social On the Principles

of Analytical Sociology (Cambridge University PressCambridge 2005)

D Rybski et al Communication activity in social networks growth and correlations 159

52 A Kentsis Nature 441 E5 (2006)53 G Palla AL Barabasi T Vicsek Nature 446 664 (2007)54 R Crane D Sornette Proc Natl Acad Sci USA 105

15649 (2008)55 A Bunde JF Eichner JW Kantelhardt S Havlin

Phys Rev Lett 94 048701 (2005)56 EG Altmann H Kantz Phys Rev E 71 056106 (2005)57 JF Eichner JW Kantelhardt A Bunde S Havlin

Phys Rev E 75 011128 (2007)58 HA Makse S Havlin M Schwartz HE Stanley Phys

Rev E 53 5445 (1996)59 Y Xu QDY Ma DT Schmitt P Bernaola-Galvan

PC Ivanov Physica A 390 4057 (2011)60 J Stehle A Barrat G Bianconi Phys Rev E 81 035101

(2010)

61 R Albert AL Barabasi Rev Mod Phys 74 47(2002)

62 H Ebel LI Mielsch S Bornholdt Phys Rev E 66035103 (2002)

63 MEJ Newman S Forrest J Balthrop Phys Rev E 66035101 (2002)

64 G Bianconi AL Barabasi Europhys Lett 54 436 (2001)65 BF de Blasio A Svensson F Liljeros Proc Natl Acad

Sci USA 104 10762 (2007)66 R Cohen S Havlin Complex Networks Structure

Robustness and Function in Growing network models(Cambridge University Press Cambridge 2009)

67 L Muchnik HA Makse S Havlin preprint (2011)68 D Rybski HD Rozenfeld JP Kropp Europhys Lett

90 28002 (2010)

  • Introduction
  • Data
  • Analysis
  • Conclusions
  • References
Page 7: Communication activity in social networks: growth and ...hmakse/maksepapers/2011... · data of the second online community (. com, POK) covers 492 days (February 2001 until June 2002)

D Rybski et al Communication activity in social networks growth and correlations 153

0 5 10 15 20time [w]

0

2

4

6

0 50 100 150 200time

minus2

minus1

0

1

2

3(a)

(b)

(c)

(d)

q

Fig 8 (Color online) Illustration of the peaks over thresh-old simulations (a) An underlying and unknown long-termcorrelated process determines the instantaneous probabilityof sending messages Once this state passes certain thresh-old q (dashed orange line) a messages is sent (green diamonds)(b) Generated instants of messages (c) with windows for ag-gregation such as messages per day (d) Aggregated record ofmessages in windows of size w here w = 10

underlying long-term correlated process It can be under-stood as an unknown individual state driven by variousinternal and external stimuli [274351ndash54] increasing theprobability to send messages Generating such a hypothet-ical long-term correlated internal process (xi) simulatedmessage data can be defined by the instants at which thisinternal process exceeds a threshold q (peaks over thresh-old POT) see [55ndash57] and references therein

More precisely we consider a long-term correlated se-quence (xi) consisting of Nlowast random numbers that is nor-malized to zero average (〈x〉 = 0) and unit standard devi-ation (σx = 1) Choosing a threshold q at each instant ithe probability to send a messages is

psnd =

1 for xi gt q

0 for xi le q (10)

Thus the message events are given by the indices i ofthose random numbers xi exceeding q

Figure 8a illustrates the procedure The random num-bers are plotted as brown circles and the events exceedingthe threshold (orange dashed line) by the green diamondsThe resulting instants are depicted in Figure 8b represent-ing the simulated messages The threshold approximatelypredefines the total number of events and accordingly theaverage inter-event time Using normal-distributed num-bers (xi) the number of eventsmessages is approximatelygiven by the length Nlowast and the inverse cumulative dis-tribution function associated with the standard normaldistribution (probit-function) Additionally the randomnumbers we use are long-term correlated with variable

100

101

102

103

104

Δt [w]

10minus3

10minus2

10minus1

100

101

102

103

104

FD

FA

2(Δt

)

100

101

102

103

104

m0

10minus1

100

ltr(

m0)

gt

100

101

102

103

104

105

106

total number of events M

0

02

04

06

08

1

12

HD

FA

2 (

1000

lt=

s lt

= 1

0000

)q=10q=15q=40q=45

100

101

102

103

104

m0

10minus2

10minus1

100

σ(m

0)

H=09H=08H=07H=06H=05slope β=1minusH

~ (Δt)09

~ (Δt)05

(a) (b)

(c) (d)

Fig 9 (Color online) Results of numerical simulations(a) Mean growth rate conditional to the number of events un-til t0 = Nlowast2 as obtained from 100 000 long-term correlatedrecords of length Nlowast = 131 072 with variable imposed fluctua-tion exponent Himp between 12 and 09 and random thresh-old q between 10 and 60 (b) As before but standard devia-tion conditional to the number of events The solid lines repre-sent power-laws with exponents β expected from the imposedlong-term correlations according to equation (7) (c) Long-termcorrelations in the sequences of aggregated peaks over thresh-old For every threshold q between 10 (violet) and 45 (black)100 normalized records of length Nlowast = 4194 304 have beencreated with Himp = 09 The events are aggregated in win-dows of size w = 100 The panel shows the averaged DFA2fluctuation functions (d) Fluctuation exponents on the scales1 000 le s le 10 000 as a function of the total number of events

fluctuation exponent We impose these auto-correlationsusing Fourier filtering method [2358] Next we show thatthis process reproduces the scaling in the growth ieGGL as well as the variable long-term correlations in theactivity of the members (eg Figs 2 and 3)

For testing this process we create 100 000 independentlong-term correlated records (xi) of length Nlowast = 131 072impose the fluctuation exponent Himp and choose for eachone a random threshold q between 1 and 6 each represent-ing a sender Extracting the peaks over threshold we ob-tain the events and determine for each recordmember thegrowth in the number of eventsmessages between Nlowast2and Nlowast This is for each recordmember we count thenumbers of eventsmessages m0 until t0 = ti=Nlowast2 as wellas m1 until t1 = ti=Nlowast and calculate the growth rate ac-cording to equation (3) We then calculate the conditionalaverage 〈r(m0)〉 and the conditional standard deviationσ(m0) where the values of m0 are binned logarithmicallyThe quantities are plotted in Figures 9a and 9b while inpanel (b) we include slopes expected from β = 1 minus H equation (7) We find that the numerical results reason-ably agree with the prediction (solid lines) Except forsmall m0 these results are consistent with those found inthe original message data

154 The European Physical Journal B

The fluctuation functions can be studied in the sameway As described in Section 31 we find long-term corre-lations in the sequences of messages per day or per weekOn the basis of the above explained simulated messageswe analyze them in an analogous way For each thresh-old q = 10 15 40 45 we create 100 long-term cor-related records of length Nlowast = 4 194 304 with imposedfluctuation exponent Himp = 09 extract the simulatedmessage events and aggregate them in non-overlappingwindows of size w = 100 This is tiling Nlowast in segmentsof size w and counting the number of events occurring ineach segment (Figs 8c and 8d) The obtained aggregatedrecords represent the analogous of messages per day or perweek and are analyzed with DFA averaging the fluctua-tion functions among those configurations with the samethreshold and thus similar number of total events Thecorresponding results are shown in Figure 9c and 9d Weobtain very similar results as in the original data We findvanishing correlations for the sequences with few events(large q) and pronounced long-term correlations for thecases of many events (small q) while the maximum fluc-tuation exponent corresponds to the chosen Himp Thiscan be understood by the fact that for q close to zerothe sequence of number of events per window convergesto the aggregated sequence of 0 or 1 (for x le 0 or x gt 0)reflecting the same long-term correlation properties as theoriginal record [59] For a large threshold q too few eventsoccur to measure the correct long-term correlations egthe true scaling only turns out on larger unaccessible timescales requiring larger w and longer records

Although the simulations do not reveal the originof the long-term correlated patchy behavior they sup-port equation (7) and the concept of an underlying long-term correlated process Consistently an uncorrelatedcompletely random underlying process recovers Poissonstatistics and therefore βrnd = 12 for the growth fluctu-ations as well as uncorrelated message activity (Hrnd =12) For 12 lt H lt 1 it has been shown [5557] that theinter-event times follow a stretched exponential distribu-tion see also [131460]

332 Preferential attachment

Next we compare our findings with the growth proper-ties of a network model We investigate the Barabasi-Albert (BA) model which is based on preferential at-tachment and has been introduced to generate a kind ofscale-free networks [5061] with power-law degree distri-bution p(k) [6263] Essentially it consists of subsequentlyadding nodes to the network by linking them to existingnodes which are chosen randomly with a probability pro-portional to their degree

We obtain the undirected network and study the de-gree growth properties by calculating the conditional av-erage growth rate 〈rBA(k0)〉 and the conditional standarddeviation σBA(k0) obtained from the scale-free BA modelThe times t0 and t1 are defined by the number of nodesattached to the network

Figure 10 shows the results where an average degree〈k〉 = 20 50 000 nodes in t0 and 100 000 nodes in t1 were

101

102

103

degree k0

10minus2

10minus1

ltr B

A(k

0)gt

σB

A(k

0)

ltrBA(k0)gtσBA(k0)

~ k0

minus12

Fig 10 (Color online) Average degree growth rate and stan-dard deviation versus foregoing degree for the preferential at-tachment network model [50] The average (green open di-amonds) and standard deviation (blue filled circles) of thegrowth rate rBA are plotted conditional to k0 the degree ofthe corresponding nodes at the first stage We choose averagedegree 〈k〉 = 20 50 000 nodes in t0 and 100 000 nodes in t1The error-bars are taken from 10 configurations The dashedline in the bottom corresponds to βBA = 12

chosen We find constant average growth rate that doesnot depend on the initial degree k0 The conditional stan-dard deviation is a function of k0 and exhibits a power-lawdecay with βBA = 12 as expected for such an uncorre-lated growth process [12] Therefore a purely preferentialattachment type of growth is not sufficient to describe thetype of social network dynamics found in Section 321since additional temporal correlations are involved in thedynamics of establishing acquaintances in the community

The value βBA = 12 in equation (7) corresponds toH = 12 indicating complete randomness There is nomemory in the system Since each addition of a new nodeis completely independent from precedent ones there can-not be temporal correlations in the activity of addinglinks In contrast for the out-degree in QX and POK weobtained βkQX = 022plusmn002 and βkPOK = 017plusmn008 [12]which is supported by the (non-linear) correlations be-tween the number of messages and the out-degree as pre-sented in Section 34

Interestingly an extension of the standard BA modelhas been proposed [64] see also [6566] that takes intoaccount different fitnesses of the nodes to acquiring linksWe think that such fitness could be related to growth fluc-tuations thus providing a route to modify the BA modelto include the long-term correlated dynamics found here

333 Cascading Poisson process

In this section we elaborate the model proposed in [27]and examine it with respect to long-term correlations Themodel is based on a cascading Poisson process (CPP)according to which the probability that a member entersan active interval is ρ(t) = Nwpd(t)pw(t) where Nw is theaverage number of active intervals per week pd(t) is the

D Rybski et al Communication activity in social networks growth and correlations 155

100

101

102

103

104

105

Δt [days]

10minus1

100

101

102

FD

FA(Δ

t) [

arb

uni

ts]

DFA1DFA2~ (Δt)

12

100

101

102

Δt [days]

10minus1

100

101

FD

FA

2(Δt

)

M=1097minus2980M=404minus1096M=149minus403M=55minus148M=21minus54M=8minus20M=3minus7M=1minus2~ (Δt)

12

(a) User 2881Malmgren et al PNAS 2008

(b)

Fig 11 (Color online) DFA fluctuation functions of message data created with the model proposed in [27] (a) User 2881 Wevisually extract the model parameters from Figure 3 in [27] and generate message data for approx 800k days The panel showsthe fluctuation functions from DFA1 and DFA2 which asymptotically go as sim (Δt)12 ie no long-term correlations The humpon small scales is due to the model inherent oscillations [23] The dashed vertical line is placed at Δt = 83 days (b) Randomparameterization We randomly choose the model parameters create 20k records of 83 days and average the obtained DFA2fluctuation functions according to the final number of messages M While for those simulated members with few messages wefind F (Δt) sim (Δt)12 the F (Δt) of the most active members exhibit a hump due to oscillations similar to the one in panel (a)

probability of starting an active interval at a particulartime of the day and pw(t) is the probability of startingan active interval at a particular day of the week Once amember enters such an active interval heshe sends a set ofNa +1 messages where Na is drawn from the distributionp(Na) The messages sent in such an active interval aresent randomly ie a homogeneous Poisson process withrate ρa events per hour

First we study the example of User 2881 as analyzedin [27] (please note that in [27] a different data set is stud-ied and the user is neither in OC1 nor in OC2) We ex-tract from [27] the parameters Nw = 73 active intervalsper week ρa = 17 events per hour as well as (visually)the distributions pd(t) pw(t) and p(Na) The original pe-riod of 83 days is not sufficient to apply DFA and werun the model for this set of parameters over 800k daysThen we extract the record of number of messages per dayμ(t) and apply DFA The obtained fluctuation functionsare shown in Figure 11a On small scales below 100 daysa hump in the F (Δt) can be identified which is due tooscillations in μ(t) [23] While asymptotically the influ-ence of oscillations vanishes on scales below the wave-length the oscillations appear as correlations (increasedslope in F (Δt)) and on scales above the wavelength theoscillations appear as anti-correlations (decreased slope inF (Δt)) Asymptotically we find F (Δt) sim (Δt)12 ieHCPP 12 corresponding to a lack of long-term cor-relations Even on scales up to 83 days we rather findHCPP lt 12 (which we expect from the imposed weeklyoscillations)

Next we study 20 000 simulated e-mail senders withrandomly chosen parameters (i) We fill pw(t) with ran-dom numbers and set pw(t) = 0 for t = 0 6 ie Sundayand Saturday (ii) We fill pd(t) with random numbers andset pd(t) = 0 for t = 0 5 and t = 23 ie at night(iii) We set p(Na) starting with a random p(Na = 0) Thenp(Na) decays exponentially up to a random Na below 36pw(t) pw(t) = 0 and p(Na) are normalized (iv) We ran-

domly choose 0 lt Nw lt 40 (v) We randomly choose0 lt ρa lt 30 From [27] Supporting Information 2 (SI2)we estimated the typical maximum values of Na Nw andρa (36 40 and 30 respectively) We run the model for83 days and extract the μ(t) for each simulated e-mailsender Then we apply DFA2 and average the fluctuationfunctions according to the final number of messages M The fluctuation functions for the various activity levels aredepicted in Figure 11b We find that members with smallfinal number of messages exhibit uncorrelated behaviorThe more active the members the more pronounced be-come the oscillations which we already discussed in thecontext of Figure 11a Thus asymptotic HCPP lt 12 forlarge M is due to the weekly cycles [23] In our data os-cillations do not dominate the DFA fluctuation functionsMoreover for OC2 we also find long-term correlations inweekly resolution (Fig 1)

Based on periodic probabilities and Poisson statisticsthe CPP model represents a powerful concept to charac-terize inter-event times For this purpose the average Nw

seems to be sufficient However in order to recover long-term correlations time dependent Nw = Nw(t) seem tobe necessary In fact the number of active intervals perweek Nw fluctuates as can be seen in [27] SI2 (uppermost row of the panels) Thus we suggest to extend themodel by introducing a memory kernel see eg [54] or byusing long-term correlated Nw(t)

34 Other correlations

In this section we want to discuss other types of correla-tions Figure 12 shows for QX the final degree K = k(T )versus the final number of messages M = m(T ) We findthat for both sending and receiving the two quantitiesare correlated according to

K sim Mλ with λ asymp 34 (11)

156 The European Physical Journal B

100

101

102

103

104

105

number of messages M

10minus1

100

101

102

103

104

degr

ee K

send

100

101

102

103

104

105

number of messages M

receive

~ M075

~ M075

(a) QX (b) QX

Fig 12 (Color online) Correlations between the finaldegree K and the final number of messages M for QX (a) Out-degree and sending messages (b) in-degree and receiving mes-sages The dashed lines correspond to a power-law with expo-nent 075 Members sending many messages also tend to havehigh out-degree but not linearly rather following a power-law

100

101

102

103

104

105

sent

100

101

102

103

104

105

rece

ived

messages M

100

101

102

103

104

105

outminusdegree

100

101

102

103

104

105

inminus

degr

ee

degree K

(a) QX (b) QX

Fig 13 (Color online) Correlations between activity and pas-sivity for QX (a) Final number of messages M received andsent (b) final in- and out-degree The dashed lines correspondto a linear relation Those members who send many messagesalso receive many But those members who know many peopleto whom they send messages do not necessarily know as manypeople from whom they receive messages

Similar relations have also been found for other data [67]Since the correlations are positive those members thatsend many messages in average also have more acquain-tances to whom they send but they know less acquain-tances than they would in the case of linear correlationsFor receiving Figure 12b this correlation is very similar

The number of messages sent versus the number ofmessages received (for QX) is displayed in Figure 13aAsymptotically the activity and passivity are linearly re-lated and on average for every message sent there is areceived one or vice versa This of course does not meanthat every message is replied However the less activemembers in average tend to receive more messages thanthey send For example those members who send in av-erage one message receive about three Nevertheless themore active the members are the more the sending andreceiving behavior approaches the linear relation In con-trast for the degree Figure 13b the asymptotic linearrelation does not hold Those members with large out-degree and small in-degree are referred to as spammerssince they send to many different people but only receivefrom few

100

101

102

103

104

105

number of messages M

10minus1

100

101

102

103

104

degr

ee K

send

100

101

102

103

104

105

number of messages M

receive

~ M075

~ M075

(a) POK (b) POK

Fig 14 (Color online) Correlations between the final de-gree K and the final number of messages M for POK (a) Out-degree and sending messages (b) in-degree and receiving mes-sages Analogous to Figure 12

100

101

102

103

104

sent

100

101

102

103

104

rece

ived

messages M

100

101

102

103

104

outminusdegree

100

101

102

103

104

inminus

degr

ee

degree K

(a) POK (b) POK

Fig 15 (Color online) Correlations between activity and pas-sivity for POK (a) Final number of messages M received andsent (b) final in- and out-degree Analogous to Figure 13

For POK we find similar results in Figures 14 and 15The final degree and the final number of messages alsoscale with an exponent close to 075 although for send-ing messages there exist some deviations of the most ac-tive members (Fig 14a) Also the correlations of sendingand receiving are linear the same holds for in- and out-degree However the most active members again deviatewith low receiving part ie both low number of receivedmessages as well as low in-degree Figure 15 Neverthe-less the results for both data sets are mainly consistentand the power-law relation equation (11) is a remarkableregularity

341 Activity and degree distributions

Finally we want to briefly discuss the distributions of ac-tivities and degrees If we assume p(M) sim MminusγM andp(K) sim KminusγK then with equation (11) the exponentsshould be related according to

γK = 1 + (γM minus 1)λ (12)

Figures 16a and 16b displays the probability densitiesp(M) and p(K) for both online communities Althoughthe distributions are rather broad they do not exhibitstraight lines in double logarithmic representation Inpanel (b) we include some guides to the eye with slopesaccording to equation (12) and which roughly follow the

D Rybski et al Communication activity in social networks growth and correlations 157

Table 1 Overview of the obtained exponents QX and POK are the two data-sets For the BA model and the CP processsee [50] and [27] respectively HL is the fluctuation exponent along directed links βk is the growth fluctuation exponent whenthe degree is considered and βtimes is the mutual growth fluctuation exponent based on the growth between pairs

Sending H HL β βk βtimesSect or Ref [12] 31 333 31 [12] [12] 322

QX 075 plusmn 005 asymp074 022 plusmn 001 022 plusmn 002 asymp03

QX shuffled 12 12 12

POK 091 plusmn 004 017 plusmn 003 017 plusmn 008

POK shuffled 12 12

BA model 12

CP process rarr 12

100

101

102

103

104

total number of messages

10minus10

10minus9

10minus8

10minus7

10minus6

10minus5

10minus4

10minus3

10minus2

10minus1

100

p(M

L)

100

101

102

103

10410

minus1010

minus910

minus810

minus710

minus610

minus510

minus410

minus310

minus210

minus110

0

p(M)p(K)

100

101

102

103

104

105

total number of messages

100

101

102

103

104

105

(a) QX send (b) POK send

(c) QX directed link (d) POK directed link

~ ML

minus35

~ Mminus12

~ Mminus3

~ Kminus127

~ Kminus37

~ ML

minus35

~ Kminus073

~ Mminus08

Fig 16 (Color online) Probability densities of activities anddegrees The probabilities are plotted versus the total numberof messages M and the final degree K for (a) sending inQX and (b) sending in POK The panel (c) and (d) exhibitthe probability densities of the total number of messages alongdirected links ML for QX and POK respectively The dottedlines serve as guides to the eye and have the indicated slopes

obtained curves However λ asymp 075 is relatively close to 1so that the differences are minor

The probability densities of activity along direct linksare displayed in Figures 16c and 16d for QX and POKrespectively In both cases the frequency of large activitydecays approximately following a power-law with expo-nent around 35

4 Conclusions

Our work reviews and further supports previous empir-ical findings [12] extending them by some features Theobtained exponents are summarized in Table 1

In addition to [12] we find very similar characteristicsfor the passivity of receiving as for the activity of send-ing messages This is in line with the strong correlations

between individual sending and receiving ie most of themessages are somehow replied sooner or later Further-more already the communication between two individualscomprises long-term persistence

Investigating the probability densities of logarithmicgrowth rates (ie growth of the cumulative number of mes-sages between two time steps of any member) we are ableto collapse the curves by scaling them with conditional av-erage growth rates and conditional standard deviationsWhile less active members follow well the exponentiallydecaying probability density for the more active membersdeviations are found in the case of large growth rates

Moreover we introduce a new growth rate namely themutual growth in the number of messages This is the dif-ference in the number of messages sent between pairs ofmembers at two time steps The conditional standard de-viation of this mutual growth rate also decays as a power-law with increasing initial difference whereas the expo-nent is close to 03 and changes to 12 when the datais shuffled We conjecture that this growth reflects cross-correlations in the activity

Finally we propose simulations to reproduce the long-term correlations and growth properties Basically it con-sists of generating long-term correlated sequences anddefining a threshold All values of such sequences abovethe threshold (POT) represent a message event We showthat then the correlation and growth features being deter-mined by the imposed fluctuation exponent confirm therelation β = 1 minus H [12] Including further features thisapproach could be a starting point for more elaboratedmodeling of human dynamics

We would like to note that ndash except Section 322 aboutmutual growth in the number of messages (and Sect 34) ndashall analysis and results refer to auto-correlations As phe-nomena auto- and cross-correlations can occur indepen-dently However since most of the messages are replied itis very likely that there are also cross-correlations betweenthe members activity which to our knowledge has not yetbeen studied systematically

Thus our work opens perspectives for further researchactivities In particular the origin of the long-term persis-tence in the communication remains an important ques-tion In [14] we demonstrate the relation of β H withinter-event time scaling From a psychologicalsociological

158 The European Physical Journal B

point of view one may argue where the persistence isoriginated Is it purely due to a state of mind solipsis-tic emerging from moods or is it due to social effects iethat the dynamics in the social network induces persis-tent fluctuations One hypothesis could be that alreadythe social network is correlated [68]

We thank C Briscoe JF Eichner LK Gallos and HDRozenfeld for useful discussions This work was supportedby National Science Foundation Grant NSF-SES-0624116 andNSF-EF-0827508 FL acknowledges financial support fromThe Swedish Bank Tercentenary Foundation SH thanks theEuropean EPIWORK project the Israel Science Foundationthe ONR and the DTRA for financial support

References

1 MHR Stanley LAN Amaral SV Buldyrev S HavlinH Leschhorn P Maass MA Salinger HE StanleyNature 379 804 (1996)

2 D Canning LAN Amaral Y Lee M Meyer HEStanley Econ Lett 60 335 (1998)

3 V Plerou LAN Amaral P Gopikrishnan M MeyerHE Stanley Phys Rev E 60 6519 (1999)

4 F Liljeros LAN Amaral HE Stanley arXiv

nlin0310001v1[nlinAO] (2003) httparxivorg

absnlin0310001

5 K Matia LAN Amaral M Luwel HF Moed HEStanley J Am Soc Inf Sci Tec 56 893 (2005)

6 S Picoli RS Mendes Phys Rev E 77 036105 (2008)7 HD Rozenfeld D Rybski JS Andrade Jr M Batty

HE Stanley HA Makse Proc Natl Acad Sci USA 10518702 (2008)

8 F Wang K Yamasaki S Havlin HE Stanley arXiv09114258v1[q-finST] (2009) httparxivorgabs09114258

9 R Gibrat Les inegalites economiques (Libraire du RecueilSierey Paris 1931)

10 J Sutton J Econ Lit 35 40 (1997)11 M Mitzenmacher Internet Math 1 226 (2004)12 D Rybski SV Buldyrev S Havlin F Liljeros HA

Makse Proc Natl Acad Sci USA 106 12640 (2009)13 AL Barabasi Nature 435 207 (2005)14 D Rybski SV Buldyrev S Havlin F Liljeros HA

Makse preprint (2011)15 M Karsai M Kivela RK Pan K Kaski J Kertesz AL

Barabasi J Saramaki Phys Rev E 83 025102 (2011)16 J Brujic RI Hermans KA Walther JM Fernandez

Nature Phys 2 282 (2006)17 LK Gallos D Rybski F Liljeros S Havlin HA Makse

submitted (2011)18 P Holme Europhys Lett 64 427 (2003)19 P Holme F Liljeros CR Edling BJ Kim Phys Rev

E 68 056107 (2003)20 P Holme CR Edling F Liljeros Soc Networks 26 155

(2004)21 CK Peng SV Buldyrev S Havlin M Simons HE

Stanley AL Goldberger Phys Rev E 49 1685 (1994)22 A Bunde S Havlin JW Kantelhardt T Penzel JH

Peter K Voigt Phys Rev Lett 85 3736 (2000)

23 JW Kantelhardt E Koscielny-Bunde HHA RegoS Havlin A Bunde Physica A 295 441 (2001)

24 JW Kantelhardt Encyclopedia of Complexity and SystemScience (Springer 2009) Chap entry 00620 Fractal andMultifractal Time Series

25 S Golder DM Wilkinson BA Huberman Communitiesand Technologies 2007 (Springer London 2007) pp 41ndash66

26 J Leskovec E Horvitz arXiv08030939v1[physics

soc-ph] (2008) httparxivorgabs0803093927 RD Malmgren DB Stouffer AE Motter LAN

Amaral Proc Natl Acad Sci USA 105 18153 (2008)28 A Vazquez Physica A 373 747 (2007)29 Z Eisler J Kertesz Phys Rev E 73 046109 (2006)30 Z Eisler I Bartos J Kertesz Adv Phys 57 89 (2008)31 LEC Rocha F Liljeros P Holme Proc Natl Acad Sci

USA 107 5706 (2010)32 CK Peng J Mietus JM Hausdorff S Havlin HE

Stanley AL Goldberger Phys Rev Lett 70 1343 (1993)33 PC Ivanov A Bunde LAN Amaral S Havlin

J Fritsch-Yelle RM Baevsky HE Stanley ALGoldberger Europhys Lett 48 594 (1999)

34 K Kosmidis A Kalampokis P Argyrakis Physica A 370808 (2006)

35 Y Liu P Gopikrishnan P Cizeau M Meyer CK PengHE Stanley Phys Rev E 60 1390 (1999)

36 RN Mantegna HE Stanley An Introduction toEconophysics Correlations and Complexity in Finance(Cambridge University Press Cambridge 1999)

37 F Lux M Ausloos The Science of Disasters in MarketFluctuations I Scaling Multiscaling and Their PossibleOrigins (Springer-Verlag Berlin 2002) Chap 13 pp 373ndash409

38 WE Leland MS Taqqu W Willinger DV WilsonIEEEACM Trans Networking 2 1 (1994)

39 M Kampf S Tismer JW Kantelhardt L Muchnik sub-mitted (2011)

40 S Tadaki M Kikuchi A Nakayama K NishinariA Shibata Y Sugiyama S Yukawa J Phys Soc Jpn75 034002 (2006)

41 Z Xiao-Yan L Zong-Hua T Ming Chinese Phys Lett24 2142 (2007)

42 K Linkenkaer-Hansen VV Nikouline JM Palva RJIlmoniemi J Neurosci 21 1370 (2001)

43 P Allegrini D Menicucci R Bedini L FronzoniA Gemignani P Grigolini BJ West P Paradisi PhysRev E 80 061914 (2009)

44 PC Ivanov K Hu MF Hilton SA Shea HE StanleyProc Natl Acad Sci USA 104 20702 (2007)

45 RD Malmgren DB Stouffer ASLO CampanharoLAN Amaral Science 325 1696 (2009)

46 CA Hidalgo C Rodriguez-Sickert Physica A 387 3017(2008)

47 A Saichev Y Malevergne D Sornette Theory ofZipf rsquos Law and Beyond ndash Monograph Lecture Notes inEconomics and Mathematical Systems (Springer-VerlagBerlin 2009)

48 LAN Amaral SV Buldyrev S Havlin MA SalingerHE Stanley Phys Rev Lett 80 1385 (1998)

49 HD Rozenfeld D Rybski X Gabaix HA Makse AmEcon Rev 101 2205 (2011)

50 AL Barabasi R Albert Science 286 509 (1999)51 P Hedstrom Dissecting the Social On the Principles

of Analytical Sociology (Cambridge University PressCambridge 2005)

D Rybski et al Communication activity in social networks growth and correlations 159

52 A Kentsis Nature 441 E5 (2006)53 G Palla AL Barabasi T Vicsek Nature 446 664 (2007)54 R Crane D Sornette Proc Natl Acad Sci USA 105

15649 (2008)55 A Bunde JF Eichner JW Kantelhardt S Havlin

Phys Rev Lett 94 048701 (2005)56 EG Altmann H Kantz Phys Rev E 71 056106 (2005)57 JF Eichner JW Kantelhardt A Bunde S Havlin

Phys Rev E 75 011128 (2007)58 HA Makse S Havlin M Schwartz HE Stanley Phys

Rev E 53 5445 (1996)59 Y Xu QDY Ma DT Schmitt P Bernaola-Galvan

PC Ivanov Physica A 390 4057 (2011)60 J Stehle A Barrat G Bianconi Phys Rev E 81 035101

(2010)

61 R Albert AL Barabasi Rev Mod Phys 74 47(2002)

62 H Ebel LI Mielsch S Bornholdt Phys Rev E 66035103 (2002)

63 MEJ Newman S Forrest J Balthrop Phys Rev E 66035101 (2002)

64 G Bianconi AL Barabasi Europhys Lett 54 436 (2001)65 BF de Blasio A Svensson F Liljeros Proc Natl Acad

Sci USA 104 10762 (2007)66 R Cohen S Havlin Complex Networks Structure

Robustness and Function in Growing network models(Cambridge University Press Cambridge 2009)

67 L Muchnik HA Makse S Havlin preprint (2011)68 D Rybski HD Rozenfeld JP Kropp Europhys Lett

90 28002 (2010)

  • Introduction
  • Data
  • Analysis
  • Conclusions
  • References
Page 8: Communication activity in social networks: growth and ...hmakse/maksepapers/2011... · data of the second online community (. com, POK) covers 492 days (February 2001 until June 2002)

154 The European Physical Journal B

The fluctuation functions can be studied in the sameway As described in Section 31 we find long-term corre-lations in the sequences of messages per day or per weekOn the basis of the above explained simulated messageswe analyze them in an analogous way For each thresh-old q = 10 15 40 45 we create 100 long-term cor-related records of length Nlowast = 4 194 304 with imposedfluctuation exponent Himp = 09 extract the simulatedmessage events and aggregate them in non-overlappingwindows of size w = 100 This is tiling Nlowast in segmentsof size w and counting the number of events occurring ineach segment (Figs 8c and 8d) The obtained aggregatedrecords represent the analogous of messages per day or perweek and are analyzed with DFA averaging the fluctua-tion functions among those configurations with the samethreshold and thus similar number of total events Thecorresponding results are shown in Figure 9c and 9d Weobtain very similar results as in the original data We findvanishing correlations for the sequences with few events(large q) and pronounced long-term correlations for thecases of many events (small q) while the maximum fluc-tuation exponent corresponds to the chosen Himp Thiscan be understood by the fact that for q close to zerothe sequence of number of events per window convergesto the aggregated sequence of 0 or 1 (for x le 0 or x gt 0)reflecting the same long-term correlation properties as theoriginal record [59] For a large threshold q too few eventsoccur to measure the correct long-term correlations egthe true scaling only turns out on larger unaccessible timescales requiring larger w and longer records

Although the simulations do not reveal the originof the long-term correlated patchy behavior they sup-port equation (7) and the concept of an underlying long-term correlated process Consistently an uncorrelatedcompletely random underlying process recovers Poissonstatistics and therefore βrnd = 12 for the growth fluctu-ations as well as uncorrelated message activity (Hrnd =12) For 12 lt H lt 1 it has been shown [5557] that theinter-event times follow a stretched exponential distribu-tion see also [131460]

332 Preferential attachment

Next we compare our findings with the growth proper-ties of a network model We investigate the Barabasi-Albert (BA) model which is based on preferential at-tachment and has been introduced to generate a kind ofscale-free networks [5061] with power-law degree distri-bution p(k) [6263] Essentially it consists of subsequentlyadding nodes to the network by linking them to existingnodes which are chosen randomly with a probability pro-portional to their degree

We obtain the undirected network and study the de-gree growth properties by calculating the conditional av-erage growth rate 〈rBA(k0)〉 and the conditional standarddeviation σBA(k0) obtained from the scale-free BA modelThe times t0 and t1 are defined by the number of nodesattached to the network

Figure 10 shows the results where an average degree〈k〉 = 20 50 000 nodes in t0 and 100 000 nodes in t1 were

101

102

103

degree k0

10minus2

10minus1

ltr B

A(k

0)gt

σB

A(k

0)

ltrBA(k0)gtσBA(k0)

~ k0

minus12

Fig 10 (Color online) Average degree growth rate and stan-dard deviation versus foregoing degree for the preferential at-tachment network model [50] The average (green open di-amonds) and standard deviation (blue filled circles) of thegrowth rate rBA are plotted conditional to k0 the degree ofthe corresponding nodes at the first stage We choose averagedegree 〈k〉 = 20 50 000 nodes in t0 and 100 000 nodes in t1The error-bars are taken from 10 configurations The dashedline in the bottom corresponds to βBA = 12

chosen We find constant average growth rate that doesnot depend on the initial degree k0 The conditional stan-dard deviation is a function of k0 and exhibits a power-lawdecay with βBA = 12 as expected for such an uncorre-lated growth process [12] Therefore a purely preferentialattachment type of growth is not sufficient to describe thetype of social network dynamics found in Section 321since additional temporal correlations are involved in thedynamics of establishing acquaintances in the community

The value βBA = 12 in equation (7) corresponds toH = 12 indicating complete randomness There is nomemory in the system Since each addition of a new nodeis completely independent from precedent ones there can-not be temporal correlations in the activity of addinglinks In contrast for the out-degree in QX and POK weobtained βkQX = 022plusmn002 and βkPOK = 017plusmn008 [12]which is supported by the (non-linear) correlations be-tween the number of messages and the out-degree as pre-sented in Section 34

Interestingly an extension of the standard BA modelhas been proposed [64] see also [6566] that takes intoaccount different fitnesses of the nodes to acquiring linksWe think that such fitness could be related to growth fluc-tuations thus providing a route to modify the BA modelto include the long-term correlated dynamics found here

333 Cascading Poisson process

In this section we elaborate the model proposed in [27]and examine it with respect to long-term correlations Themodel is based on a cascading Poisson process (CPP)according to which the probability that a member entersan active interval is ρ(t) = Nwpd(t)pw(t) where Nw is theaverage number of active intervals per week pd(t) is the

D Rybski et al Communication activity in social networks growth and correlations 155

100

101

102

103

104

105

Δt [days]

10minus1

100

101

102

FD

FA(Δ

t) [

arb

uni

ts]

DFA1DFA2~ (Δt)

12

100

101

102

Δt [days]

10minus1

100

101

FD

FA

2(Δt

)

M=1097minus2980M=404minus1096M=149minus403M=55minus148M=21minus54M=8minus20M=3minus7M=1minus2~ (Δt)

12

(a) User 2881Malmgren et al PNAS 2008

(b)

Fig 11 (Color online) DFA fluctuation functions of message data created with the model proposed in [27] (a) User 2881 Wevisually extract the model parameters from Figure 3 in [27] and generate message data for approx 800k days The panel showsthe fluctuation functions from DFA1 and DFA2 which asymptotically go as sim (Δt)12 ie no long-term correlations The humpon small scales is due to the model inherent oscillations [23] The dashed vertical line is placed at Δt = 83 days (b) Randomparameterization We randomly choose the model parameters create 20k records of 83 days and average the obtained DFA2fluctuation functions according to the final number of messages M While for those simulated members with few messages wefind F (Δt) sim (Δt)12 the F (Δt) of the most active members exhibit a hump due to oscillations similar to the one in panel (a)

probability of starting an active interval at a particulartime of the day and pw(t) is the probability of startingan active interval at a particular day of the week Once amember enters such an active interval heshe sends a set ofNa +1 messages where Na is drawn from the distributionp(Na) The messages sent in such an active interval aresent randomly ie a homogeneous Poisson process withrate ρa events per hour

First we study the example of User 2881 as analyzedin [27] (please note that in [27] a different data set is stud-ied and the user is neither in OC1 nor in OC2) We ex-tract from [27] the parameters Nw = 73 active intervalsper week ρa = 17 events per hour as well as (visually)the distributions pd(t) pw(t) and p(Na) The original pe-riod of 83 days is not sufficient to apply DFA and werun the model for this set of parameters over 800k daysThen we extract the record of number of messages per dayμ(t) and apply DFA The obtained fluctuation functionsare shown in Figure 11a On small scales below 100 daysa hump in the F (Δt) can be identified which is due tooscillations in μ(t) [23] While asymptotically the influ-ence of oscillations vanishes on scales below the wave-length the oscillations appear as correlations (increasedslope in F (Δt)) and on scales above the wavelength theoscillations appear as anti-correlations (decreased slope inF (Δt)) Asymptotically we find F (Δt) sim (Δt)12 ieHCPP 12 corresponding to a lack of long-term cor-relations Even on scales up to 83 days we rather findHCPP lt 12 (which we expect from the imposed weeklyoscillations)

Next we study 20 000 simulated e-mail senders withrandomly chosen parameters (i) We fill pw(t) with ran-dom numbers and set pw(t) = 0 for t = 0 6 ie Sundayand Saturday (ii) We fill pd(t) with random numbers andset pd(t) = 0 for t = 0 5 and t = 23 ie at night(iii) We set p(Na) starting with a random p(Na = 0) Thenp(Na) decays exponentially up to a random Na below 36pw(t) pw(t) = 0 and p(Na) are normalized (iv) We ran-

domly choose 0 lt Nw lt 40 (v) We randomly choose0 lt ρa lt 30 From [27] Supporting Information 2 (SI2)we estimated the typical maximum values of Na Nw andρa (36 40 and 30 respectively) We run the model for83 days and extract the μ(t) for each simulated e-mailsender Then we apply DFA2 and average the fluctuationfunctions according to the final number of messages M The fluctuation functions for the various activity levels aredepicted in Figure 11b We find that members with smallfinal number of messages exhibit uncorrelated behaviorThe more active the members the more pronounced be-come the oscillations which we already discussed in thecontext of Figure 11a Thus asymptotic HCPP lt 12 forlarge M is due to the weekly cycles [23] In our data os-cillations do not dominate the DFA fluctuation functionsMoreover for OC2 we also find long-term correlations inweekly resolution (Fig 1)

Based on periodic probabilities and Poisson statisticsthe CPP model represents a powerful concept to charac-terize inter-event times For this purpose the average Nw

seems to be sufficient However in order to recover long-term correlations time dependent Nw = Nw(t) seem tobe necessary In fact the number of active intervals perweek Nw fluctuates as can be seen in [27] SI2 (uppermost row of the panels) Thus we suggest to extend themodel by introducing a memory kernel see eg [54] or byusing long-term correlated Nw(t)

34 Other correlations

In this section we want to discuss other types of correla-tions Figure 12 shows for QX the final degree K = k(T )versus the final number of messages M = m(T ) We findthat for both sending and receiving the two quantitiesare correlated according to

K sim Mλ with λ asymp 34 (11)

156 The European Physical Journal B

100

101

102

103

104

105

number of messages M

10minus1

100

101

102

103

104

degr

ee K

send

100

101

102

103

104

105

number of messages M

receive

~ M075

~ M075

(a) QX (b) QX

Fig 12 (Color online) Correlations between the finaldegree K and the final number of messages M for QX (a) Out-degree and sending messages (b) in-degree and receiving mes-sages The dashed lines correspond to a power-law with expo-nent 075 Members sending many messages also tend to havehigh out-degree but not linearly rather following a power-law

100

101

102

103

104

105

sent

100

101

102

103

104

105

rece

ived

messages M

100

101

102

103

104

105

outminusdegree

100

101

102

103

104

105

inminus

degr

ee

degree K

(a) QX (b) QX

Fig 13 (Color online) Correlations between activity and pas-sivity for QX (a) Final number of messages M received andsent (b) final in- and out-degree The dashed lines correspondto a linear relation Those members who send many messagesalso receive many But those members who know many peopleto whom they send messages do not necessarily know as manypeople from whom they receive messages

Similar relations have also been found for other data [67]Since the correlations are positive those members thatsend many messages in average also have more acquain-tances to whom they send but they know less acquain-tances than they would in the case of linear correlationsFor receiving Figure 12b this correlation is very similar

The number of messages sent versus the number ofmessages received (for QX) is displayed in Figure 13aAsymptotically the activity and passivity are linearly re-lated and on average for every message sent there is areceived one or vice versa This of course does not meanthat every message is replied However the less activemembers in average tend to receive more messages thanthey send For example those members who send in av-erage one message receive about three Nevertheless themore active the members are the more the sending andreceiving behavior approaches the linear relation In con-trast for the degree Figure 13b the asymptotic linearrelation does not hold Those members with large out-degree and small in-degree are referred to as spammerssince they send to many different people but only receivefrom few

100

101

102

103

104

105

number of messages M

10minus1

100

101

102

103

104

degr

ee K

send

100

101

102

103

104

105

number of messages M

receive

~ M075

~ M075

(a) POK (b) POK

Fig 14 (Color online) Correlations between the final de-gree K and the final number of messages M for POK (a) Out-degree and sending messages (b) in-degree and receiving mes-sages Analogous to Figure 12

100

101

102

103

104

sent

100

101

102

103

104

rece

ived

messages M

100

101

102

103

104

outminusdegree

100

101

102

103

104

inminus

degr

ee

degree K

(a) POK (b) POK

Fig 15 (Color online) Correlations between activity and pas-sivity for POK (a) Final number of messages M received andsent (b) final in- and out-degree Analogous to Figure 13

For POK we find similar results in Figures 14 and 15The final degree and the final number of messages alsoscale with an exponent close to 075 although for send-ing messages there exist some deviations of the most ac-tive members (Fig 14a) Also the correlations of sendingand receiving are linear the same holds for in- and out-degree However the most active members again deviatewith low receiving part ie both low number of receivedmessages as well as low in-degree Figure 15 Neverthe-less the results for both data sets are mainly consistentand the power-law relation equation (11) is a remarkableregularity

341 Activity and degree distributions

Finally we want to briefly discuss the distributions of ac-tivities and degrees If we assume p(M) sim MminusγM andp(K) sim KminusγK then with equation (11) the exponentsshould be related according to

γK = 1 + (γM minus 1)λ (12)

Figures 16a and 16b displays the probability densitiesp(M) and p(K) for both online communities Althoughthe distributions are rather broad they do not exhibitstraight lines in double logarithmic representation Inpanel (b) we include some guides to the eye with slopesaccording to equation (12) and which roughly follow the

D Rybski et al Communication activity in social networks growth and correlations 157

Table 1 Overview of the obtained exponents QX and POK are the two data-sets For the BA model and the CP processsee [50] and [27] respectively HL is the fluctuation exponent along directed links βk is the growth fluctuation exponent whenthe degree is considered and βtimes is the mutual growth fluctuation exponent based on the growth between pairs

Sending H HL β βk βtimesSect or Ref [12] 31 333 31 [12] [12] 322

QX 075 plusmn 005 asymp074 022 plusmn 001 022 plusmn 002 asymp03

QX shuffled 12 12 12

POK 091 plusmn 004 017 plusmn 003 017 plusmn 008

POK shuffled 12 12

BA model 12

CP process rarr 12

100

101

102

103

104

total number of messages

10minus10

10minus9

10minus8

10minus7

10minus6

10minus5

10minus4

10minus3

10minus2

10minus1

100

p(M

L)

100

101

102

103

10410

minus1010

minus910

minus810

minus710

minus610

minus510

minus410

minus310

minus210

minus110

0

p(M)p(K)

100

101

102

103

104

105

total number of messages

100

101

102

103

104

105

(a) QX send (b) POK send

(c) QX directed link (d) POK directed link

~ ML

minus35

~ Mminus12

~ Mminus3

~ Kminus127

~ Kminus37

~ ML

minus35

~ Kminus073

~ Mminus08

Fig 16 (Color online) Probability densities of activities anddegrees The probabilities are plotted versus the total numberof messages M and the final degree K for (a) sending inQX and (b) sending in POK The panel (c) and (d) exhibitthe probability densities of the total number of messages alongdirected links ML for QX and POK respectively The dottedlines serve as guides to the eye and have the indicated slopes

obtained curves However λ asymp 075 is relatively close to 1so that the differences are minor

The probability densities of activity along direct linksare displayed in Figures 16c and 16d for QX and POKrespectively In both cases the frequency of large activitydecays approximately following a power-law with expo-nent around 35

4 Conclusions

Our work reviews and further supports previous empir-ical findings [12] extending them by some features Theobtained exponents are summarized in Table 1

In addition to [12] we find very similar characteristicsfor the passivity of receiving as for the activity of send-ing messages This is in line with the strong correlations

between individual sending and receiving ie most of themessages are somehow replied sooner or later Further-more already the communication between two individualscomprises long-term persistence

Investigating the probability densities of logarithmicgrowth rates (ie growth of the cumulative number of mes-sages between two time steps of any member) we are ableto collapse the curves by scaling them with conditional av-erage growth rates and conditional standard deviationsWhile less active members follow well the exponentiallydecaying probability density for the more active membersdeviations are found in the case of large growth rates

Moreover we introduce a new growth rate namely themutual growth in the number of messages This is the dif-ference in the number of messages sent between pairs ofmembers at two time steps The conditional standard de-viation of this mutual growth rate also decays as a power-law with increasing initial difference whereas the expo-nent is close to 03 and changes to 12 when the datais shuffled We conjecture that this growth reflects cross-correlations in the activity

Finally we propose simulations to reproduce the long-term correlations and growth properties Basically it con-sists of generating long-term correlated sequences anddefining a threshold All values of such sequences abovethe threshold (POT) represent a message event We showthat then the correlation and growth features being deter-mined by the imposed fluctuation exponent confirm therelation β = 1 minus H [12] Including further features thisapproach could be a starting point for more elaboratedmodeling of human dynamics

We would like to note that ndash except Section 322 aboutmutual growth in the number of messages (and Sect 34) ndashall analysis and results refer to auto-correlations As phe-nomena auto- and cross-correlations can occur indepen-dently However since most of the messages are replied itis very likely that there are also cross-correlations betweenthe members activity which to our knowledge has not yetbeen studied systematically

Thus our work opens perspectives for further researchactivities In particular the origin of the long-term persis-tence in the communication remains an important ques-tion In [14] we demonstrate the relation of β H withinter-event time scaling From a psychologicalsociological

158 The European Physical Journal B

point of view one may argue where the persistence isoriginated Is it purely due to a state of mind solipsis-tic emerging from moods or is it due to social effects iethat the dynamics in the social network induces persis-tent fluctuations One hypothesis could be that alreadythe social network is correlated [68]

We thank C Briscoe JF Eichner LK Gallos and HDRozenfeld for useful discussions This work was supportedby National Science Foundation Grant NSF-SES-0624116 andNSF-EF-0827508 FL acknowledges financial support fromThe Swedish Bank Tercentenary Foundation SH thanks theEuropean EPIWORK project the Israel Science Foundationthe ONR and the DTRA for financial support

References

1 MHR Stanley LAN Amaral SV Buldyrev S HavlinH Leschhorn P Maass MA Salinger HE StanleyNature 379 804 (1996)

2 D Canning LAN Amaral Y Lee M Meyer HEStanley Econ Lett 60 335 (1998)

3 V Plerou LAN Amaral P Gopikrishnan M MeyerHE Stanley Phys Rev E 60 6519 (1999)

4 F Liljeros LAN Amaral HE Stanley arXiv

nlin0310001v1[nlinAO] (2003) httparxivorg

absnlin0310001

5 K Matia LAN Amaral M Luwel HF Moed HEStanley J Am Soc Inf Sci Tec 56 893 (2005)

6 S Picoli RS Mendes Phys Rev E 77 036105 (2008)7 HD Rozenfeld D Rybski JS Andrade Jr M Batty

HE Stanley HA Makse Proc Natl Acad Sci USA 10518702 (2008)

8 F Wang K Yamasaki S Havlin HE Stanley arXiv09114258v1[q-finST] (2009) httparxivorgabs09114258

9 R Gibrat Les inegalites economiques (Libraire du RecueilSierey Paris 1931)

10 J Sutton J Econ Lit 35 40 (1997)11 M Mitzenmacher Internet Math 1 226 (2004)12 D Rybski SV Buldyrev S Havlin F Liljeros HA

Makse Proc Natl Acad Sci USA 106 12640 (2009)13 AL Barabasi Nature 435 207 (2005)14 D Rybski SV Buldyrev S Havlin F Liljeros HA

Makse preprint (2011)15 M Karsai M Kivela RK Pan K Kaski J Kertesz AL

Barabasi J Saramaki Phys Rev E 83 025102 (2011)16 J Brujic RI Hermans KA Walther JM Fernandez

Nature Phys 2 282 (2006)17 LK Gallos D Rybski F Liljeros S Havlin HA Makse

submitted (2011)18 P Holme Europhys Lett 64 427 (2003)19 P Holme F Liljeros CR Edling BJ Kim Phys Rev

E 68 056107 (2003)20 P Holme CR Edling F Liljeros Soc Networks 26 155

(2004)21 CK Peng SV Buldyrev S Havlin M Simons HE

Stanley AL Goldberger Phys Rev E 49 1685 (1994)22 A Bunde S Havlin JW Kantelhardt T Penzel JH

Peter K Voigt Phys Rev Lett 85 3736 (2000)

23 JW Kantelhardt E Koscielny-Bunde HHA RegoS Havlin A Bunde Physica A 295 441 (2001)

24 JW Kantelhardt Encyclopedia of Complexity and SystemScience (Springer 2009) Chap entry 00620 Fractal andMultifractal Time Series

25 S Golder DM Wilkinson BA Huberman Communitiesand Technologies 2007 (Springer London 2007) pp 41ndash66

26 J Leskovec E Horvitz arXiv08030939v1[physics

soc-ph] (2008) httparxivorgabs0803093927 RD Malmgren DB Stouffer AE Motter LAN

Amaral Proc Natl Acad Sci USA 105 18153 (2008)28 A Vazquez Physica A 373 747 (2007)29 Z Eisler J Kertesz Phys Rev E 73 046109 (2006)30 Z Eisler I Bartos J Kertesz Adv Phys 57 89 (2008)31 LEC Rocha F Liljeros P Holme Proc Natl Acad Sci

USA 107 5706 (2010)32 CK Peng J Mietus JM Hausdorff S Havlin HE

Stanley AL Goldberger Phys Rev Lett 70 1343 (1993)33 PC Ivanov A Bunde LAN Amaral S Havlin

J Fritsch-Yelle RM Baevsky HE Stanley ALGoldberger Europhys Lett 48 594 (1999)

34 K Kosmidis A Kalampokis P Argyrakis Physica A 370808 (2006)

35 Y Liu P Gopikrishnan P Cizeau M Meyer CK PengHE Stanley Phys Rev E 60 1390 (1999)

36 RN Mantegna HE Stanley An Introduction toEconophysics Correlations and Complexity in Finance(Cambridge University Press Cambridge 1999)

37 F Lux M Ausloos The Science of Disasters in MarketFluctuations I Scaling Multiscaling and Their PossibleOrigins (Springer-Verlag Berlin 2002) Chap 13 pp 373ndash409

38 WE Leland MS Taqqu W Willinger DV WilsonIEEEACM Trans Networking 2 1 (1994)

39 M Kampf S Tismer JW Kantelhardt L Muchnik sub-mitted (2011)

40 S Tadaki M Kikuchi A Nakayama K NishinariA Shibata Y Sugiyama S Yukawa J Phys Soc Jpn75 034002 (2006)

41 Z Xiao-Yan L Zong-Hua T Ming Chinese Phys Lett24 2142 (2007)

42 K Linkenkaer-Hansen VV Nikouline JM Palva RJIlmoniemi J Neurosci 21 1370 (2001)

43 P Allegrini D Menicucci R Bedini L FronzoniA Gemignani P Grigolini BJ West P Paradisi PhysRev E 80 061914 (2009)

44 PC Ivanov K Hu MF Hilton SA Shea HE StanleyProc Natl Acad Sci USA 104 20702 (2007)

45 RD Malmgren DB Stouffer ASLO CampanharoLAN Amaral Science 325 1696 (2009)

46 CA Hidalgo C Rodriguez-Sickert Physica A 387 3017(2008)

47 A Saichev Y Malevergne D Sornette Theory ofZipf rsquos Law and Beyond ndash Monograph Lecture Notes inEconomics and Mathematical Systems (Springer-VerlagBerlin 2009)

48 LAN Amaral SV Buldyrev S Havlin MA SalingerHE Stanley Phys Rev Lett 80 1385 (1998)

49 HD Rozenfeld D Rybski X Gabaix HA Makse AmEcon Rev 101 2205 (2011)

50 AL Barabasi R Albert Science 286 509 (1999)51 P Hedstrom Dissecting the Social On the Principles

of Analytical Sociology (Cambridge University PressCambridge 2005)

D Rybski et al Communication activity in social networks growth and correlations 159

52 A Kentsis Nature 441 E5 (2006)53 G Palla AL Barabasi T Vicsek Nature 446 664 (2007)54 R Crane D Sornette Proc Natl Acad Sci USA 105

15649 (2008)55 A Bunde JF Eichner JW Kantelhardt S Havlin

Phys Rev Lett 94 048701 (2005)56 EG Altmann H Kantz Phys Rev E 71 056106 (2005)57 JF Eichner JW Kantelhardt A Bunde S Havlin

Phys Rev E 75 011128 (2007)58 HA Makse S Havlin M Schwartz HE Stanley Phys

Rev E 53 5445 (1996)59 Y Xu QDY Ma DT Schmitt P Bernaola-Galvan

PC Ivanov Physica A 390 4057 (2011)60 J Stehle A Barrat G Bianconi Phys Rev E 81 035101

(2010)

61 R Albert AL Barabasi Rev Mod Phys 74 47(2002)

62 H Ebel LI Mielsch S Bornholdt Phys Rev E 66035103 (2002)

63 MEJ Newman S Forrest J Balthrop Phys Rev E 66035101 (2002)

64 G Bianconi AL Barabasi Europhys Lett 54 436 (2001)65 BF de Blasio A Svensson F Liljeros Proc Natl Acad

Sci USA 104 10762 (2007)66 R Cohen S Havlin Complex Networks Structure

Robustness and Function in Growing network models(Cambridge University Press Cambridge 2009)

67 L Muchnik HA Makse S Havlin preprint (2011)68 D Rybski HD Rozenfeld JP Kropp Europhys Lett

90 28002 (2010)

  • Introduction
  • Data
  • Analysis
  • Conclusions
  • References
Page 9: Communication activity in social networks: growth and ...hmakse/maksepapers/2011... · data of the second online community (. com, POK) covers 492 days (February 2001 until June 2002)

D Rybski et al Communication activity in social networks growth and correlations 155

100

101

102

103

104

105

Δt [days]

10minus1

100

101

102

FD

FA(Δ

t) [

arb

uni

ts]

DFA1DFA2~ (Δt)

12

100

101

102

Δt [days]

10minus1

100

101

FD

FA

2(Δt

)

M=1097minus2980M=404minus1096M=149minus403M=55minus148M=21minus54M=8minus20M=3minus7M=1minus2~ (Δt)

12

(a) User 2881Malmgren et al PNAS 2008

(b)

Fig 11 (Color online) DFA fluctuation functions of message data created with the model proposed in [27] (a) User 2881 Wevisually extract the model parameters from Figure 3 in [27] and generate message data for approx 800k days The panel showsthe fluctuation functions from DFA1 and DFA2 which asymptotically go as sim (Δt)12 ie no long-term correlations The humpon small scales is due to the model inherent oscillations [23] The dashed vertical line is placed at Δt = 83 days (b) Randomparameterization We randomly choose the model parameters create 20k records of 83 days and average the obtained DFA2fluctuation functions according to the final number of messages M While for those simulated members with few messages wefind F (Δt) sim (Δt)12 the F (Δt) of the most active members exhibit a hump due to oscillations similar to the one in panel (a)

probability of starting an active interval at a particulartime of the day and pw(t) is the probability of startingan active interval at a particular day of the week Once amember enters such an active interval heshe sends a set ofNa +1 messages where Na is drawn from the distributionp(Na) The messages sent in such an active interval aresent randomly ie a homogeneous Poisson process withrate ρa events per hour

First we study the example of User 2881 as analyzedin [27] (please note that in [27] a different data set is stud-ied and the user is neither in OC1 nor in OC2) We ex-tract from [27] the parameters Nw = 73 active intervalsper week ρa = 17 events per hour as well as (visually)the distributions pd(t) pw(t) and p(Na) The original pe-riod of 83 days is not sufficient to apply DFA and werun the model for this set of parameters over 800k daysThen we extract the record of number of messages per dayμ(t) and apply DFA The obtained fluctuation functionsare shown in Figure 11a On small scales below 100 daysa hump in the F (Δt) can be identified which is due tooscillations in μ(t) [23] While asymptotically the influ-ence of oscillations vanishes on scales below the wave-length the oscillations appear as correlations (increasedslope in F (Δt)) and on scales above the wavelength theoscillations appear as anti-correlations (decreased slope inF (Δt)) Asymptotically we find F (Δt) sim (Δt)12 ieHCPP 12 corresponding to a lack of long-term cor-relations Even on scales up to 83 days we rather findHCPP lt 12 (which we expect from the imposed weeklyoscillations)

Next we study 20 000 simulated e-mail senders withrandomly chosen parameters (i) We fill pw(t) with ran-dom numbers and set pw(t) = 0 for t = 0 6 ie Sundayand Saturday (ii) We fill pd(t) with random numbers andset pd(t) = 0 for t = 0 5 and t = 23 ie at night(iii) We set p(Na) starting with a random p(Na = 0) Thenp(Na) decays exponentially up to a random Na below 36pw(t) pw(t) = 0 and p(Na) are normalized (iv) We ran-

domly choose 0 lt Nw lt 40 (v) We randomly choose0 lt ρa lt 30 From [27] Supporting Information 2 (SI2)we estimated the typical maximum values of Na Nw andρa (36 40 and 30 respectively) We run the model for83 days and extract the μ(t) for each simulated e-mailsender Then we apply DFA2 and average the fluctuationfunctions according to the final number of messages M The fluctuation functions for the various activity levels aredepicted in Figure 11b We find that members with smallfinal number of messages exhibit uncorrelated behaviorThe more active the members the more pronounced be-come the oscillations which we already discussed in thecontext of Figure 11a Thus asymptotic HCPP lt 12 forlarge M is due to the weekly cycles [23] In our data os-cillations do not dominate the DFA fluctuation functionsMoreover for OC2 we also find long-term correlations inweekly resolution (Fig 1)

Based on periodic probabilities and Poisson statisticsthe CPP model represents a powerful concept to charac-terize inter-event times For this purpose the average Nw

seems to be sufficient However in order to recover long-term correlations time dependent Nw = Nw(t) seem tobe necessary In fact the number of active intervals perweek Nw fluctuates as can be seen in [27] SI2 (uppermost row of the panels) Thus we suggest to extend themodel by introducing a memory kernel see eg [54] or byusing long-term correlated Nw(t)

34 Other correlations

In this section we want to discuss other types of correla-tions Figure 12 shows for QX the final degree K = k(T )versus the final number of messages M = m(T ) We findthat for both sending and receiving the two quantitiesare correlated according to

K sim Mλ with λ asymp 34 (11)

156 The European Physical Journal B

100

101

102

103

104

105

number of messages M

10minus1

100

101

102

103

104

degr

ee K

send

100

101

102

103

104

105

number of messages M

receive

~ M075

~ M075

(a) QX (b) QX

Fig 12 (Color online) Correlations between the finaldegree K and the final number of messages M for QX (a) Out-degree and sending messages (b) in-degree and receiving mes-sages The dashed lines correspond to a power-law with expo-nent 075 Members sending many messages also tend to havehigh out-degree but not linearly rather following a power-law

100

101

102

103

104

105

sent

100

101

102

103

104

105

rece

ived

messages M

100

101

102

103

104

105

outminusdegree

100

101

102

103

104

105

inminus

degr

ee

degree K

(a) QX (b) QX

Fig 13 (Color online) Correlations between activity and pas-sivity for QX (a) Final number of messages M received andsent (b) final in- and out-degree The dashed lines correspondto a linear relation Those members who send many messagesalso receive many But those members who know many peopleto whom they send messages do not necessarily know as manypeople from whom they receive messages

Similar relations have also been found for other data [67]Since the correlations are positive those members thatsend many messages in average also have more acquain-tances to whom they send but they know less acquain-tances than they would in the case of linear correlationsFor receiving Figure 12b this correlation is very similar

The number of messages sent versus the number ofmessages received (for QX) is displayed in Figure 13aAsymptotically the activity and passivity are linearly re-lated and on average for every message sent there is areceived one or vice versa This of course does not meanthat every message is replied However the less activemembers in average tend to receive more messages thanthey send For example those members who send in av-erage one message receive about three Nevertheless themore active the members are the more the sending andreceiving behavior approaches the linear relation In con-trast for the degree Figure 13b the asymptotic linearrelation does not hold Those members with large out-degree and small in-degree are referred to as spammerssince they send to many different people but only receivefrom few

100

101

102

103

104

105

number of messages M

10minus1

100

101

102

103

104

degr

ee K

send

100

101

102

103

104

105

number of messages M

receive

~ M075

~ M075

(a) POK (b) POK

Fig 14 (Color online) Correlations between the final de-gree K and the final number of messages M for POK (a) Out-degree and sending messages (b) in-degree and receiving mes-sages Analogous to Figure 12

100

101

102

103

104

sent

100

101

102

103

104

rece

ived

messages M

100

101

102

103

104

outminusdegree

100

101

102

103

104

inminus

degr

ee

degree K

(a) POK (b) POK

Fig 15 (Color online) Correlations between activity and pas-sivity for POK (a) Final number of messages M received andsent (b) final in- and out-degree Analogous to Figure 13

For POK we find similar results in Figures 14 and 15The final degree and the final number of messages alsoscale with an exponent close to 075 although for send-ing messages there exist some deviations of the most ac-tive members (Fig 14a) Also the correlations of sendingand receiving are linear the same holds for in- and out-degree However the most active members again deviatewith low receiving part ie both low number of receivedmessages as well as low in-degree Figure 15 Neverthe-less the results for both data sets are mainly consistentand the power-law relation equation (11) is a remarkableregularity

341 Activity and degree distributions

Finally we want to briefly discuss the distributions of ac-tivities and degrees If we assume p(M) sim MminusγM andp(K) sim KminusγK then with equation (11) the exponentsshould be related according to

γK = 1 + (γM minus 1)λ (12)

Figures 16a and 16b displays the probability densitiesp(M) and p(K) for both online communities Althoughthe distributions are rather broad they do not exhibitstraight lines in double logarithmic representation Inpanel (b) we include some guides to the eye with slopesaccording to equation (12) and which roughly follow the

D Rybski et al Communication activity in social networks growth and correlations 157

Table 1 Overview of the obtained exponents QX and POK are the two data-sets For the BA model and the CP processsee [50] and [27] respectively HL is the fluctuation exponent along directed links βk is the growth fluctuation exponent whenthe degree is considered and βtimes is the mutual growth fluctuation exponent based on the growth between pairs

Sending H HL β βk βtimesSect or Ref [12] 31 333 31 [12] [12] 322

QX 075 plusmn 005 asymp074 022 plusmn 001 022 plusmn 002 asymp03

QX shuffled 12 12 12

POK 091 plusmn 004 017 plusmn 003 017 plusmn 008

POK shuffled 12 12

BA model 12

CP process rarr 12

100

101

102

103

104

total number of messages

10minus10

10minus9

10minus8

10minus7

10minus6

10minus5

10minus4

10minus3

10minus2

10minus1

100

p(M

L)

100

101

102

103

10410

minus1010

minus910

minus810

minus710

minus610

minus510

minus410

minus310

minus210

minus110

0

p(M)p(K)

100

101

102

103

104

105

total number of messages

100

101

102

103

104

105

(a) QX send (b) POK send

(c) QX directed link (d) POK directed link

~ ML

minus35

~ Mminus12

~ Mminus3

~ Kminus127

~ Kminus37

~ ML

minus35

~ Kminus073

~ Mminus08

Fig 16 (Color online) Probability densities of activities anddegrees The probabilities are plotted versus the total numberof messages M and the final degree K for (a) sending inQX and (b) sending in POK The panel (c) and (d) exhibitthe probability densities of the total number of messages alongdirected links ML for QX and POK respectively The dottedlines serve as guides to the eye and have the indicated slopes

obtained curves However λ asymp 075 is relatively close to 1so that the differences are minor

The probability densities of activity along direct linksare displayed in Figures 16c and 16d for QX and POKrespectively In both cases the frequency of large activitydecays approximately following a power-law with expo-nent around 35

4 Conclusions

Our work reviews and further supports previous empir-ical findings [12] extending them by some features Theobtained exponents are summarized in Table 1

In addition to [12] we find very similar characteristicsfor the passivity of receiving as for the activity of send-ing messages This is in line with the strong correlations

between individual sending and receiving ie most of themessages are somehow replied sooner or later Further-more already the communication between two individualscomprises long-term persistence

Investigating the probability densities of logarithmicgrowth rates (ie growth of the cumulative number of mes-sages between two time steps of any member) we are ableto collapse the curves by scaling them with conditional av-erage growth rates and conditional standard deviationsWhile less active members follow well the exponentiallydecaying probability density for the more active membersdeviations are found in the case of large growth rates

Moreover we introduce a new growth rate namely themutual growth in the number of messages This is the dif-ference in the number of messages sent between pairs ofmembers at two time steps The conditional standard de-viation of this mutual growth rate also decays as a power-law with increasing initial difference whereas the expo-nent is close to 03 and changes to 12 when the datais shuffled We conjecture that this growth reflects cross-correlations in the activity

Finally we propose simulations to reproduce the long-term correlations and growth properties Basically it con-sists of generating long-term correlated sequences anddefining a threshold All values of such sequences abovethe threshold (POT) represent a message event We showthat then the correlation and growth features being deter-mined by the imposed fluctuation exponent confirm therelation β = 1 minus H [12] Including further features thisapproach could be a starting point for more elaboratedmodeling of human dynamics

We would like to note that ndash except Section 322 aboutmutual growth in the number of messages (and Sect 34) ndashall analysis and results refer to auto-correlations As phe-nomena auto- and cross-correlations can occur indepen-dently However since most of the messages are replied itis very likely that there are also cross-correlations betweenthe members activity which to our knowledge has not yetbeen studied systematically

Thus our work opens perspectives for further researchactivities In particular the origin of the long-term persis-tence in the communication remains an important ques-tion In [14] we demonstrate the relation of β H withinter-event time scaling From a psychologicalsociological

158 The European Physical Journal B

point of view one may argue where the persistence isoriginated Is it purely due to a state of mind solipsis-tic emerging from moods or is it due to social effects iethat the dynamics in the social network induces persis-tent fluctuations One hypothesis could be that alreadythe social network is correlated [68]

We thank C Briscoe JF Eichner LK Gallos and HDRozenfeld for useful discussions This work was supportedby National Science Foundation Grant NSF-SES-0624116 andNSF-EF-0827508 FL acknowledges financial support fromThe Swedish Bank Tercentenary Foundation SH thanks theEuropean EPIWORK project the Israel Science Foundationthe ONR and the DTRA for financial support

References

1 MHR Stanley LAN Amaral SV Buldyrev S HavlinH Leschhorn P Maass MA Salinger HE StanleyNature 379 804 (1996)

2 D Canning LAN Amaral Y Lee M Meyer HEStanley Econ Lett 60 335 (1998)

3 V Plerou LAN Amaral P Gopikrishnan M MeyerHE Stanley Phys Rev E 60 6519 (1999)

4 F Liljeros LAN Amaral HE Stanley arXiv

nlin0310001v1[nlinAO] (2003) httparxivorg

absnlin0310001

5 K Matia LAN Amaral M Luwel HF Moed HEStanley J Am Soc Inf Sci Tec 56 893 (2005)

6 S Picoli RS Mendes Phys Rev E 77 036105 (2008)7 HD Rozenfeld D Rybski JS Andrade Jr M Batty

HE Stanley HA Makse Proc Natl Acad Sci USA 10518702 (2008)

8 F Wang K Yamasaki S Havlin HE Stanley arXiv09114258v1[q-finST] (2009) httparxivorgabs09114258

9 R Gibrat Les inegalites economiques (Libraire du RecueilSierey Paris 1931)

10 J Sutton J Econ Lit 35 40 (1997)11 M Mitzenmacher Internet Math 1 226 (2004)12 D Rybski SV Buldyrev S Havlin F Liljeros HA

Makse Proc Natl Acad Sci USA 106 12640 (2009)13 AL Barabasi Nature 435 207 (2005)14 D Rybski SV Buldyrev S Havlin F Liljeros HA

Makse preprint (2011)15 M Karsai M Kivela RK Pan K Kaski J Kertesz AL

Barabasi J Saramaki Phys Rev E 83 025102 (2011)16 J Brujic RI Hermans KA Walther JM Fernandez

Nature Phys 2 282 (2006)17 LK Gallos D Rybski F Liljeros S Havlin HA Makse

submitted (2011)18 P Holme Europhys Lett 64 427 (2003)19 P Holme F Liljeros CR Edling BJ Kim Phys Rev

E 68 056107 (2003)20 P Holme CR Edling F Liljeros Soc Networks 26 155

(2004)21 CK Peng SV Buldyrev S Havlin M Simons HE

Stanley AL Goldberger Phys Rev E 49 1685 (1994)22 A Bunde S Havlin JW Kantelhardt T Penzel JH

Peter K Voigt Phys Rev Lett 85 3736 (2000)

23 JW Kantelhardt E Koscielny-Bunde HHA RegoS Havlin A Bunde Physica A 295 441 (2001)

24 JW Kantelhardt Encyclopedia of Complexity and SystemScience (Springer 2009) Chap entry 00620 Fractal andMultifractal Time Series

25 S Golder DM Wilkinson BA Huberman Communitiesand Technologies 2007 (Springer London 2007) pp 41ndash66

26 J Leskovec E Horvitz arXiv08030939v1[physics

soc-ph] (2008) httparxivorgabs0803093927 RD Malmgren DB Stouffer AE Motter LAN

Amaral Proc Natl Acad Sci USA 105 18153 (2008)28 A Vazquez Physica A 373 747 (2007)29 Z Eisler J Kertesz Phys Rev E 73 046109 (2006)30 Z Eisler I Bartos J Kertesz Adv Phys 57 89 (2008)31 LEC Rocha F Liljeros P Holme Proc Natl Acad Sci

USA 107 5706 (2010)32 CK Peng J Mietus JM Hausdorff S Havlin HE

Stanley AL Goldberger Phys Rev Lett 70 1343 (1993)33 PC Ivanov A Bunde LAN Amaral S Havlin

J Fritsch-Yelle RM Baevsky HE Stanley ALGoldberger Europhys Lett 48 594 (1999)

34 K Kosmidis A Kalampokis P Argyrakis Physica A 370808 (2006)

35 Y Liu P Gopikrishnan P Cizeau M Meyer CK PengHE Stanley Phys Rev E 60 1390 (1999)

36 RN Mantegna HE Stanley An Introduction toEconophysics Correlations and Complexity in Finance(Cambridge University Press Cambridge 1999)

37 F Lux M Ausloos The Science of Disasters in MarketFluctuations I Scaling Multiscaling and Their PossibleOrigins (Springer-Verlag Berlin 2002) Chap 13 pp 373ndash409

38 WE Leland MS Taqqu W Willinger DV WilsonIEEEACM Trans Networking 2 1 (1994)

39 M Kampf S Tismer JW Kantelhardt L Muchnik sub-mitted (2011)

40 S Tadaki M Kikuchi A Nakayama K NishinariA Shibata Y Sugiyama S Yukawa J Phys Soc Jpn75 034002 (2006)

41 Z Xiao-Yan L Zong-Hua T Ming Chinese Phys Lett24 2142 (2007)

42 K Linkenkaer-Hansen VV Nikouline JM Palva RJIlmoniemi J Neurosci 21 1370 (2001)

43 P Allegrini D Menicucci R Bedini L FronzoniA Gemignani P Grigolini BJ West P Paradisi PhysRev E 80 061914 (2009)

44 PC Ivanov K Hu MF Hilton SA Shea HE StanleyProc Natl Acad Sci USA 104 20702 (2007)

45 RD Malmgren DB Stouffer ASLO CampanharoLAN Amaral Science 325 1696 (2009)

46 CA Hidalgo C Rodriguez-Sickert Physica A 387 3017(2008)

47 A Saichev Y Malevergne D Sornette Theory ofZipf rsquos Law and Beyond ndash Monograph Lecture Notes inEconomics and Mathematical Systems (Springer-VerlagBerlin 2009)

48 LAN Amaral SV Buldyrev S Havlin MA SalingerHE Stanley Phys Rev Lett 80 1385 (1998)

49 HD Rozenfeld D Rybski X Gabaix HA Makse AmEcon Rev 101 2205 (2011)

50 AL Barabasi R Albert Science 286 509 (1999)51 P Hedstrom Dissecting the Social On the Principles

of Analytical Sociology (Cambridge University PressCambridge 2005)

D Rybski et al Communication activity in social networks growth and correlations 159

52 A Kentsis Nature 441 E5 (2006)53 G Palla AL Barabasi T Vicsek Nature 446 664 (2007)54 R Crane D Sornette Proc Natl Acad Sci USA 105

15649 (2008)55 A Bunde JF Eichner JW Kantelhardt S Havlin

Phys Rev Lett 94 048701 (2005)56 EG Altmann H Kantz Phys Rev E 71 056106 (2005)57 JF Eichner JW Kantelhardt A Bunde S Havlin

Phys Rev E 75 011128 (2007)58 HA Makse S Havlin M Schwartz HE Stanley Phys

Rev E 53 5445 (1996)59 Y Xu QDY Ma DT Schmitt P Bernaola-Galvan

PC Ivanov Physica A 390 4057 (2011)60 J Stehle A Barrat G Bianconi Phys Rev E 81 035101

(2010)

61 R Albert AL Barabasi Rev Mod Phys 74 47(2002)

62 H Ebel LI Mielsch S Bornholdt Phys Rev E 66035103 (2002)

63 MEJ Newman S Forrest J Balthrop Phys Rev E 66035101 (2002)

64 G Bianconi AL Barabasi Europhys Lett 54 436 (2001)65 BF de Blasio A Svensson F Liljeros Proc Natl Acad

Sci USA 104 10762 (2007)66 R Cohen S Havlin Complex Networks Structure

Robustness and Function in Growing network models(Cambridge University Press Cambridge 2009)

67 L Muchnik HA Makse S Havlin preprint (2011)68 D Rybski HD Rozenfeld JP Kropp Europhys Lett

90 28002 (2010)

  • Introduction
  • Data
  • Analysis
  • Conclusions
  • References
Page 10: Communication activity in social networks: growth and ...hmakse/maksepapers/2011... · data of the second online community (. com, POK) covers 492 days (February 2001 until June 2002)

156 The European Physical Journal B

100

101

102

103

104

105

number of messages M

10minus1

100

101

102

103

104

degr

ee K

send

100

101

102

103

104

105

number of messages M

receive

~ M075

~ M075

(a) QX (b) QX

Fig 12 (Color online) Correlations between the finaldegree K and the final number of messages M for QX (a) Out-degree and sending messages (b) in-degree and receiving mes-sages The dashed lines correspond to a power-law with expo-nent 075 Members sending many messages also tend to havehigh out-degree but not linearly rather following a power-law

100

101

102

103

104

105

sent

100

101

102

103

104

105

rece

ived

messages M

100

101

102

103

104

105

outminusdegree

100

101

102

103

104

105

inminus

degr

ee

degree K

(a) QX (b) QX

Fig 13 (Color online) Correlations between activity and pas-sivity for QX (a) Final number of messages M received andsent (b) final in- and out-degree The dashed lines correspondto a linear relation Those members who send many messagesalso receive many But those members who know many peopleto whom they send messages do not necessarily know as manypeople from whom they receive messages

Similar relations have also been found for other data [67]Since the correlations are positive those members thatsend many messages in average also have more acquain-tances to whom they send but they know less acquain-tances than they would in the case of linear correlationsFor receiving Figure 12b this correlation is very similar

The number of messages sent versus the number ofmessages received (for QX) is displayed in Figure 13aAsymptotically the activity and passivity are linearly re-lated and on average for every message sent there is areceived one or vice versa This of course does not meanthat every message is replied However the less activemembers in average tend to receive more messages thanthey send For example those members who send in av-erage one message receive about three Nevertheless themore active the members are the more the sending andreceiving behavior approaches the linear relation In con-trast for the degree Figure 13b the asymptotic linearrelation does not hold Those members with large out-degree and small in-degree are referred to as spammerssince they send to many different people but only receivefrom few

100

101

102

103

104

105

number of messages M

10minus1

100

101

102

103

104

degr

ee K

send

100

101

102

103

104

105

number of messages M

receive

~ M075

~ M075

(a) POK (b) POK

Fig 14 (Color online) Correlations between the final de-gree K and the final number of messages M for POK (a) Out-degree and sending messages (b) in-degree and receiving mes-sages Analogous to Figure 12

100

101

102

103

104

sent

100

101

102

103

104

rece

ived

messages M

100

101

102

103

104

outminusdegree

100

101

102

103

104

inminus

degr

ee

degree K

(a) POK (b) POK

Fig 15 (Color online) Correlations between activity and pas-sivity for POK (a) Final number of messages M received andsent (b) final in- and out-degree Analogous to Figure 13

For POK we find similar results in Figures 14 and 15The final degree and the final number of messages alsoscale with an exponent close to 075 although for send-ing messages there exist some deviations of the most ac-tive members (Fig 14a) Also the correlations of sendingand receiving are linear the same holds for in- and out-degree However the most active members again deviatewith low receiving part ie both low number of receivedmessages as well as low in-degree Figure 15 Neverthe-less the results for both data sets are mainly consistentand the power-law relation equation (11) is a remarkableregularity

341 Activity and degree distributions

Finally we want to briefly discuss the distributions of ac-tivities and degrees If we assume p(M) sim MminusγM andp(K) sim KminusγK then with equation (11) the exponentsshould be related according to

γK = 1 + (γM minus 1)λ (12)

Figures 16a and 16b displays the probability densitiesp(M) and p(K) for both online communities Althoughthe distributions are rather broad they do not exhibitstraight lines in double logarithmic representation Inpanel (b) we include some guides to the eye with slopesaccording to equation (12) and which roughly follow the

D Rybski et al Communication activity in social networks growth and correlations 157

Table 1 Overview of the obtained exponents QX and POK are the two data-sets For the BA model and the CP processsee [50] and [27] respectively HL is the fluctuation exponent along directed links βk is the growth fluctuation exponent whenthe degree is considered and βtimes is the mutual growth fluctuation exponent based on the growth between pairs

Sending H HL β βk βtimesSect or Ref [12] 31 333 31 [12] [12] 322

QX 075 plusmn 005 asymp074 022 plusmn 001 022 plusmn 002 asymp03

QX shuffled 12 12 12

POK 091 plusmn 004 017 plusmn 003 017 plusmn 008

POK shuffled 12 12

BA model 12

CP process rarr 12

100

101

102

103

104

total number of messages

10minus10

10minus9

10minus8

10minus7

10minus6

10minus5

10minus4

10minus3

10minus2

10minus1

100

p(M

L)

100

101

102

103

10410

minus1010

minus910

minus810

minus710

minus610

minus510

minus410

minus310

minus210

minus110

0

p(M)p(K)

100

101

102

103

104

105

total number of messages

100

101

102

103

104

105

(a) QX send (b) POK send

(c) QX directed link (d) POK directed link

~ ML

minus35

~ Mminus12

~ Mminus3

~ Kminus127

~ Kminus37

~ ML

minus35

~ Kminus073

~ Mminus08

Fig 16 (Color online) Probability densities of activities anddegrees The probabilities are plotted versus the total numberof messages M and the final degree K for (a) sending inQX and (b) sending in POK The panel (c) and (d) exhibitthe probability densities of the total number of messages alongdirected links ML for QX and POK respectively The dottedlines serve as guides to the eye and have the indicated slopes

obtained curves However λ asymp 075 is relatively close to 1so that the differences are minor

The probability densities of activity along direct linksare displayed in Figures 16c and 16d for QX and POKrespectively In both cases the frequency of large activitydecays approximately following a power-law with expo-nent around 35

4 Conclusions

Our work reviews and further supports previous empir-ical findings [12] extending them by some features Theobtained exponents are summarized in Table 1

In addition to [12] we find very similar characteristicsfor the passivity of receiving as for the activity of send-ing messages This is in line with the strong correlations

between individual sending and receiving ie most of themessages are somehow replied sooner or later Further-more already the communication between two individualscomprises long-term persistence

Investigating the probability densities of logarithmicgrowth rates (ie growth of the cumulative number of mes-sages between two time steps of any member) we are ableto collapse the curves by scaling them with conditional av-erage growth rates and conditional standard deviationsWhile less active members follow well the exponentiallydecaying probability density for the more active membersdeviations are found in the case of large growth rates

Moreover we introduce a new growth rate namely themutual growth in the number of messages This is the dif-ference in the number of messages sent between pairs ofmembers at two time steps The conditional standard de-viation of this mutual growth rate also decays as a power-law with increasing initial difference whereas the expo-nent is close to 03 and changes to 12 when the datais shuffled We conjecture that this growth reflects cross-correlations in the activity

Finally we propose simulations to reproduce the long-term correlations and growth properties Basically it con-sists of generating long-term correlated sequences anddefining a threshold All values of such sequences abovethe threshold (POT) represent a message event We showthat then the correlation and growth features being deter-mined by the imposed fluctuation exponent confirm therelation β = 1 minus H [12] Including further features thisapproach could be a starting point for more elaboratedmodeling of human dynamics

We would like to note that ndash except Section 322 aboutmutual growth in the number of messages (and Sect 34) ndashall analysis and results refer to auto-correlations As phe-nomena auto- and cross-correlations can occur indepen-dently However since most of the messages are replied itis very likely that there are also cross-correlations betweenthe members activity which to our knowledge has not yetbeen studied systematically

Thus our work opens perspectives for further researchactivities In particular the origin of the long-term persis-tence in the communication remains an important ques-tion In [14] we demonstrate the relation of β H withinter-event time scaling From a psychologicalsociological

158 The European Physical Journal B

point of view one may argue where the persistence isoriginated Is it purely due to a state of mind solipsis-tic emerging from moods or is it due to social effects iethat the dynamics in the social network induces persis-tent fluctuations One hypothesis could be that alreadythe social network is correlated [68]

We thank C Briscoe JF Eichner LK Gallos and HDRozenfeld for useful discussions This work was supportedby National Science Foundation Grant NSF-SES-0624116 andNSF-EF-0827508 FL acknowledges financial support fromThe Swedish Bank Tercentenary Foundation SH thanks theEuropean EPIWORK project the Israel Science Foundationthe ONR and the DTRA for financial support

References

1 MHR Stanley LAN Amaral SV Buldyrev S HavlinH Leschhorn P Maass MA Salinger HE StanleyNature 379 804 (1996)

2 D Canning LAN Amaral Y Lee M Meyer HEStanley Econ Lett 60 335 (1998)

3 V Plerou LAN Amaral P Gopikrishnan M MeyerHE Stanley Phys Rev E 60 6519 (1999)

4 F Liljeros LAN Amaral HE Stanley arXiv

nlin0310001v1[nlinAO] (2003) httparxivorg

absnlin0310001

5 K Matia LAN Amaral M Luwel HF Moed HEStanley J Am Soc Inf Sci Tec 56 893 (2005)

6 S Picoli RS Mendes Phys Rev E 77 036105 (2008)7 HD Rozenfeld D Rybski JS Andrade Jr M Batty

HE Stanley HA Makse Proc Natl Acad Sci USA 10518702 (2008)

8 F Wang K Yamasaki S Havlin HE Stanley arXiv09114258v1[q-finST] (2009) httparxivorgabs09114258

9 R Gibrat Les inegalites economiques (Libraire du RecueilSierey Paris 1931)

10 J Sutton J Econ Lit 35 40 (1997)11 M Mitzenmacher Internet Math 1 226 (2004)12 D Rybski SV Buldyrev S Havlin F Liljeros HA

Makse Proc Natl Acad Sci USA 106 12640 (2009)13 AL Barabasi Nature 435 207 (2005)14 D Rybski SV Buldyrev S Havlin F Liljeros HA

Makse preprint (2011)15 M Karsai M Kivela RK Pan K Kaski J Kertesz AL

Barabasi J Saramaki Phys Rev E 83 025102 (2011)16 J Brujic RI Hermans KA Walther JM Fernandez

Nature Phys 2 282 (2006)17 LK Gallos D Rybski F Liljeros S Havlin HA Makse

submitted (2011)18 P Holme Europhys Lett 64 427 (2003)19 P Holme F Liljeros CR Edling BJ Kim Phys Rev

E 68 056107 (2003)20 P Holme CR Edling F Liljeros Soc Networks 26 155

(2004)21 CK Peng SV Buldyrev S Havlin M Simons HE

Stanley AL Goldberger Phys Rev E 49 1685 (1994)22 A Bunde S Havlin JW Kantelhardt T Penzel JH

Peter K Voigt Phys Rev Lett 85 3736 (2000)

23 JW Kantelhardt E Koscielny-Bunde HHA RegoS Havlin A Bunde Physica A 295 441 (2001)

24 JW Kantelhardt Encyclopedia of Complexity and SystemScience (Springer 2009) Chap entry 00620 Fractal andMultifractal Time Series

25 S Golder DM Wilkinson BA Huberman Communitiesand Technologies 2007 (Springer London 2007) pp 41ndash66

26 J Leskovec E Horvitz arXiv08030939v1[physics

soc-ph] (2008) httparxivorgabs0803093927 RD Malmgren DB Stouffer AE Motter LAN

Amaral Proc Natl Acad Sci USA 105 18153 (2008)28 A Vazquez Physica A 373 747 (2007)29 Z Eisler J Kertesz Phys Rev E 73 046109 (2006)30 Z Eisler I Bartos J Kertesz Adv Phys 57 89 (2008)31 LEC Rocha F Liljeros P Holme Proc Natl Acad Sci

USA 107 5706 (2010)32 CK Peng J Mietus JM Hausdorff S Havlin HE

Stanley AL Goldberger Phys Rev Lett 70 1343 (1993)33 PC Ivanov A Bunde LAN Amaral S Havlin

J Fritsch-Yelle RM Baevsky HE Stanley ALGoldberger Europhys Lett 48 594 (1999)

34 K Kosmidis A Kalampokis P Argyrakis Physica A 370808 (2006)

35 Y Liu P Gopikrishnan P Cizeau M Meyer CK PengHE Stanley Phys Rev E 60 1390 (1999)

36 RN Mantegna HE Stanley An Introduction toEconophysics Correlations and Complexity in Finance(Cambridge University Press Cambridge 1999)

37 F Lux M Ausloos The Science of Disasters in MarketFluctuations I Scaling Multiscaling and Their PossibleOrigins (Springer-Verlag Berlin 2002) Chap 13 pp 373ndash409

38 WE Leland MS Taqqu W Willinger DV WilsonIEEEACM Trans Networking 2 1 (1994)

39 M Kampf S Tismer JW Kantelhardt L Muchnik sub-mitted (2011)

40 S Tadaki M Kikuchi A Nakayama K NishinariA Shibata Y Sugiyama S Yukawa J Phys Soc Jpn75 034002 (2006)

41 Z Xiao-Yan L Zong-Hua T Ming Chinese Phys Lett24 2142 (2007)

42 K Linkenkaer-Hansen VV Nikouline JM Palva RJIlmoniemi J Neurosci 21 1370 (2001)

43 P Allegrini D Menicucci R Bedini L FronzoniA Gemignani P Grigolini BJ West P Paradisi PhysRev E 80 061914 (2009)

44 PC Ivanov K Hu MF Hilton SA Shea HE StanleyProc Natl Acad Sci USA 104 20702 (2007)

45 RD Malmgren DB Stouffer ASLO CampanharoLAN Amaral Science 325 1696 (2009)

46 CA Hidalgo C Rodriguez-Sickert Physica A 387 3017(2008)

47 A Saichev Y Malevergne D Sornette Theory ofZipf rsquos Law and Beyond ndash Monograph Lecture Notes inEconomics and Mathematical Systems (Springer-VerlagBerlin 2009)

48 LAN Amaral SV Buldyrev S Havlin MA SalingerHE Stanley Phys Rev Lett 80 1385 (1998)

49 HD Rozenfeld D Rybski X Gabaix HA Makse AmEcon Rev 101 2205 (2011)

50 AL Barabasi R Albert Science 286 509 (1999)51 P Hedstrom Dissecting the Social On the Principles

of Analytical Sociology (Cambridge University PressCambridge 2005)

D Rybski et al Communication activity in social networks growth and correlations 159

52 A Kentsis Nature 441 E5 (2006)53 G Palla AL Barabasi T Vicsek Nature 446 664 (2007)54 R Crane D Sornette Proc Natl Acad Sci USA 105

15649 (2008)55 A Bunde JF Eichner JW Kantelhardt S Havlin

Phys Rev Lett 94 048701 (2005)56 EG Altmann H Kantz Phys Rev E 71 056106 (2005)57 JF Eichner JW Kantelhardt A Bunde S Havlin

Phys Rev E 75 011128 (2007)58 HA Makse S Havlin M Schwartz HE Stanley Phys

Rev E 53 5445 (1996)59 Y Xu QDY Ma DT Schmitt P Bernaola-Galvan

PC Ivanov Physica A 390 4057 (2011)60 J Stehle A Barrat G Bianconi Phys Rev E 81 035101

(2010)

61 R Albert AL Barabasi Rev Mod Phys 74 47(2002)

62 H Ebel LI Mielsch S Bornholdt Phys Rev E 66035103 (2002)

63 MEJ Newman S Forrest J Balthrop Phys Rev E 66035101 (2002)

64 G Bianconi AL Barabasi Europhys Lett 54 436 (2001)65 BF de Blasio A Svensson F Liljeros Proc Natl Acad

Sci USA 104 10762 (2007)66 R Cohen S Havlin Complex Networks Structure

Robustness and Function in Growing network models(Cambridge University Press Cambridge 2009)

67 L Muchnik HA Makse S Havlin preprint (2011)68 D Rybski HD Rozenfeld JP Kropp Europhys Lett

90 28002 (2010)

  • Introduction
  • Data
  • Analysis
  • Conclusions
  • References
Page 11: Communication activity in social networks: growth and ...hmakse/maksepapers/2011... · data of the second online community (. com, POK) covers 492 days (February 2001 until June 2002)

D Rybski et al Communication activity in social networks growth and correlations 157

Table 1 Overview of the obtained exponents QX and POK are the two data-sets For the BA model and the CP processsee [50] and [27] respectively HL is the fluctuation exponent along directed links βk is the growth fluctuation exponent whenthe degree is considered and βtimes is the mutual growth fluctuation exponent based on the growth between pairs

Sending H HL β βk βtimesSect or Ref [12] 31 333 31 [12] [12] 322

QX 075 plusmn 005 asymp074 022 plusmn 001 022 plusmn 002 asymp03

QX shuffled 12 12 12

POK 091 plusmn 004 017 plusmn 003 017 plusmn 008

POK shuffled 12 12

BA model 12

CP process rarr 12

100

101

102

103

104

total number of messages

10minus10

10minus9

10minus8

10minus7

10minus6

10minus5

10minus4

10minus3

10minus2

10minus1

100

p(M

L)

100

101

102

103

10410

minus1010

minus910

minus810

minus710

minus610

minus510

minus410

minus310

minus210

minus110

0

p(M)p(K)

100

101

102

103

104

105

total number of messages

100

101

102

103

104

105

(a) QX send (b) POK send

(c) QX directed link (d) POK directed link

~ ML

minus35

~ Mminus12

~ Mminus3

~ Kminus127

~ Kminus37

~ ML

minus35

~ Kminus073

~ Mminus08

Fig 16 (Color online) Probability densities of activities anddegrees The probabilities are plotted versus the total numberof messages M and the final degree K for (a) sending inQX and (b) sending in POK The panel (c) and (d) exhibitthe probability densities of the total number of messages alongdirected links ML for QX and POK respectively The dottedlines serve as guides to the eye and have the indicated slopes

obtained curves However λ asymp 075 is relatively close to 1so that the differences are minor

The probability densities of activity along direct linksare displayed in Figures 16c and 16d for QX and POKrespectively In both cases the frequency of large activitydecays approximately following a power-law with expo-nent around 35

4 Conclusions

Our work reviews and further supports previous empir-ical findings [12] extending them by some features Theobtained exponents are summarized in Table 1

In addition to [12] we find very similar characteristicsfor the passivity of receiving as for the activity of send-ing messages This is in line with the strong correlations

between individual sending and receiving ie most of themessages are somehow replied sooner or later Further-more already the communication between two individualscomprises long-term persistence

Investigating the probability densities of logarithmicgrowth rates (ie growth of the cumulative number of mes-sages between two time steps of any member) we are ableto collapse the curves by scaling them with conditional av-erage growth rates and conditional standard deviationsWhile less active members follow well the exponentiallydecaying probability density for the more active membersdeviations are found in the case of large growth rates

Moreover we introduce a new growth rate namely themutual growth in the number of messages This is the dif-ference in the number of messages sent between pairs ofmembers at two time steps The conditional standard de-viation of this mutual growth rate also decays as a power-law with increasing initial difference whereas the expo-nent is close to 03 and changes to 12 when the datais shuffled We conjecture that this growth reflects cross-correlations in the activity

Finally we propose simulations to reproduce the long-term correlations and growth properties Basically it con-sists of generating long-term correlated sequences anddefining a threshold All values of such sequences abovethe threshold (POT) represent a message event We showthat then the correlation and growth features being deter-mined by the imposed fluctuation exponent confirm therelation β = 1 minus H [12] Including further features thisapproach could be a starting point for more elaboratedmodeling of human dynamics

We would like to note that ndash except Section 322 aboutmutual growth in the number of messages (and Sect 34) ndashall analysis and results refer to auto-correlations As phe-nomena auto- and cross-correlations can occur indepen-dently However since most of the messages are replied itis very likely that there are also cross-correlations betweenthe members activity which to our knowledge has not yetbeen studied systematically

Thus our work opens perspectives for further researchactivities In particular the origin of the long-term persis-tence in the communication remains an important ques-tion In [14] we demonstrate the relation of β H withinter-event time scaling From a psychologicalsociological

158 The European Physical Journal B

point of view one may argue where the persistence isoriginated Is it purely due to a state of mind solipsis-tic emerging from moods or is it due to social effects iethat the dynamics in the social network induces persis-tent fluctuations One hypothesis could be that alreadythe social network is correlated [68]

We thank C Briscoe JF Eichner LK Gallos and HDRozenfeld for useful discussions This work was supportedby National Science Foundation Grant NSF-SES-0624116 andNSF-EF-0827508 FL acknowledges financial support fromThe Swedish Bank Tercentenary Foundation SH thanks theEuropean EPIWORK project the Israel Science Foundationthe ONR and the DTRA for financial support

References

1 MHR Stanley LAN Amaral SV Buldyrev S HavlinH Leschhorn P Maass MA Salinger HE StanleyNature 379 804 (1996)

2 D Canning LAN Amaral Y Lee M Meyer HEStanley Econ Lett 60 335 (1998)

3 V Plerou LAN Amaral P Gopikrishnan M MeyerHE Stanley Phys Rev E 60 6519 (1999)

4 F Liljeros LAN Amaral HE Stanley arXiv

nlin0310001v1[nlinAO] (2003) httparxivorg

absnlin0310001

5 K Matia LAN Amaral M Luwel HF Moed HEStanley J Am Soc Inf Sci Tec 56 893 (2005)

6 S Picoli RS Mendes Phys Rev E 77 036105 (2008)7 HD Rozenfeld D Rybski JS Andrade Jr M Batty

HE Stanley HA Makse Proc Natl Acad Sci USA 10518702 (2008)

8 F Wang K Yamasaki S Havlin HE Stanley arXiv09114258v1[q-finST] (2009) httparxivorgabs09114258

9 R Gibrat Les inegalites economiques (Libraire du RecueilSierey Paris 1931)

10 J Sutton J Econ Lit 35 40 (1997)11 M Mitzenmacher Internet Math 1 226 (2004)12 D Rybski SV Buldyrev S Havlin F Liljeros HA

Makse Proc Natl Acad Sci USA 106 12640 (2009)13 AL Barabasi Nature 435 207 (2005)14 D Rybski SV Buldyrev S Havlin F Liljeros HA

Makse preprint (2011)15 M Karsai M Kivela RK Pan K Kaski J Kertesz AL

Barabasi J Saramaki Phys Rev E 83 025102 (2011)16 J Brujic RI Hermans KA Walther JM Fernandez

Nature Phys 2 282 (2006)17 LK Gallos D Rybski F Liljeros S Havlin HA Makse

submitted (2011)18 P Holme Europhys Lett 64 427 (2003)19 P Holme F Liljeros CR Edling BJ Kim Phys Rev

E 68 056107 (2003)20 P Holme CR Edling F Liljeros Soc Networks 26 155

(2004)21 CK Peng SV Buldyrev S Havlin M Simons HE

Stanley AL Goldberger Phys Rev E 49 1685 (1994)22 A Bunde S Havlin JW Kantelhardt T Penzel JH

Peter K Voigt Phys Rev Lett 85 3736 (2000)

23 JW Kantelhardt E Koscielny-Bunde HHA RegoS Havlin A Bunde Physica A 295 441 (2001)

24 JW Kantelhardt Encyclopedia of Complexity and SystemScience (Springer 2009) Chap entry 00620 Fractal andMultifractal Time Series

25 S Golder DM Wilkinson BA Huberman Communitiesand Technologies 2007 (Springer London 2007) pp 41ndash66

26 J Leskovec E Horvitz arXiv08030939v1[physics

soc-ph] (2008) httparxivorgabs0803093927 RD Malmgren DB Stouffer AE Motter LAN

Amaral Proc Natl Acad Sci USA 105 18153 (2008)28 A Vazquez Physica A 373 747 (2007)29 Z Eisler J Kertesz Phys Rev E 73 046109 (2006)30 Z Eisler I Bartos J Kertesz Adv Phys 57 89 (2008)31 LEC Rocha F Liljeros P Holme Proc Natl Acad Sci

USA 107 5706 (2010)32 CK Peng J Mietus JM Hausdorff S Havlin HE

Stanley AL Goldberger Phys Rev Lett 70 1343 (1993)33 PC Ivanov A Bunde LAN Amaral S Havlin

J Fritsch-Yelle RM Baevsky HE Stanley ALGoldberger Europhys Lett 48 594 (1999)

34 K Kosmidis A Kalampokis P Argyrakis Physica A 370808 (2006)

35 Y Liu P Gopikrishnan P Cizeau M Meyer CK PengHE Stanley Phys Rev E 60 1390 (1999)

36 RN Mantegna HE Stanley An Introduction toEconophysics Correlations and Complexity in Finance(Cambridge University Press Cambridge 1999)

37 F Lux M Ausloos The Science of Disasters in MarketFluctuations I Scaling Multiscaling and Their PossibleOrigins (Springer-Verlag Berlin 2002) Chap 13 pp 373ndash409

38 WE Leland MS Taqqu W Willinger DV WilsonIEEEACM Trans Networking 2 1 (1994)

39 M Kampf S Tismer JW Kantelhardt L Muchnik sub-mitted (2011)

40 S Tadaki M Kikuchi A Nakayama K NishinariA Shibata Y Sugiyama S Yukawa J Phys Soc Jpn75 034002 (2006)

41 Z Xiao-Yan L Zong-Hua T Ming Chinese Phys Lett24 2142 (2007)

42 K Linkenkaer-Hansen VV Nikouline JM Palva RJIlmoniemi J Neurosci 21 1370 (2001)

43 P Allegrini D Menicucci R Bedini L FronzoniA Gemignani P Grigolini BJ West P Paradisi PhysRev E 80 061914 (2009)

44 PC Ivanov K Hu MF Hilton SA Shea HE StanleyProc Natl Acad Sci USA 104 20702 (2007)

45 RD Malmgren DB Stouffer ASLO CampanharoLAN Amaral Science 325 1696 (2009)

46 CA Hidalgo C Rodriguez-Sickert Physica A 387 3017(2008)

47 A Saichev Y Malevergne D Sornette Theory ofZipf rsquos Law and Beyond ndash Monograph Lecture Notes inEconomics and Mathematical Systems (Springer-VerlagBerlin 2009)

48 LAN Amaral SV Buldyrev S Havlin MA SalingerHE Stanley Phys Rev Lett 80 1385 (1998)

49 HD Rozenfeld D Rybski X Gabaix HA Makse AmEcon Rev 101 2205 (2011)

50 AL Barabasi R Albert Science 286 509 (1999)51 P Hedstrom Dissecting the Social On the Principles

of Analytical Sociology (Cambridge University PressCambridge 2005)

D Rybski et al Communication activity in social networks growth and correlations 159

52 A Kentsis Nature 441 E5 (2006)53 G Palla AL Barabasi T Vicsek Nature 446 664 (2007)54 R Crane D Sornette Proc Natl Acad Sci USA 105

15649 (2008)55 A Bunde JF Eichner JW Kantelhardt S Havlin

Phys Rev Lett 94 048701 (2005)56 EG Altmann H Kantz Phys Rev E 71 056106 (2005)57 JF Eichner JW Kantelhardt A Bunde S Havlin

Phys Rev E 75 011128 (2007)58 HA Makse S Havlin M Schwartz HE Stanley Phys

Rev E 53 5445 (1996)59 Y Xu QDY Ma DT Schmitt P Bernaola-Galvan

PC Ivanov Physica A 390 4057 (2011)60 J Stehle A Barrat G Bianconi Phys Rev E 81 035101

(2010)

61 R Albert AL Barabasi Rev Mod Phys 74 47(2002)

62 H Ebel LI Mielsch S Bornholdt Phys Rev E 66035103 (2002)

63 MEJ Newman S Forrest J Balthrop Phys Rev E 66035101 (2002)

64 G Bianconi AL Barabasi Europhys Lett 54 436 (2001)65 BF de Blasio A Svensson F Liljeros Proc Natl Acad

Sci USA 104 10762 (2007)66 R Cohen S Havlin Complex Networks Structure

Robustness and Function in Growing network models(Cambridge University Press Cambridge 2009)

67 L Muchnik HA Makse S Havlin preprint (2011)68 D Rybski HD Rozenfeld JP Kropp Europhys Lett

90 28002 (2010)

  • Introduction
  • Data
  • Analysis
  • Conclusions
  • References
Page 12: Communication activity in social networks: growth and ...hmakse/maksepapers/2011... · data of the second online community (. com, POK) covers 492 days (February 2001 until June 2002)

158 The European Physical Journal B

point of view one may argue where the persistence isoriginated Is it purely due to a state of mind solipsis-tic emerging from moods or is it due to social effects iethat the dynamics in the social network induces persis-tent fluctuations One hypothesis could be that alreadythe social network is correlated [68]

We thank C Briscoe JF Eichner LK Gallos and HDRozenfeld for useful discussions This work was supportedby National Science Foundation Grant NSF-SES-0624116 andNSF-EF-0827508 FL acknowledges financial support fromThe Swedish Bank Tercentenary Foundation SH thanks theEuropean EPIWORK project the Israel Science Foundationthe ONR and the DTRA for financial support

References

1 MHR Stanley LAN Amaral SV Buldyrev S HavlinH Leschhorn P Maass MA Salinger HE StanleyNature 379 804 (1996)

2 D Canning LAN Amaral Y Lee M Meyer HEStanley Econ Lett 60 335 (1998)

3 V Plerou LAN Amaral P Gopikrishnan M MeyerHE Stanley Phys Rev E 60 6519 (1999)

4 F Liljeros LAN Amaral HE Stanley arXiv

nlin0310001v1[nlinAO] (2003) httparxivorg

absnlin0310001

5 K Matia LAN Amaral M Luwel HF Moed HEStanley J Am Soc Inf Sci Tec 56 893 (2005)

6 S Picoli RS Mendes Phys Rev E 77 036105 (2008)7 HD Rozenfeld D Rybski JS Andrade Jr M Batty

HE Stanley HA Makse Proc Natl Acad Sci USA 10518702 (2008)

8 F Wang K Yamasaki S Havlin HE Stanley arXiv09114258v1[q-finST] (2009) httparxivorgabs09114258

9 R Gibrat Les inegalites economiques (Libraire du RecueilSierey Paris 1931)

10 J Sutton J Econ Lit 35 40 (1997)11 M Mitzenmacher Internet Math 1 226 (2004)12 D Rybski SV Buldyrev S Havlin F Liljeros HA

Makse Proc Natl Acad Sci USA 106 12640 (2009)13 AL Barabasi Nature 435 207 (2005)14 D Rybski SV Buldyrev S Havlin F Liljeros HA

Makse preprint (2011)15 M Karsai M Kivela RK Pan K Kaski J Kertesz AL

Barabasi J Saramaki Phys Rev E 83 025102 (2011)16 J Brujic RI Hermans KA Walther JM Fernandez

Nature Phys 2 282 (2006)17 LK Gallos D Rybski F Liljeros S Havlin HA Makse

submitted (2011)18 P Holme Europhys Lett 64 427 (2003)19 P Holme F Liljeros CR Edling BJ Kim Phys Rev

E 68 056107 (2003)20 P Holme CR Edling F Liljeros Soc Networks 26 155

(2004)21 CK Peng SV Buldyrev S Havlin M Simons HE

Stanley AL Goldberger Phys Rev E 49 1685 (1994)22 A Bunde S Havlin JW Kantelhardt T Penzel JH

Peter K Voigt Phys Rev Lett 85 3736 (2000)

23 JW Kantelhardt E Koscielny-Bunde HHA RegoS Havlin A Bunde Physica A 295 441 (2001)

24 JW Kantelhardt Encyclopedia of Complexity and SystemScience (Springer 2009) Chap entry 00620 Fractal andMultifractal Time Series

25 S Golder DM Wilkinson BA Huberman Communitiesand Technologies 2007 (Springer London 2007) pp 41ndash66

26 J Leskovec E Horvitz arXiv08030939v1[physics

soc-ph] (2008) httparxivorgabs0803093927 RD Malmgren DB Stouffer AE Motter LAN

Amaral Proc Natl Acad Sci USA 105 18153 (2008)28 A Vazquez Physica A 373 747 (2007)29 Z Eisler J Kertesz Phys Rev E 73 046109 (2006)30 Z Eisler I Bartos J Kertesz Adv Phys 57 89 (2008)31 LEC Rocha F Liljeros P Holme Proc Natl Acad Sci

USA 107 5706 (2010)32 CK Peng J Mietus JM Hausdorff S Havlin HE

Stanley AL Goldberger Phys Rev Lett 70 1343 (1993)33 PC Ivanov A Bunde LAN Amaral S Havlin

J Fritsch-Yelle RM Baevsky HE Stanley ALGoldberger Europhys Lett 48 594 (1999)

34 K Kosmidis A Kalampokis P Argyrakis Physica A 370808 (2006)

35 Y Liu P Gopikrishnan P Cizeau M Meyer CK PengHE Stanley Phys Rev E 60 1390 (1999)

36 RN Mantegna HE Stanley An Introduction toEconophysics Correlations and Complexity in Finance(Cambridge University Press Cambridge 1999)

37 F Lux M Ausloos The Science of Disasters in MarketFluctuations I Scaling Multiscaling and Their PossibleOrigins (Springer-Verlag Berlin 2002) Chap 13 pp 373ndash409

38 WE Leland MS Taqqu W Willinger DV WilsonIEEEACM Trans Networking 2 1 (1994)

39 M Kampf S Tismer JW Kantelhardt L Muchnik sub-mitted (2011)

40 S Tadaki M Kikuchi A Nakayama K NishinariA Shibata Y Sugiyama S Yukawa J Phys Soc Jpn75 034002 (2006)

41 Z Xiao-Yan L Zong-Hua T Ming Chinese Phys Lett24 2142 (2007)

42 K Linkenkaer-Hansen VV Nikouline JM Palva RJIlmoniemi J Neurosci 21 1370 (2001)

43 P Allegrini D Menicucci R Bedini L FronzoniA Gemignani P Grigolini BJ West P Paradisi PhysRev E 80 061914 (2009)

44 PC Ivanov K Hu MF Hilton SA Shea HE StanleyProc Natl Acad Sci USA 104 20702 (2007)

45 RD Malmgren DB Stouffer ASLO CampanharoLAN Amaral Science 325 1696 (2009)

46 CA Hidalgo C Rodriguez-Sickert Physica A 387 3017(2008)

47 A Saichev Y Malevergne D Sornette Theory ofZipf rsquos Law and Beyond ndash Monograph Lecture Notes inEconomics and Mathematical Systems (Springer-VerlagBerlin 2009)

48 LAN Amaral SV Buldyrev S Havlin MA SalingerHE Stanley Phys Rev Lett 80 1385 (1998)

49 HD Rozenfeld D Rybski X Gabaix HA Makse AmEcon Rev 101 2205 (2011)

50 AL Barabasi R Albert Science 286 509 (1999)51 P Hedstrom Dissecting the Social On the Principles

of Analytical Sociology (Cambridge University PressCambridge 2005)

D Rybski et al Communication activity in social networks growth and correlations 159

52 A Kentsis Nature 441 E5 (2006)53 G Palla AL Barabasi T Vicsek Nature 446 664 (2007)54 R Crane D Sornette Proc Natl Acad Sci USA 105

15649 (2008)55 A Bunde JF Eichner JW Kantelhardt S Havlin

Phys Rev Lett 94 048701 (2005)56 EG Altmann H Kantz Phys Rev E 71 056106 (2005)57 JF Eichner JW Kantelhardt A Bunde S Havlin

Phys Rev E 75 011128 (2007)58 HA Makse S Havlin M Schwartz HE Stanley Phys

Rev E 53 5445 (1996)59 Y Xu QDY Ma DT Schmitt P Bernaola-Galvan

PC Ivanov Physica A 390 4057 (2011)60 J Stehle A Barrat G Bianconi Phys Rev E 81 035101

(2010)

61 R Albert AL Barabasi Rev Mod Phys 74 47(2002)

62 H Ebel LI Mielsch S Bornholdt Phys Rev E 66035103 (2002)

63 MEJ Newman S Forrest J Balthrop Phys Rev E 66035101 (2002)

64 G Bianconi AL Barabasi Europhys Lett 54 436 (2001)65 BF de Blasio A Svensson F Liljeros Proc Natl Acad

Sci USA 104 10762 (2007)66 R Cohen S Havlin Complex Networks Structure

Robustness and Function in Growing network models(Cambridge University Press Cambridge 2009)

67 L Muchnik HA Makse S Havlin preprint (2011)68 D Rybski HD Rozenfeld JP Kropp Europhys Lett

90 28002 (2010)

  • Introduction
  • Data
  • Analysis
  • Conclusions
  • References
Page 13: Communication activity in social networks: growth and ...hmakse/maksepapers/2011... · data of the second online community (. com, POK) covers 492 days (February 2001 until June 2002)

D Rybski et al Communication activity in social networks growth and correlations 159

52 A Kentsis Nature 441 E5 (2006)53 G Palla AL Barabasi T Vicsek Nature 446 664 (2007)54 R Crane D Sornette Proc Natl Acad Sci USA 105

15649 (2008)55 A Bunde JF Eichner JW Kantelhardt S Havlin

Phys Rev Lett 94 048701 (2005)56 EG Altmann H Kantz Phys Rev E 71 056106 (2005)57 JF Eichner JW Kantelhardt A Bunde S Havlin

Phys Rev E 75 011128 (2007)58 HA Makse S Havlin M Schwartz HE Stanley Phys

Rev E 53 5445 (1996)59 Y Xu QDY Ma DT Schmitt P Bernaola-Galvan

PC Ivanov Physica A 390 4057 (2011)60 J Stehle A Barrat G Bianconi Phys Rev E 81 035101

(2010)

61 R Albert AL Barabasi Rev Mod Phys 74 47(2002)

62 H Ebel LI Mielsch S Bornholdt Phys Rev E 66035103 (2002)

63 MEJ Newman S Forrest J Balthrop Phys Rev E 66035101 (2002)

64 G Bianconi AL Barabasi Europhys Lett 54 436 (2001)65 BF de Blasio A Svensson F Liljeros Proc Natl Acad

Sci USA 104 10762 (2007)66 R Cohen S Havlin Complex Networks Structure

Robustness and Function in Growing network models(Cambridge University Press Cambridge 2009)

67 L Muchnik HA Makse S Havlin preprint (2011)68 D Rybski HD Rozenfeld JP Kropp Europhys Lett

90 28002 (2010)

  • Introduction
  • Data
  • Analysis
  • Conclusions
  • References