Social Connectome

Hang-Hyun Jo
Dept. of Physics, Pohang University of Science and Technology, Republic of Korea
Dept. of Computer Science, Aalto University School of Science, Finland


Posted on 13-Apr-2017


Page 1: Social Connectome


Page 2: Social Connectome

Outline

• Mobile phone data for temporal social networks

• Community structure and bursty dynamics

• Interaction is contextual!

• Overlapping communities and contextual bursts

• Demographic and geographic analysis

• Towards Social Connectome

Page 3: Social Connectome

Research questions

Page 4: Social Connectome

Q1: What does the social network look like?

Q2: What drives the evolution of the social network?

(from a physicist’s viewpoint…)

Page 5: Social Connectome

A physicist’s viewpoint

• More interested in universal patterns than in details (but the devil is in the details…)

• Apply and extend physics to solve problems arising from social phenomena

Page 6: Social Connectome

Why mobile phone data?

• People carry mobile phones almost all the time

• Almost 100% coverage in many countries

• A good proxy for real social networks

Page 7: Social Connectome

Mobile phone data

• Source: A European operator

• Time-resolved call/SMS events among several million mobile phone users

• Topology = communication network

• Dynamics = temporal patterns of communication
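As a minimal sketch of how one record stream yields both views, the Python below aggregates call records into link weights (topology) and per-link event times (dynamics). The record layout `(caller, callee, timestamp, duration)` and the function name `build_temporal_network` are assumptions for illustration; the sketch also does not apply the mutual-call filter used for the mobile call graph in Onnela et al.

```python
from collections import defaultdict


def build_temporal_network(records):
    """Split call records into topology (link weights) and dynamics (event times).

    records: iterable of (caller, callee, timestamp, duration), an assumed layout.
    Links are undirected; a link's weight is its total call duration.
    """
    weight = defaultdict(float)
    times = defaultdict(list)
    for caller, callee, t, duration in records:
        if caller == callee:
            continue  # skip self-calls
        link = (min(caller, callee), max(caller, callee))
        weight[link] += duration
        times[link].append(t)
    for ts in times.values():
        ts.sort()  # event sequence of each link in time order
    return dict(weight), dict(times)


records = [("a", "b", 10, 60), ("b", "a", 50, 30), ("a", "c", 20, 5)]
weight, times = build_temporal_network(records)
```

Here the link (a, b) gets weight 90 and event times [10, 50], while (a, c) gets weight 5 and a single event at time 20.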

Page 8: Social Connectome

Topology of communication

Page 9: Social Connectome

Community structure Onnela et al., PNAS, NJP (2007)

conversation typically represents a one-to-one communication. The tie strength distribution is broad (Fig. 1B), however, decaying with an exponent $\gamma_w = 1.9$, so that although the majority of ties correspond to a few minutes of airtime, a small fraction of users spend hours chatting with each other. This finding is rather unexpected, given that fat-tailed tie strength distributions have been observed mainly in networks characterized by global transport processes, such as the number of passengers carried by the airline transportation network (11), the reaction fluxes in metabolic networks (12), or packet transfer on the Internet (13), in which case the individual fluxes are determined by the global network topology. An important feature of such global flow processes is local conservation: All passengers arriving to an airport need to be transported away, each molecule created by a reaction needs to be consumed by some other reaction, or each packet arriving to a router needs to be sent to other routers. Although the main purpose of the phone is information transfer between two individuals, such local conservation that constrains or drives the tie strengths are largely absent, making any relationship between the topology of the MCG and local tie strengths less than obvious.

Complex networks often organize themselves according to a global efficiency principle, meaning that the tie strengths are optimized to maximize the overall flow in the network (13, 14). In this case the weight of a link should correlate with its betweenness centrality, which is proportional to the number of shortest paths between all pairs of nodes passing through it (refs. 13, 15, and 16, and S. Valverde and R. V. Sole, unpublished work). Another possibility is that the strength of a particular tie depends only on the nature of the relationship between two individuals and is thus independent of the network surrounding the tie (dyadic hypothesis). Finally, the much studied strength of weak ties hypothesis (17–19) states that the strength of a tie between A and B increases with the overlap of their friendship circles, resulting in the importance of weak ties in connecting communities. The hypothesis leads to high betweenness centrality for weak links, which can be seen as the mirror image of the global efficiency principle.

In Fig. 2A, we show the network in the vicinity of a randomly selected individual, where the link color corresponds to the strength of each tie. It appears from this figure that the network consists of small local clusters, typically grouped around a high-degree individual. Consistent with the strength of weak ties hypothesis, the majority of the strong ties are found within the clusters, indicating that users spend most of their on-air time talking to members of their immediate circle of friends. In contrast, most links connecting different communities are visibly

[Fig. 1, panels A to D: (A) $P(k)$ vs. degree $k$; (B) $P(w)$ vs. link weight $w$ (s); (C) overlap $O_{ij} = 0, 1/3, 2/3, 1$ between $v_i$ and $v_j$ in four local configurations; (D) $\langle O \rangle_w$ and $\langle O \rangle_b$ vs. $P_{cum}(w)$, $P_{cum}(b)$]

Fig. 1. Characterizing the large-scale structure and the tie strengths of the mobile call graph. (A and B) Vertex degree (A) and tie strength distribution (B). Each distribution was fitted with $P(x) = a(x + x_0)^{-\gamma_x}\exp(-x/x_c)$, shown as a blue curve, where $x$ corresponds to either $k$ or $w$. The parameter values for the fits are $k_0 = 10.9$, $\gamma_k = 8.4$, $k_c = \infty$ (A, degree), and $w_0 = 280$, $\gamma_w = 1.9$, $w_c = 3.45 \times 10^5$ (B, weight). (C) Illustration of the overlap between two nodes, $v_i$ and $v_j$, its value being shown for four local network configurations. (D) In the real network, the overlap $\langle O \rangle_w$ (blue circles) increases as a function of cumulative tie strength $P_{cum}(w)$, representing the fraction of links with tie strength smaller than $w$. The dyadic hypothesis is tested by randomly permuting the weights, which removes the coupling between $\langle O \rangle_w$ and $w$ (red squares). The overlap $\langle O \rangle_b$ decreases as a function of cumulative link betweenness centrality $b$ (black diamonds).

[Fig. 2, panels A, B, C, with a tie-strength color bar]

Fig. 2. The structure of the MCG around a randomly chosen individual. Each link represents mutual calls between the two users, and all nodes are shown that are at distance less than six from the selected user, marked by a circle in the center. (A) The real tie strengths, observed in the call logs, defined as the aggregate call duration in minutes (see color bar). (B) The dyadic hypothesis suggests that the tie strength depends only on the relationship between the two individuals. To illustrate the tie strength distribution in this case, we randomly permuted tie strengths for the sample in A. (C) The weight of the links assigned on the basis of their betweenness centrality $b_{ij}$ values for the sample in A as suggested by the global efficiency principle. In this case, the links connecting communities have high $b_{ij}$ values (red), whereas the links within the communities have low $b_{ij}$ values (green).


Onnela et al. PNAS | May 1, 2007 | vol. 104 | no. 18 | 7333


strong ties in community

weak ties between communities
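The overlap in Fig. 1C is commonly written as $O_{ij} = n_{ij} / ((k_i - 1) + (k_j - 1) - n_{ij})$, with $n_{ij}$ the number of common neighbors of $v_i$ and $v_j$ and $k_i$, $k_j$ their degrees. A minimal Python sketch, assuming an adjacency dict of sets and a hypothetical helper `mean_overlap_by_weight` that averages the overlap within equal-size weight quantiles, illustrates the rising trend of $\langle O \rangle_w$ on a toy graph. Returning 0 when the denominator vanishes (an isolated pair) is a convention assumed here.

```python
def overlap(adj, i, j):
    """O_ij = n_ij / ((k_i - 1) + (k_j - 1) - n_ij) for the link (i, j)."""
    common = len((adj[i] & adj[j]) - {i, j})
    denom = (len(adj[i]) - 1) + (len(adj[j]) - 1) - common
    return common / denom if denom > 0 else 0.0  # assumed convention for denom == 0


def mean_overlap_by_weight(adj, weights, n_bins=5):
    """Average overlap of links grouped into n_bins equal-size weight quantiles."""
    links = sorted(weights, key=weights.get)  # weakest first
    bins = [[] for _ in range(n_bins)]
    for rank, (i, j) in enumerate(links):
        bins[rank * n_bins // len(links)].append(overlap(adj, i, j))
    return [sum(b) / len(b) if b else float("nan") for b in bins]


# Toy graph: strong triangle a-b-c plus a weak pendant link a-d.
adj = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
weights = {("a", "b"): 5, ("b", "c"): 10, ("a", "c"): 5, ("a", "d"): 1}
```

In this toy graph the weakest link (a, d) has zero overlap while the triangle links share neighbors, so `mean_overlap_by_weight(adj, weights, n_bins=2)` gives [0.25, 0.75].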

Page 10: Social Connectome

The Strength of Weak Ties

[Diagram: local bridge A-B in two networks, panels (a) and (b)]

FIG. 2. Local bridges. a, Degree 3; b, Degree 13. Legend: strong tie; weak tie.

v because of costs or distortions entailed in each act of transmission. If v does not lie within this critical distance, then he will not receive messages originating with u" (1965, p. 159). I will refer to a tie as a "local bridge of degree n" if n represents the shortest path between its two points (other than itself), and n > 2. In figure 2a, A-B is a local bridge of degree 3, in 2b, of degree 13. As with bridges in a highway system, a local bridge in a social network will be more significant as a connection between two sectors to the extent that it is the only alternative for many people, that is, as its degree increases. A bridge in the absolute sense is a local one of infinite degree. By the same logic used above, only weak ties may be local bridges.

Suppose, now, that we adopt Davis's suggestion that "in interpersonal flows of most any sort the probability that 'whatever it is' will flow from person i to person j is (a) directly proportional to the number of all-positive (friendship) paths connecting i and j; and (b) inversely proportional to the length of such paths" (1969, p. 549).⁸ The significance of weak ties, then, would be that those which are local bridges create more, and shorter, paths. Any given tie may, hypothetically, be removed from a network; the number of paths broken and the changes in average path length resulting

8 Though this assumption seems plausible, it is by no means self-evident. Surprisingly little empirical evidence exists to support or refute it.


Strength of weak ties Granovetter, AJS (1973)

strong tie

weak tie
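Granovetter's definition of a local bridge of degree n can be checked with a breadth-first search that ignores the tie itself. The sketch below is illustrative: `bridge_degree`, `is_local_bridge`, and the adjacency-dict input are assumptions, and `math.inf` stands for a bridge in the absolute sense (no alternative path at all).

```python
import math
from collections import deque


def bridge_degree(adj, a, b):
    """Length of the shortest path between a and b that avoids the tie (a, b)."""
    dist = {a: 0}
    queue = deque([a])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if (u, v) in ((a, b), (b, a)):
                continue  # do not use the tie itself
            if v not in dist:
                dist[v] = dist[u] + 1
                if v == b:
                    return dist[v]
                queue.append(v)
    return math.inf  # no alternative path: a bridge in the absolute sense


def is_local_bridge(adj, a, b):
    """A tie is a local bridge if its degree n satisfies n > 2."""
    return bridge_degree(adj, a, b) > 2


square = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "a"}}
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
```

In the square, A-B is a local bridge of degree 3 (path a-d-c-b), whereas in the triangle the alternative path a-c-b has length 2, so A-B is not a local bridge.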

Page 11: Social Connectome

Modeling communities Kumpula et al., PRL (2007)

global attachment

local attachment preferential

reinforcement

global attachment

reset node
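The slide's ingredients (local attachment with reinforcement, global attachment, and node reset) can be sketched as one update sweep. This is a rough Python sketch loosely following the rules described by Kumpula et al.; the update order, the helper name `kumpula_sweep`, and the parameter values (`p_delta`, `p_r`, `p_d`, `delta`, `w0`) are assumptions for illustration rather than the paper's exact setup.

```python
import random
from collections import defaultdict


def kumpula_sweep(w, nodes, p_delta, p_r, p_d, delta, w0=1.0, rng=random):
    """One update sweep of a sketch of the Kumpula et al. (2007) model.

    w maps each node to a dict {neighbor: weight}; links are kept symmetric.
    """
    for i in nodes:
        # Local attachment (LA): weighted two-step walk i -> j -> k.
        if w[i]:
            nbrs = list(w[i])
            j = rng.choices(nbrs, weights=[w[i][n] for n in nbrs])[0]
            cands = [n for n in w[j] if n != i]
            if cands:
                k = rng.choices(cands, weights=[w[j][n] for n in cands])[0]
                # Reinforce the links used by the walk.
                for u, v in ((i, j), (j, k)):
                    w[u][v] += delta
                    w[v][u] += delta
                if k in w[i]:
                    w[i][k] += delta  # existing triangle is reinforced
                    w[k][i] += delta
                elif rng.random() < p_delta:
                    w[i][k] = w[k][i] = w0  # triadic closure
        # Global attachment (GA): always if isolated, else with probability p_r.
        if not w[i] or rng.random() < p_r:
            k = rng.choice(nodes)
            if k != i and k not in w[i]:
                w[i][k] = w[k][i] = w0
    # Node deletion ("reset node"): with probability p_d a node loses all links.
    for i in nodes:
        if rng.random() < p_d:
            for k in list(w[i]):
                del w[k][i]
            w[i].clear()


rng = random.Random(1)
nodes = list(range(200))
w = defaultdict(dict)
for _ in range(50):
    kumpula_sweep(w, nodes, p_delta=0.05, p_r=5e-4, p_d=1e-3, delta=1.0, rng=rng)
```

Because local attachment both closes triangles and strengthens the links it walks along, strong ties tend to accumulate inside dense groups, while global attachment adds occasional weak links between them.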

Page 12: Social Connectome

Dynamics of communication

Page 13: Social Connectome

Bursty communication

outgoing calls →

incoming calls →

burst

inter-event time τ

Karsai et al., PRE (2011)

RAPID COMMUNICATIONS

SMALL BUT SLOW WORLD: HOW NETWORK TOPOLOGY. . . PHYSICAL REVIEW E 83, 025102(R) (2011)

FIG. 2. (Color online) Spreading dynamics in the Reality Mining (left) and email networks (right), for the original event sequence and the null models DCW and DCWB. In the email network, the spreading process is directed. The maximum prevalence is limited to the total fraction of the SCC and the OUT component (∼85%).

∼21 days, i.e., a factor ≈2. Similarly, for 100% prevalence this factor is also ≈2 (∼342 d), showing that the effects of correlations are consistent for the duration of the whole process and for individual runs. As for the effect of the random initial conditions, the small error of the mean values in Table I shows that the mean curves (Fig. 1) characterize the overall behavior well. The effect of initial conditions is demonstrated in Fig. 1, where the distributions are clearly separable at full prevalence.

Results for the Reality Mining mobile call network and for the email logs are shown in Fig. 2, with the DCW and DCWB null models; the outcome is qualitatively similar to that of the MCN. However, there are certain differences. In the small and sparse RM network, successive calls to many people within a short time period by a hub give rise to a steep prevalence rise. Such behavior is a one-off event and the effect is destroyed in the null models. In the email network, very high-degree hubs sending frequent emails give rise to rapid spreading once they are reached. This effect is conserved in the null models.

The daily activity pattern, i.e., variation in overall communication frequency by the hour, is retained in every null model that is based on randomizing the original event sequence. In [21], it was suggested that natural periodicities, such as the daily cycle, are responsible for the fat-tailed waiting time distributions. In order to evaluate the impact of the daily pattern on the spreading speed, we carried out simulations where the aggregated MCN was used as the network. Events were generated on its links by two Poisson processes that conserve link weights: a homogeneous Poisson process, and a process whose instantaneous rate follows the daily pattern as calculated from the call statistics on an hourly basis (see inset in Fig. 3). The SI dynamics for both cases are shown in Fig. 3. The difference between the two curves is negligible, demonstrating that the daily pattern has only a minor impact on the spreading speed. This, together with the observation that temporal correlations do have a significant decelerating effect on spreading, strongly indicates that there are important, non-Poissonian correlations in the system besides the daily cycles.
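The SI dynamics referred to here can be sketched as a deterministic spreading process over a time-ordered contact list, together with one simple time-shuffled null model. This is a sketch under stated assumptions: the contact format `(t, u, v)`, the function names, and the shuffling scheme are illustrative and are not claimed to match the DCW or DCWB null models of the paper.

```python
import random


def si_infection_times(events, seed, t0=0.0):
    """Deterministic SI spreading (transmission probability 1).

    events: iterable of undirected contacts (t, u, v). Contacts are processed
    in time order; a contact between an infected and a susceptible node at
    time t >= t0 infects the susceptible node at time t.
    """
    infected = {seed: t0}
    for t, u, v in sorted(events, key=lambda e: e[0]):
        if t < t0:
            continue
        if u in infected and v not in infected:
            infected[v] = t
        elif v in infected and u not in infected:
            infected[u] = t
    return infected


def time_shuffled(events, seed=0):
    """Null model: keep who contacts whom, randomly permute the timestamps."""
    rng = random.Random(seed)
    times = [t for t, _, _ in events]
    rng.shuffle(times)
    return [(t, u, v) for t, (_, u, v) in zip(times, events)]


contacts = [(1, "a", "b"), (2, "b", "c"), (0, "c", "d")]
```

Starting from node a, the contact c-d at t = 0 happens before c is infected, so d is never reached in the original sequence; comparing such outcomes with the shuffled sequence is what exposes the effect of temporal correlations.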

The non-Poissonian, bursty character of event sequences is clearly demonstrated by the fat-tailed distribution of single-link interevent times for the MCN, see Fig. 4. In order to

FIG. 3. (Color online) Spreading dynamics as obtained from a Poissonian event-generating model on the aggregated MCN, with and without the daily pattern. Link weights were taken into account and the curve with the daily pattern is comparable with the DCW null model. Inset: the average daily pattern as observed for the MCN event sequence with binning by the hour. The continuous line is to guide the eye.

exclude the possibility that the fat tail in the interevent time distribution is only due to the broad weight distribution as suggested in [21], we calculated the distributions for binned weights and obtained satisfactory scaling with the average interevent time, same as [17]. We find that the distribution can be fitted by a power law with exponent 0.7 over 3.5 decades, followed by a fast decay. The scaling breaks down for small interevent times, where a peak in the distribution at ∼20 s is found, which is due to event correlations between links. The power law indicates non-Poissonian bursty character of the events. Both the characteristics vanish for the time-shuffled null model and the interevent time is well described by an

FIG. 4. (Color online) Scaled interevent time distributions for the MCN. Edges were log-binned by weight and for every second bin the interevent time distribution of the events occurring in the corresponding bin is shown, scaled by the average interevent time of that bin τ* (larger τ*, darker color). Inset: scaled interevent time distributions for the original and for the time-shuffled events. An exponential density distribution with average value of 1 is shown as a light (yellow) line.


$P(\tau) \sim \tau^{-\alpha}$
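A minimal way to estimate $P(\tau)$ from one event sequence is to take differences of sorted timestamps and histogram them in logarithmically spaced bins, which keeps a power-law tail visible over several decades. The helper names and the `bins_per_decade` parameter below are assumptions for illustration; all samples must be positive.

```python
import math


def inter_event_times(timestamps):
    """Differences between consecutive (sorted) event times."""
    ts = sorted(timestamps)
    return [b - a for a, b in zip(ts, ts[1:])]


def log_binned_pdf(samples, bins_per_decade=5):
    """Probability density with logarithmically spaced bins (samples > 0).

    Returns bin centers (geometric means of the edges) and densities for
    non-empty bins, normalized so that sum(density * bin width) == 1.
    """
    lo = math.floor(math.log10(min(samples)) * bins_per_decade)
    hi = math.ceil(math.log10(max(samples)) * bins_per_decade) + 1
    edges = [10 ** (e / bins_per_decade) for e in range(lo, hi + 1)]
    counts = [0] * (len(edges) - 1)
    for x in samples:
        idx = math.floor(math.log10(x) * bins_per_decade) - lo
        counts[min(idx, len(counts) - 1)] += 1
    n = len(samples)
    centers, density = [], []
    for c, a, b in zip(counts, edges, edges[1:]):
        if c:
            centers.append(math.sqrt(a * b))
            density.append(c / (n * (b - a)))
    return centers, density
```

On log-log axes, a straight stretch of `density` against `centers` corresponds to the power-law regime; a Poisson sequence would instead show an exponential decay.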

Page 14: Social Connectome

Origin of bursts?

Page 15: Social Connectome

Why? Priority queuing Barabási, Nature (2005)

[Figure: e-mails in a priority queue; axes: time, priority, waiting time; small vs. large waiting time]
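Barabási's priority-queue model can be simulated in a few lines: a list of `list_size` tasks with priorities drawn uniformly from [0, 1]; at each step the highest-priority task is executed with probability `p`, otherwise a random one, and the executed task is replaced by a new one. The sketch below records each executed task's waiting time; the function name and default values are assumptions, and for `p` close to 1 the waiting times are expected to develop a heavy tail.

```python
import random


def priority_queue_waiting_times(n_steps, list_size=2, p=0.9999, seed=0):
    """Simulate a priority-queue model and return the waiting times.

    Each task is [priority, arrival_step]. At every step the highest-priority
    task is executed with probability p, otherwise a uniformly random task;
    the executed task is replaced by a new task with a fresh random priority.
    """
    rng = random.Random(seed)
    tasks = [[rng.random(), 0] for _ in range(list_size)]
    waits = []
    for step in range(1, n_steps + 1):
        if rng.random() < p:
            idx = max(range(list_size), key=lambda m: tasks[m][0])
        else:
            idx = rng.randrange(list_size)
        waits.append(step - tasks[idx][1])
        tasks[idx] = [rng.random(), step]  # new task arrives
    return waits


waits = priority_queue_waiting_times(100_000)
print(max(waits))  # expected: many short waits and a few very long ones
```

Low-priority tasks keep losing to newly arriving ones, which is what produces the long waiting times in this model.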

Page 16: Social Connectome

Cyclic Poisson process Malmgren et al., PNAS (2008)

time-varying rate with weekly cycle for e-mail usage

heavy tail of inter-event time distribution

Question: Are weekly cycles the ONLY reason for bursts?
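A cyclic Poisson process is a non-homogeneous Poisson process whose rate repeats with a fixed period. The sketch below generates one by thinning (draw candidates at a constant maximal rate, keep each with probability rate(t)/rate_max). The `weekly_rate` function is a hypothetical toy rate with day/night and weekday/weekend modulation, not the rate fitted by Malmgren et al.

```python
import math
import random


def cyclic_poisson(rate, rate_max, t_end, seed=0):
    """Event times of a non-homogeneous Poisson process on [0, t_end),
    generated by thinning; requires rate(t) <= rate_max for all t."""
    rng = random.Random(seed)
    t, events = 0.0, []
    while True:
        t += rng.expovariate(rate_max)  # candidate from a homogeneous process
        if t >= t_end:
            return events
        if rng.random() < rate(t) / rate_max:  # keep with probability rate/rate_max
            events.append(t)


def weekly_rate(t_hours):
    """Hypothetical toy rate (events per hour) with a weekly cycle:
    a daily day/night modulation, damped on days 5 and 6 ("weekend")."""
    day = int(t_hours // 24) % 7
    hour = t_hours % 24
    daily = 0.5 * (1 - math.cos(2 * math.pi * hour / 24))  # 0 at midnight, 1 at noon
    return (1.0 if day < 5 else 0.4) * daily


events = cyclic_poisson(weekly_rate, rate_max=1.0, t_end=24 * 7 * 4)
```

Such a process is Poissonian within each short time window, so any burstiness it shows comes only from the cycle; this is the baseline the question on this slide probes.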

Page 17: Social Connectome

De-seasoning cycles? Jo et al., NJP (2012)

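mobile call sequence of one user

$\rho(t)$: weekly cycle ($T = 7$ days)

$\rho^*(t^*) = 1$: no cyclic patterns

de-seasoned by weekly cycle

$B_7 = 0.146$

$B_0 = 0.224$

$B = \frac{\sigma - m}{\sigma + m}$: burstiness parameter, Goh & Barabási, EPL (2008), where $m$ and $\sigma$ are the mean and standard deviation of the inter-event times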
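The burstiness parameter is straightforward to compute from a list of inter-event times; it gives $B = -1$ for a perfectly regular sequence, $B = 0$ for a Poisson process ($\sigma = m$), and approaches 1 for highly bursty sequences. Using the population standard deviation here is an assumption of this sketch.

```python
import statistics


def burstiness(taus):
    """B = (sigma - m) / (sigma + m) for a list of inter-event times."""
    m = statistics.mean(taus)
    sigma = statistics.pstdev(taus)  # population standard deviation (assumed)
    return (sigma - m) / (sigma + m)
```

For example, a perfectly regular sequence `[5, 5, 5]` gives $B = -1$, and `[1, 3]` gives $m = 2$, $\sigma = 1$, hence $B = -1/3$.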

Page 18: Social Connectome

Bursts are robust!

Burstiness remains finite after de-seasoning weekly cycles.

[Figure: burstiness vs. de-seasoning period (days) for different activity groups]
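De-seasoning can be sketched as a change of time variable, $t^*(t) = \int_0^t \rho(t')\,dt'$, with the periodic rate $\rho$ normalized so that $\int_0^T \rho(t)\,dt = T$; in the rescaled time the rate is flat, $\rho^*(t^*) = 1$. The Python below is a minimal sketch under stated assumptions: $\rho$ is estimated as a piecewise-constant histogram from the events themselves, and `deseason`, `period`, and `n_bins` are illustrative names.

```python
def deseason(timestamps, period, n_bins):
    """Map event times t to t* = integral of rho from 0 to t.

    rho is a piecewise-constant periodic rate estimated from the events,
    normalized so that its integral over one period equals the period.
    """
    width = period / n_bins
    counts = [0] * n_bins
    for t in timestamps:
        counts[min(int((t % period) // width), n_bins - 1)] += 1
    total = sum(counts)
    rho = [c * n_bins / total for c in counts]
    # cum[b] = integral of rho from 0 to the start of bin b
    cum = [0.0]
    for r in rho:
        cum.append(cum[-1] + r * width)

    def t_star(t):
        n_periods, phase = divmod(t, period)
        b = min(int(phase // width), n_bins - 1)
        return n_periods * period + cum[b] + rho[b] * (phase - b * width)

    return [t_star(t) for t in sorted(timestamps)]
```

Burstiness can then be recomputed from the differences of the returned $t^*$ values; if it stays clearly above zero, the bursts are not explained by the cycle alone. For uniformly spread events the mapping reduces to the identity, $t^* = t$.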

Page 19: Social Connectome

Temporal networks

[Figure: event sequences on a strong link and a weak link along the time axis; events and inter-event times marked]

Page 20: Social Connectome

Social Connectome = a comprehensive map of social interaction

Jo (in preparation)

Page 21: Social Connectome

Social Connectome

[Diagram: axes "topological scale" and "dynamical resolution"; labels: individual, egocentric net, community, society, event, burst, circadian cycle, seasonality, aggregate, temporal motif, layered structure, strength of weak ties, contextual bursts (CB), overlapping communities (OC), OC+CB]

Page 22: Social Connectome

Unified frame for communities and bursts?

Page 23: Social Connectome

Triad chain interaction Jo et al., PLOS ONE (2011)

Page 24: Social Connectome

OR model

AND model

original without links with w=1

Page 25: Social Connectome

respectively, both of which are close to the empirical value 0.7 of the MPC dataset within error bars. In all cases, the values of $\alpha$ are smaller for larger values of $p_{LA}$ but are barely affected by the value of $p_{GA}$. The values of $\tau_c$ turn out to be larger for larger values of $p_{LA}$ and for smaller values of $p_{GA}$. The maximum value of $\tau_c$ is around 50.

To figure out the possible underlying mechanisms for these findings, we first identify the triangular chain interaction (TCI) among three neighboring nodes, say $i$, $j$, and $k$: Both the event between nodes $i$ and $j$ at time step $t-2$ and the event between nodes $j$ and $k$ at time $t-1$ lead to an event between nodes $i$ and $k$ at time $t$, again leading to another event between nodes $i$ and $j$ at time $t+1$ and so on, unless interrupted either by the events from/to nodes outside the triangle or by a random memory loss of nodes in the triangle. Since the TCI is exclusive due to the priority of the triad interaction including the LA process, the LA process enhanced by the large value of $p_{LA}$ inhibits the interruption by the events from/to nodes outside the triangle, including the GA process, thus making the community structure more compact and in turn resulting in a smaller average degree. In the case of the TI-OR model with $p_{GA} = 0.1$, $\langle k \rangle \approx 10.1$ or $4.2$ for $p_{LA} = 0.013$ or $0.1$, respectively. While the compact community structure enhances the TCI again, explaining the observed peaks of $P(\tau)$ at $\tau = 1$ and 2, it can also make some neighbors of the TCI nodes wait for a long time to interact with the TCI nodes. Hence, the larger value of $p_{LA}$ gives rise to larger fluctuations in the inter-event times, implying a smaller value of the power-law exponent $\alpha$ and a larger value of the cutoff $\tau_c$, as observed. Based on this argument, the effect of $p_{LA}$ dominates over that of $p_{GA}$, so that the value of $p_{GA}$ barely affects the scaling of inter-event time distributions but controls the value of $\tau_c$. The larger value of $p_{GA}$ allows nodes to choose a random target and thus interrupt the inter-event times of targets more frequently, leading to a smaller value of $\tau_c$. The numerical results in the case of the TI-AND model can be explained by the same arguments, except that the observed values of $\alpha$ are less than those found in the case of the TI-OR model. Note that in general the AND protocol inhibits the possibility of events.

The heavy tailed distribution of inter-event times, i.e., bursty dynamics, was not expected but emerged from the model. Analogously with the task execution model suggested by Barabasi [6], the dyad NI process can be interpreted such that a node $i$ has a task list of size $k_i$ and selects one of its neighbors (tasks) $j$ with probability proportional to the priority of the task, i.e., the weight $w_{ij}$ in our model. The degree $k_i$ also varies depending on the link creation and deletion processes. A node having been isolated by

Figure 2. TI-OR model. A. The cumulative weight distribution $P_c(w)$. B. The average number of next nearest neighbors $k_{nn}(k)$. C. The average overlap $O(w)$. D. The local clustering coefficient $c(k)$. E. The inter-event time distribution $P(\tau)$. F. The average strength $s(k)$. Results are averaged over 50 realizations for networks with $N = 5 \times 10^4$ and $p_{ML} = 10^{-3}$. We obtain $\langle k \rangle \approx 10.1$ and $\langle c \rangle \approx 0.08$ for $p_{LA} = 0.013$ and $p_{GA} = 0.1$. The cases with $p_{LA} = 0.1$ and/or with $p_{GA} = 0.07$ are also plotted for comparison.
doi:10.1371/journal.pone.0022687.g002

Bursts and Communities in Evolving Networks

PLoS ONE | www.plosone.org 5 August 2011 | Volume 6 | Issue 8 | e22687

behaviors are also observed when the overlap is used instead of the weight in the link percolation analysis. For the weak-link-first removal we find $f_c = 0.79$ and $f_{max} = 0.55$, where yet another kink in the curve of $R_{GC}$ is observed. This implies that the network goes through two abrupt changes, first at $f_{max}$ and then at $f_c$.

Here we also observe the heavy tailed distributions of inter-event times with exponential cutoffs following a power-law behavior with the exponent $\alpha \approx 1.1$. The task execution model for each node would result in $\alpha = 1$ as in the case of Barabasi's queuing model if only initiated I-tasks are counted as the relevant events and if the neighbors of the node always respond to that node. However, the nodes are supposed to interact with each other such that by initiating I-tasks some root nodes can interrupt the inactive periods of their target nodes, which in general decreases the inter-event times. On the other hand, if the target is already involved in another event so that the trials by the root nodes fail, the inter-event times of the corresponding root nodes would increase up to the point of the next successful event. The observed value of $\alpha \approx 1.1$ indicates that none of the mentioned factors strongly affected the scaling behavior of the distributions. The values of $\tau_c$ are largely or barely affected by the value of $p_{GA}$ or $p_{LA}$, respectively, in an anti-correlated way. The maximum value of $\tau_c$ is around 270.

The observation of the average strength $s(k) \sim k$ behavior can be explained by considering the dynamics where the OR protocol is adopted. In this case the nodes with many neighbors might receive more calls from their neighbors than those with few neighbors do, while the chance to make calls is the same for any node.

Conclusions

We have studied the emergence of Granovetter-type community structure, characterized by the increasing behavior of overlap

Figure 5. Link percolation analysis. A, B. TI-OR model. C, D. TI-AND model. E, F. PE model. As the link strength, we use the weight (left panel) and the overlap (right panel). For each panel, we calculate the fraction of the giant component $R_{GC}$, the susceptibility $\chi$, and the clustering coefficient $\langle c \rangle$ (inset) as a function of the fraction of removed links, $f$. Results are averaged over 50 realizations for networks originally with $\langle k \rangle \approx 10$ for each model.
doi:10.1371/journal.pone.0022687.g005
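The link percolation analysis can be sketched with union-find: remove a fraction $f$ of links in order of increasing (or decreasing) strength and measure the giant component fraction $R_{GC}$. The Python below omits the susceptibility $\chi$ and the clustering coefficient; the function names and the integer node labels are assumptions for illustration.

```python
def giant_component_fraction(n_nodes, links):
    """Fraction of nodes in the largest connected component (union-find)."""
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in links:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
    sizes = {}
    for x in range(n_nodes):
        r = find(x)
        sizes[r] = sizes.get(r, 0) + 1
    return max(sizes.values()) / n_nodes


def link_percolation(n_nodes, strength, fractions, weak_first=True):
    """R_GC after removing a fraction f of links, weakest (or strongest) first.

    strength maps a link (u, v) to its weight or overlap; nodes are 0..n_nodes-1.
    """
    order = sorted(strength, key=strength.get, reverse=not weak_first)
    result = []
    for f in fractions:
        n_removed = round(f * len(order))
        result.append(giant_component_fraction(n_nodes, order[n_removed:]))
    return result


# Two strong triangles joined by a single weak link.
strength = {(0, 1): 10, (1, 2): 10, (0, 2): 10,
            (3, 4): 10, (4, 5): 10, (3, 5): 10, (2, 3): 1}
```

In this toy graph, removing the weakest link first splits the network in half, whereas removing a strong link first leaves it connected, which mirrors the weak-ties picture at a very small scale.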


the memory loss tries to interact with strangers. Once connected to some other node by the GA process, its degree increases partly by means of the LA process but it will not diverge. The degree mostly fluctuates and sometimes remains unchanged for long periods of time. Finally, the node becomes isolated again by the memory loss. Thus, the whole life cycle of a node is assumed to consist of two types of periods, i.e., one with a fixed-size and the other with a variable-size task list. The periods of fixed-size task list, i.e., fixed degrees, are up to several hundred time steps, which are much larger than the observed $\tau_c$. This implies a natural separation of timescales between network change and dynamics on the network, which is consistent with everyday experience of mobile phone usage. Due to the timescale separation the inter-event time distribution for the whole period can be represented by the superposition of those for the fixed-size period and for the variable-size period. Thus, to understand the effect of size variability on the scaling behavior of bursty dynamics, we refer to previous works on different kinds of models, such as that by Vazquez et al. [22]. When the task list has a variable (fixed) size in the Barabasi model, the power-law exponent for the waiting time distribution turns out to be 3/2 (2). According to the argument that the distribution of the inter-event times derived from the waiting times has the same power-law exponent as that of the waiting times, one can expect similar values of the exponent from our model. However, this is not the case with our model, so we leave this for more rigorous analysis in the future.

Finally, the apparent overall independence of the average strength $s(k)$ from $k$ for large values of $k$ is attributed to the fact that once the node is a member of the TCI, its activity becomes effectively independent of its degree due to the exclusive property of the TCI. We even observe decreasing behavior of $s(k)$ for larger $k$ values in the TI-AND model, i.e., AND-protocol-based interaction with too many neighbors can make nodes fail to interact with any neighbors.

Process-Equalized model

The TI models show the expected behaviors of Granovetter-type community structure and the heavy tailed inter-event time distribution, but they do not yield the expected behavior of the local clustering coefficient and average strength of the nodes. This is mainly due to the too strong effect of the triad interaction, which is why we need to consider the PE model for modeling improvement and comparison with empirical results.

With the PE model we find that the cumulative weight distributions $P_c(w)$ are broad, that the overlap $O(w)$ increases with $w$, i.e., showing Granovetter-type community structure, that

Figure 3. TI-AND model. A. The cumulative weight distribution $P_c(w)$. B. The average number of next nearest neighbors $k_{nn}(k)$. C. The average overlap $O(w)$. D. The local clustering coefficient $c(k)$. E. The inter-event time distribution $P(\tau)$. F. The average strength $s(k)$. Results are averaged over 50 realizations for networks with $N = 5 \times 10^4$ and $p_{ML} = 10^{-3}$. We obtain $\langle k \rangle \approx 9.6$ and $\langle c \rangle \approx 0.13$ for $p_{LA} = 0.07$ and $p_{GA} = 0.1$. The cases with $p_{LA} = 0.4$ and/or with $p_{GA} = 0.04$ are also plotted for comparison.

doi:10.1371/journal.pone.0022687.g003



OR model AND model

Page 26: Social Connectome

Summary

• Topology of communication

• Granovetter: Strength of weak ties

• Kumpula’s model with global/local attachments

• Dynamics of communication

• Bursts of events

Page 27: Social Connectome

Not all events are equal.

Events are contextual!

Page 28: Social Connectome

OFFICE

HOME

MATRIX

???

Jo et al., EPJ DS (2012)

Page 29: Social Connectome

Time-ordering

Jo et al. EPJ Data Science 2012, 1:10, Page 13 of 18, http://www.epjdatascience.com/content/1/1/10

Figure 11 Time-ordering behavior between services. (a) Distributions of time interval $\Delta t_{ss'}$ between consecutive events of different services $s$ and $s'$. (b) Diagram for time-ordering behavior between services based on the distributions of time interval.

Table 1 k-means clustering results for weekly patterns of service usages

Service  q=0  q=1  q=2  q=3  q=4  q=5  q=6  q=7  q=8  q=9  N_s
web       74    9    7    6    5    3    3    2    1    1  111
app       50   32   10    7    6    6    5    4    3    1  124
email     55    3    3    2    1    1    1    1    1    1   69
call      54   40   14    5    4    1    1    1    1    1  122
SMS       74   14   11    9    5    4    3    1    1    1  123
avg       64   21   16    6    5    5    4    1    1    1  124

We summarize k-means clustering results for weekly patterns of service usages with k = 10. q and N_s denote the cluster index and the number of available users for service s, respectively.

here we present the result maximizing the quality of clustering or validity index, defined as the minimum inter-cluster distance divided by the sum of intra-cluster distances [].

The clustering results are summarized in Table and only a few weekly patterns of dominant clusters are shown in Figure . Only one dominant cluster is found in each case of web and email usages, implying similar patterns among users. Weekly patterns of app, call, and SMS usages are clustered into more than one dominant cluster. Compared to the largest cluster (q = ) of call usage, the second largest cluster (q = ) can be characterized by larger activities in the weekday daytime and in the weekend morning. The behavioral difference between dominant clusters in SMS usage is also obvious. The largest cluster


inter-event time between different services/contexts

communication services

non-communication services
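The interval distributions in Figure 11a can be collected from a single user's time-ordered event log. A minimal sketch, assuming events as `(timestamp, service)` pairs and a hypothetical function `cross_service_intervals` that keeps only consecutive pairs whose services differ:

```python
from collections import defaultdict


def cross_service_intervals(events):
    """events: iterable of (timestamp, service) pairs for one user.

    Returns {(s, s_next): [intervals]} for consecutive events of different
    services, which can then be histogrammed per ordered pair (s, s_next).
    """
    intervals = defaultdict(list)
    ordered = sorted(events)
    for (t1, s1), (t2, s2) in zip(ordered, ordered[1:]):
        if s1 != s2:
            intervals[(s1, s2)].append(t2 - t1)
    return dict(intervals)


log = [(0, "call"), (5, "web"), (7, "web"), (10, "SMS")]
```

For this toy log the result is `{("call", "web"): [5], ("web", "SMS"): [3]}`; the consecutive web events are skipped because they belong to the same service.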

Page 30: Social Connectome

Context in the topology

Page 31: Social Connectome

Overlapping community

[Figure, panels a and b: node communities vs. link communities for the Family and Work groups that both contain Alice and Bob]

Spouses Alice and Bob also work together. The Alice-Bob link was placed in Family, but both home and work relationships are identified.

Ahn et al., Nature (2010)

It is instructive to examine further the statistics of link communities in the metabolic and mobile phone networks (Fig. 3). The community size distribution at the optimum value of D is heavy tailed for both networks, whereas the number of communities per node distinguishes them (Fig. 3, insets): Mobile phone users are limited to a smaller range of community memberships, most likely as a result of social and time constraints. Meanwhile, the membership distribution of the metabolic network displays the universality of currency metabolites (water, ATP and so on) through the large number of communities they participate in. Notable previous work (refs 11, 15) removed currency metabolites before identifying meaningful community structure. The statistics presented here match current knowledge about the two systems, further confirming the communities’ relevance.
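Two quantities behind link communities can be sketched directly: the similarity between links $e_{ik}$ and $e_{jk}$ that share node $k$, defined by Ahn et al. as the Jaccard index of the inclusive neighborhoods $n_+(i)$ and $n_+(j)$, and the partition density $D = \frac{2}{M}\sum_c m_c \frac{m_c - (n_c - 1)}{(n_c - 2)(n_c - 1)}$, where community $c$ has $m_c$ links and $n_c$ nodes and $M$ is the total number of links. The Python below is illustrative; it does not include the hierarchical clustering step that builds the link dendrogram.

```python
def link_similarity(adj, i, j):
    """Similarity of links e_ik and e_jk sharing node k: Jaccard index of
    the inclusive neighborhoods n+(i) and n+(j)."""
    a = adj[i] | {i}
    b = adj[j] | {j}
    return len(a & b) / len(a | b)


def partition_density(link_communities):
    """Partition density D of a partition of links into communities."""
    M = sum(len(links) for links in link_communities)
    total = 0.0
    for links in link_communities:
        m = len(links)
        n = len({node for link in links for node in link})
        if n > 2:  # a community with two nodes (a single link) contributes 0
            total += m * (m - (n - 1)) / ((n - 2) * (n - 1))
    return 2.0 * total / M


adj = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
clique = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
```

A fully connected community reaches the maximum contribution (here $D = 1$ for a single 4-node clique), while a tree-like community such as a path contributes 0, so maximizing $D$ favors dense link communities.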

Having established that link communities at the maximal partition density are meaningful and relevant, we now show that the link dendrogram reveals meaningful communities at different scales. Figure 4a–c shows that mobile phone users in a community are spatially co-located. Figure 4a maps the most likely geographic locations of all users in the network; several cities are present. In Fig. 4b, we show (insets) several communities at different cuts above the optimum threshold, revealing small, intra-city communities. Below the optimum threshold, larger, yet still spatially correlated, communities exist (Fig. 4c). Because we expect a tight-knit community to have only small geographical dispersion, the clustered structures on the map indicate that the communities are meaningful. The geographical correlation of each community does not suddenly break down, but is sustained over a wide range of thresholds. In Fig. 4d, we look more closely at the social network of the largest community in Fig. 4c, extracting the structure of its largest subcommunity along with its remaining hierarchy and revealing the small-scale structures encoded in the link dendrogram. This example provides evidence for the presence of spatial, hierarchical organization at a societal scale.

To validate the hierarchical organization of communities quantitatively throughout the dendrogram, we use a randomized control dendrogram that quantifies how community quality would evolve if there were no hierarchical organization beyond a certain point. Figure 4e shows that the quality of the actual communities decays much more slowly than the control, indicating that real link dendrograms possess a large range of high-quality community structures. The quantitative results of Fig. 4 are typical for the full test group, implying that rich, meaningful community structure is contained within the link dendrogram. Additional results supporting these conclusions are presented in Supplementary Information, section 7.
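The partition density D that selects the optimum cut can be computed directly from a link partition. This sketch uses the definition from Ahn et al. (2010), $D = \frac{2}{M}\sum_c m_c \frac{m_c-(n_c-1)}{(n_c-2)(n_c-1)}$, where $M$ is the total number of links and community $c$ has $m_c$ links and $n_c$ nodes; a two-node community contributes zero. The function name and input format are assumptions for illustration.

```python
from collections import defaultdict


def partition_density(link_community):
    """Partition density D of a link partition (Ahn et al., 2010).

    link_community maps each undirected link (u, v) to a community label.
    """
    M = len(link_community)
    links = defaultdict(int)   # m_c: number of links in community c
    nodes = defaultdict(set)   # nodes touched by community c
    for (u, v), c in link_community.items():
        links[c] += 1
        nodes[c].update((u, v))
    total = 0.0
    for c, m in links.items():
        n = len(nodes[c])
        if n > 2:  # a single-link (two-node) community contributes 0
            total += m * (m - (n - 1)) / ((n - 2) * (n - 1))
    return 2.0 * total / M


# Two triangles sharing node 2.
split = {(0, 1): "a", (1, 2): "a", (0, 2): "a",
         (2, 3): "b", (3, 4): "b", (2, 4): "b"}
merged = {e: "a" for e in split}
```

Each triangle kept as its own community gives D = 1, while lumping both into one community lowers D to 1/3; evaluating D at every cut of the link dendrogram and keeping the maximum yields the optimum threshold referred to above.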

Many cutting-edge networks are far from complete. For example, an ambitious project to map all protein–protein interactions in yeast is currently estimated to detect approximately 20% of connections [14]. As the rate of data collection continues to increase, networks become

[Figure 3 panels. Mobile phone: number of communities vs. number of users per community; inset: number of users vs. number of communities per user. Metabolic: number of communities vs. number of metabolites per community; inset: number of metabolites vs. number of communities per metabolite, with the currency metabolites H2O, H+, ATP, ADP and Pi labelled.]

Figure 3 | Community and membership distributions for the metabolic and mobile phone networks. The distribution of community sizes and node memberships (insets). Community size shows a heavy tail. The number of memberships per node is reasonable for both networks: we do not observe phone users that belong to large numbers of communities and we correctly identify currency metabolites, such as water, ATP and inorganic phosphate (Pi), that are prevalently used throughout metabolism. The appearance of currency metabolites in many metabolic reactions is naturally incorporated into link communities, whereas their presence hindered community identification in previous work [11,15].

[Figure 4 panels: (a–c) maps of phone-user communities at link dendrogram thresholds t = 0.20, 0.24 and 0.27 (scale bar 50 km); (d) largest community, its largest subcommunity and remaining hierarchy, with partition density D versus t; (e) Q/Q_max versus link dendrogram threshold t for the word association, metabolic and phone networks, actual vs. control, for the largest, second largest and third largest communities.]

Figure 4 | Meaningful communities at multiple levels of the link dendrogram. a–c, The social network of mobile phone users displays co-located, overlapping communities on multiple scales. a, Heat map of the most likely locations of all users in the region, showing several cities. b, Cutting the dendrogram above the optimum threshold yields small, intra-city communities (insets). c, Below the optimum threshold, the largest communities become spatially extended but still show correlation. d, The social network within the largest community in c, with its largest subcommunity highlighted. The highlighted subcommunity is shown along with its link dendrogram and partition density, D, as a function of threshold, t. Link colours correspond to dendrogram branches. e, Community quality, Q, as a function of dendrogram level, compared with random control (Methods).

NATURE | Vol 466 | 5 August 2010 LETTERS

763 | ©2010 Macmillan Publishers Limited. All rights reserved.

mobile phone data

Page 32: Social Connectome

Multilayer social networks Murase, Jo et al., PRE (2014)

MULTILAYER WEIGHTED SOCIAL NETWORK MODEL PHYSICAL REVIEW E 90, 052810 (2014)

[Figure 1 panels: (a) L = 1 and (b) L = 2. Upper: R_LCC versus f; lower: susceptibility χ versus f; links removed in ascending or descending order.]

FIG. 1. (Color online) Link percolation analysis for L = 1 (left) and L = 2 (right). The upper figures show the relative size of the largest connected component, $R_{LCC}$, as a function of the fraction of the removed links $f$. The lower figures show the susceptibility $\chi$. Red solid (green dashed) lines correspond to the case when links are removed in ascending (descending) order of the link weights. The error bars show standard errors.

and descending orders. For L = 1 we get $\Delta f_c \approx 0.35$, while for L = 2 the figure shows that the percolation threshold for ascending order $f_c^a$ is not significantly different from that for descending order $f_c^d$ (i.e., $\Delta f_c \approx 0$). The percolation thresholds for L = 2 are approximately the same, $f_c \approx 0.95$, indicating that the introduction of a second layer destroys the Granovetterian structure. The percolation threshold agrees well with that of an Erdős–Rényi (ER) random network having the same average degree $\langle k \rangle$ as the simulated model: $f_c = 1 - 1/\langle k \rangle$ with the measured $\langle k \rangle = 21.9$. (Note that this is twice the average degree of a single layer.) This observation shows that combining just two independent layers from the original single-layer WSN model already leads to a high level of randomization in the aggregate [29]. One might think that the observed effect is due to the increasing total degree when two layers are merged. However, we carried out simulations where the total degree was controlled by $p_\Delta$ and found that for L = 2 the thresholds are always very close to each other: $\Delta f_c \approx 0$.
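The link percolation analysis can be sketched with a union-find structure: remove a fraction f of links in ascending or descending weight order, then measure R_LCC and the susceptibility. This is a minimal sketch, not the authors' simulation code; it assumes the common definition $\chi = \sum_{s < s_{\max}} n_s s^2 / N$ (all components except the largest), and the helper names are hypothetical. It also evaluates the ER estimate $f_c = 1 - 1/\langle k \rangle$ for $\langle k \rangle = 21.9$.

```python
from collections import Counter


def component_sizes(n, edges):
    """Sizes of connected components of an undirected graph on nodes 0..n-1."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
    return sorted(Counter(find(x) for x in range(n)).values(), reverse=True)


def link_percolation(n, weighted_edges, f_values, ascending=True):
    """Return (f, R_LCC, chi) after removing a fraction f of links by weight order."""
    ordered = sorted(weighted_edges, key=lambda e: e[2], reverse=not ascending)
    out = []
    for f in f_values:
        k = round(f * len(ordered))  # number of links removed
        sizes = component_sizes(n, [(u, v) for u, v, _ in ordered[k:]])
        r_lcc = sizes[0] / n
        chi = sum(s * s for s in sizes[1:]) / n  # largest component excluded
        out.append((f, r_lcc, chi))
    return out


def er_threshold(mean_degree):
    """Link percolation threshold of an ER graph: f_c = 1 - 1/<k>."""
    return 1.0 - 1.0 / mean_degree


# Toy weighted path 0-1-2-3 with weights 1, 2, 3.
edges = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 3.0)]
asc = link_percolation(4, edges, [0.0, 1 / 3], ascending=True)
fc_er = er_threshold(21.9)  # about 0.954, consistent with f_c ≈ 0.95
```

In practice one scans a fine grid of f and locates $f_c^a$ and $f_c^d$ at the peak of $\chi$ for each removal order.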

C. Copy-and-shuffle WSN model

Due to the fact that merging two layers of WSN models destroys the Granovetterian structure, we investigated how the correlation between layers affects the properties of the network. We created the second layer by copying the first layer and then shuffled the fraction p of the nodes in the second layer. Shuffling nodes i and j means that all original links (i,k) become (j,k) and vice versa. This is just a relabeling of the nodes in that layer, meaning that the topology remains the same, i.e., both layers correspond to single-layer WSN models but with increasing p the correlations between them decrease. This is called the "copy-and-shuffle" model (see Fig. 2).
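The copy-and-shuffle construction can be sketched as follows. This is a minimal sketch under one assumption not stated in the text: the selected fraction p of nodes is permuted uniformly at random among themselves (a generalization of the pairwise swap described above). The helper names are hypothetical.

```python
import random


def _key(u, v):
    """Canonical key for an undirected link."""
    return (u, v) if u <= v else (v, u)


def copy_and_shuffle(layer1, nodes, p, seed=None):
    """Second layer = copy of layer1 with a fraction p of nodes relabelled.

    Assumption: the selected nodes are permuted uniformly at random among
    themselves, so topology and weights are preserved and only labels move.
    """
    rng = random.Random(seed)
    nodes = list(nodes)
    chosen = rng.sample(nodes, round(p * len(nodes)))
    targets = chosen[:]
    rng.shuffle(targets)
    relabel = {x: x for x in nodes}
    relabel.update(zip(chosen, targets))
    return {_key(relabel[u], relabel[v]): w for (u, v), w in layer1.items()}


def aggregate(*layers):
    """Aggregate network: union of links, weights summed across layers."""
    agg = {}
    for layer in layers:
        for (u, v), w in layer.items():
            k = _key(u, v)
            agg[k] = agg.get(k, 0.0) + w
    return agg


layer1 = {(0, 1): 1.0, (1, 2): 2.0, (2, 3): 1.5, (0, 3): 0.5}
same = copy_and_shuffle(layer1, range(4), p=0.0, seed=7)
doubled = aggregate(layer1, same)  # p = 0: every link weight is doubled
mixed = copy_and_shuffle(layer1, range(4), p=1.0, seed=7)
```

With p = 0 the aggregate is the single layer with doubled weights; as p grows, strong links of the second layer increasingly land between nodes that are unrelated in the first layer.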

[Figure 2 panels: (a) p = 0, (b) p = 0.01, (c) p = 0.1, (d) p = 1.]

FIG. 2. (Color) Snapshots of the copy-and-shuffle model with different values of the shuffling parameter p and N = 300. Red (blue) links are in the first (second) layer, and green links are in both layers.

When p = 0, the aggregate network is equivalent to the single-layer network whose link weights are doubled. For p = 1, it is the same as the double-layer model; the Granovetterian structure gets entirely destroyed by randomization. By controlling p between 0 and 1, a transient behavior is observed. Figure 3 shows how the threshold values $f_c^a$ get closer to $f_c^d$ as p is increased, and for $p \to 1$ we get $\Delta f_c \to 0$. The reason is that strong links in the second layer connect the communities more randomly since the correlation between the first and the second layer diminishes.

The randomization has the consequence that the percolation threshold $f_c^a$ gets closer to that of the corresponding Erdős–Rényi random network. However, for a reasonably large range

[Figure 3 plot: f_c versus p (logarithmic axis from 10^-4 to 10^0) for descending and ascending removal, with fitting curve.]

FIG. 3. (Color online) Percolation thresholds for various shuffle fraction values p for the copy-and-shuffle model. The green upper and red lower lines denote the critical points $f_c^d$ and $f_c^a$, respectively. The critical points are determined by the peak of the susceptibility. The points are calculated for 50 independent runs. The blue dashed line is calculated using Eq. (3).


S. Boccaletti et al. / Physics Reports 544 (2014) 1–122 23


Fig. 8. (Color online) Schematic illustration of three kinds of correlated multiplex networks, maximally-positive (MP), uncorrelated (UC), and maximally-negative (MN). Each layer of the networks has different types of links, indicated by solid and dashed links, respectively. Source: Reprinted figure with permission from Ref. [48]. © 2014, by the American Physical Society.

Fig. 9. (Color online) Example of all possible multilinks in a multiplex network with M = 2 layers and N = 5 nodes. Nodes i and j are linked by one multilink $\vec{m} = (m_\alpha, m_{\alpha'})$. Source: Reprinted figure from Ref. [113].

communication and trade layers. Even the two "negative" layers of enmity and attack have significant overlap of the links. As a second example of a multiplex network with significant overlap, consider the APS data set of citation and collaboration networks [113]. The two layers in this data set display significant overlap because two co-authors usually also cite each other in their papers.

One way to characterize the link overlap is by introducing the concept of multilinks [49,113]. A multilink fully determines all the links present between any given two nodes i and j in the multiplex. Consider for example the multiplex with M = 2 layers, i.e. the duplex shown in Fig. 9. Nodes 1 and 2 are connected by one link in the first layer and one link in the second. Thus, we say that the nodes are connected by a multilink (1, 1). Similarly, nodes 2 and 3 are connected by one link in the first layer and no link in layer 2. Therefore, they are connected by a multilink (1, 0). In general, for a multiplex of M layers
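Multilinks are straightforward to tabulate: for each node pair, record a 0/1 entry per layer. A minimal sketch, assuming each layer is given as a set of undirected links; the toy duplex reproduces only the two multilinks quoted from Fig. 9 (nodes 1-2 and nodes 2-3), not the full figure, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def multilinks(layers, nodes):
    """Map each node pair (i, j), i < j, to its multilink (one 0/1 entry per layer).

    `layers` is a list of sets of undirected links given as frozensets {i, j}.
    """
    return {
        (i, j): tuple(int(frozenset((i, j)) in layer) for layer in layers)
        for i, j in combinations(sorted(nodes), 2)
    }


def multilink_counts(ml):
    """How many node pairs carry each nonzero multilink."""
    return Counter(m for m in ml.values() if any(m))


layer_1 = {frozenset((1, 2)), frozenset((2, 3))}
layer_2 = {frozenset((1, 2))}
ml = multilinks([layer_1, layer_2], nodes=range(1, 6))  # N = 5 nodes
counts = multilink_counts(ml)
```

The pair (1, 2) carries the multilink (1, 1) and the pair (2, 3) carries (1, 0); every other pair carries (0, 0), which `multilink_counts` leaves out.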

Boccaletti et al., Phys. Rep. (2014)

Page 33: Social Connectome

Context in the dynamics

Page 34: Social Connectome

Contextual bursts Jo et al., PRE (2013)

[Figure: a collective sequence of events (a burst) is decomposed by context (friend A, friend B, friend C); each context shows its own contextual burst.]

Page 35: Social Connectome

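• Collective real inter-event time (context is irrelevant): $P(l) \sim l^{-\alpha}$

• Contextual ordinal inter-event time (time frame is irrelevant): $P(n) \sim n^{-\beta}$

• Contextual real inter-event time: $P(\tau) \sim \tau^{-\alpha'}$, where $\tau = \sum_{i=1}^{n} l_i$ and $\alpha' = \min\{(\alpha-1)(\beta-1)+1,\ \alpha,\ \beta\}$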
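The decomposition can be made concrete: from one time-ordered event sequence with context labels (here, which friend is contacted), we get the collective inter-event times $l_i$, and for each context the ordinal inter-event time $n$ (number of collective steps between consecutive events of that context) and the real inter-event time $\tau$ (the sum of the $l_i$ spanned). The function names and toy data are hypothetical; the last helper only evaluates the scaling relation $\alpha' = \min\{(\alpha-1)(\beta-1)+1, \alpha, \beta\}$ and does not estimate exponents from data.

```python
from collections import defaultdict


def contextual_inter_event_times(events):
    """Decompose a time-ordered event sequence by context.

    events: list of (time, context) pairs sorted by time.
    Returns (l_times, n_by_context, tau_by_context) where
      l_times[i] = t_{i+1} - t_i, the collective real inter-event times,
      n   = number of collective steps between consecutive events of a context,
      tau = real time between them, i.e. the sum of the l_i it spans.
    """
    times = [t for t, _ in events]
    l_times = [b - a for a, b in zip(times, times[1:])]
    last = {}  # index of the previous event in each context
    n_by_ctx, tau_by_ctx = defaultdict(list), defaultdict(list)
    for i, (_, ctx) in enumerate(events):
        if ctx in last:
            j = last[ctx]
            n_by_ctx[ctx].append(i - j)                # ordinal inter-event time n
            tau_by_ctx[ctx].append(sum(l_times[j:i]))  # tau = l_j + ... + l_{i-1}
        last[ctx] = i
    return l_times, dict(n_by_ctx), dict(tau_by_ctx)


def contextual_exponent(alpha, beta):
    """alpha' = min{(alpha - 1)(beta - 1) + 1, alpha, beta}."""
    return min((alpha - 1) * (beta - 1) + 1, alpha, beta)


# Hypothetical toy sequence: contacts with friends A, B and C.
events = [(0, "A"), (1, "B"), (3, "A"), (4, "C"), (8, "A"), (9, "B")]
l_times, n_ctx, tau_ctx = contextual_inter_event_times(events)
```

The first term of the minimum is the smallest only when both $\alpha < 2$ and $\beta < 2$; for example, $\alpha = \beta = 1.5$ gives $\alpha' = 1.25$.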

Page 36: Social Connectome

Returning to the research questions

Page 37: Social Connectome

Hypothesis: A continuum of overlapping communities

Q1: What does the social network look like?

Page 38: Social Connectome

Q2: What drives the evolution of the social network?

Hypothesis: Bursty contextual activities

Page 39: Social Connectome

Social Connectome

[Diagram: concepts arranged by topological scale (individual, community, society) and dynamical resolution (event, burst, aggregate): egocentric net, temporal motif, burst, seasonality, circadian cycle, layered structure, strength of weak ties, overlapping communities (OC), contextual bursts (CB), and their combination OC+CB.]

Page 40: Social Connectome

[Diagram: event, time, burst, communities, individuals.]

Page 41: Social Connectome

Other issues

• General framework for stylized facts in social networks [Jo, Murase, Torok, Kaski, Kertesz]

• Correlated bursts [Karsai et al., Sci. Rep. (2012); Jo et al., Phys. Rev. E (2015)]

• Dynamics on networks: spreading [Jo et al., Phys. Rev. X (2014)]

• Perception-based network formation [Jo et al., submitted]

Page 42: Social Connectome

References

• Onnela et al., Proc. Nat. Acad. Sci. USA 104, 7332 (2007); New J. Phys. 9, 179 (2007)

• Granovetter, Am. J. Sociol. 78, 1360 (1973)

• Kumpula et al., Phys. Rev. Lett. 99, 228701 (2007)

• Karsai et al., Phys. Rev. E 83, 025102 (2011)

• Barabasi, Nature 435, 207 (2005)

• Malmgren et al., Proc. Nat. Acad. Sci. USA 105, 18153 (2008)

• Goh & Barabasi, EPL 81, 48002 (2008)

• Jo et al., New J. Phys. 14, 013055 (2012)

• Jo et al., PLOS ONE 6, e22687 (2011)

• Jo et al., EPJ Data Science 1, 10 (2012)

• Ahn et al., Nature 466, 761 (2010)

• Murase, Jo et al., Phys. Rev. E 90, 052810 (2014)

• Jo et al., Phys. Rev. E 87, 062131 (2013)

• Jo et al., Phys. Rev. X 4, 011041 (2014)

Page 43: Social Connectome