
SADI: A Novel Model to Study the Propagation of Social Worms in Hierarchical Networks

Tianbo Wang, Chunhe Xia, Sheng Wen, Hui Xue, Yang Xiang, Senior Member, IEEE, and Shouzhong Tu

Abstract—As more and more people rely on social networks for business and daily life, social worms constitute one of the major security threats to our society. Modern social worms exhibit two new features: message notification and the temporal characteristic of human mobility. Message notification means that a user gets a reminder once a new message arrives at a social account. The temporal characteristic of human mobility means that a user can operate the corresponding computers in different locations, with a different resting time at each. Previous scholars have proposed analytical models for the propagation dynamics of social worms. However, they did not consider these two features, and one critical problem has gone unrecognized: the structural imperfection of the network topology. Previous models have not taken into account the hierarchical topology structure, which results from a many-to-many relationship between users and hosts. To address these problems, we model the propagation dynamics of social worms in hierarchical networks, and the proposed model accurately describes the propagation behavior of social worms. We conduct both theoretical analyses and extensive simulations to show that our model overcomes the inaccuracy in the number of infected nodes and provides a closer approximation of worm propagation. The results show that the model presented in this paper achieves greater accuracy in characterizing the propagation of modern social worms.

Index Terms—Network security, worm propagation, human mobility, modeling


1 INTRODUCTION

PROTECTING a computer system from malicious attacks is a key challenge for the research and management communities of network security. In recent years, social worms such as “Koobface,” “Samy,” and “Here you are” have constituted one of the major network security problems. According to Symantec Corporation’s report [1] on Internet security threats, the frequency and virulence of their propagation outbreaks have increased dramatically in the last few years. Social networks remain a good platform for worm spreading. Moreover, we have observed the addition of real-world social engineering, where virtual and real-world attacks are combined to increase the odds of success. Thus, social worms pose a significant threat to our work and life environment.

Social worms are currently widely used by attackers because of the following characteristics. First, they rely only on knowledge of the network topology and do not require any vulnerability information about a computer system or software. They generate less network traffic than scanning worms and are harder to perceive. Second, because they exploit trust between friends, many users fail to recognize malware or malicious code sent by their friends and subsequently become infected. This makes worm propagation more effective. Finally, the carrier of social worms is human mobility, which connects the layers of a hierarchical network and adds social engineering techniques. The propagation of social worms is therefore interdisciplinary and multidimensional.

1.1 Motivation

Scientists have done a lot of work to understand how social worms propagate in the Internet and to establish defense strategies [2], [3], [4], [5], [6]. Current research focuses on modeling the propagation dynamics, which is a fundamental technique for developing countermeasures that reduce the spread speed and prevalence of social worms. A few works model social worm propagation. Previous works [2], [3], [4], [6] analyze the impact of message checking behavior and social topology structure on social worm propagation. However, modern social worms spread far more aggressively than before because of two new features. The first feature is “message notification,” i.e., a user gets a reminder once a new message arrives at a social account. The second feature is “the temporal characteristic of human mobility,” i.e., a user can spend different amounts of time in different locations using the corresponding hosts. Thus, previous works are one-sided in their propagation modeling and fail to reflect social worm characteristics comprehensively.

There remains one important problem. Structural imperfection of the network topology leads to underestimation of the infection scale. Because different social roles lead to human mobility among different locations, users operate different computers to handle related work in different locations. This

• T.B. Wang, C.H. Xia, and H. Xue are with the Beijing Key Laboratory of Network Technology, Beihang University, Beijing 100191, China. E-mail: {wangtb, xch}@buaa.edu.cn, [email protected].

• S. Wen and Y. Xiang are with the School of Information Technology, Deakin University, Geelong 3220, Australia. E-mail: {wsheng, yang}@deakin.edu.au.

• S.Z. Tu is with the Beijing Institute of Electronics Technology and Application, Beijing 100091, China. E-mail: [email protected].

Manuscript received 29 Oct. 2014; revised 9 June 2015; accepted 27 Sept. 2015. Date of publication 11 Jan. 2017; date of current version 16 Jan. 2019. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference the Digital Object Identifier below. Digital Object Identifier no. 10.1109/TDSC.2017.2651826

142 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 16, NO. 1, JANUARY/FEBRUARY 2019

1545-5971 © 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

https://orcid.org/0000-0001-5252-0831


results in a many-to-many relationship between users and hosts. However, in the topology of previous models [2], [3], [4], nodes denote different users, which implies a one-to-one relationship between a user and a host. Therefore, previous models cannot reflect the real topology structure over which social worms spread. These observations motivate our work to present a Susceptible-Active-Dormant-Immune (SADI) model, which can accurately represent the propagation of social worms.

1.2 Contributions

The contributions of this research are summarized as follows:

• We carry out extensive analysis of two critical factors of social worms, message notification and the temporal characteristic of human mobility, which affect the accuracy of current analytical models. Moreover, we prove that there is a deviation between the original conclusions and the real world, so those conclusions are not suitable for interdisciplinary and multidimensional scenarios.

• We propose a novel SADI model, which helps us to better understand social worm propagation. Our model can overcome underestimation of the infection scale and reflect the realistic propagation dynamics.

• We conduct a number of experiments to evaluate the presented model. The results show that the SADI model is more accurate than the state-of-the-art models.

The rest of this paper is organized as follows. Section 2 states the problems in modeling the propagation dynamics. In Section 3, we explain the details of the SADI model. We then carry out a number of experiments and theoretical analyses to evaluate the accuracy of our model in Sections 4 and 5, respectively. Further discussion and related work are provided in Sections 6 and 7, respectively. Section 8 concludes this paper.

2 PROBLEM STATEMENT

The topologies of social worms consist of a social logical layer and an actual physical layer, as shown in Fig. 1. The former has the following characteristics: 1) it is a “semi-directed network,” in which some edges are directed and others are undirected [8], [9]; 2) the indegree of nodes tends to match the outdegree, and both follow a power-law distribution [8], [9], [10]; 3) it is assortative, which implies that nodes with a high degree tend to connect with each other [9]; 4) the weight of each edge denotes the propagation probability from user $i$ to user $j$ [2]; 5) each node in the social logical layer contains a group of nodes corresponding to nodes in the actual physical layer (i.e., hosts in different locations) [7]. The latter exhibits power-law, disassortative, rich-club and localization characteristics [11], [12]. For convenience of description, malicious emails, links or profiles are called “messages.”

2.1 Problem from Technical Perspective

Two New Features of Social Worm Propagation. First, message notification, as the name suggests, means that a user gets a reminder once a new message arrives at a social account. Its advantage is reminding the user to check new messages in time. We illustrate message notification in Fig. 2. Suppose two users $i$ and $j$ get infected and send malicious message copies to their neighbor $k$. In case 1, without message notification, although user $k$ receives two malicious messages at times $t_8$ and $t_{10}$, user $k$ will not check the mailbox or log in to the social network before time $t_{12}$, because neither time falls within a message checking period. In case 2, with message notification, user $k$ reads the two malicious messages from users $i$ and $j$ with probability $q_n$ at $t_9$ and $t_{11}$, in addition to the regular check of the mailbox or social network at time $t_{12}$. Thus, user $k$ gets infected much faster.

Second, the temporal characteristic of human mobility indicates that, by moving between different locations, people can use different computers to enjoy social services such as Facebook, Twitter and Gmail. The resting time

Fig. 1. The propagation scenario of social worms in hierarchical networks. The black nodes in the actual physical layer denote the $S = 3$ locations previously visited by the user. At the next time $t + \Delta t$ (where $T^i_{rest}$ denotes the time user $i$ spent at one location), user $i$ can either (i) preferentially return, with probability $P_{com} = 1 - P_{new}$, to a previously visited location with frequency $f_i$ that is proportional to the size of the circle drawn at each location, or (ii) explore, with probability $P_{new} = r S^{-g}$, by moving to a new location and operating a host $C$ in location $c$, after which $S = S + 1$. Here $S$ is the total number of visited locations, and the parameters $r$ and $g$ are determined by empirical data [7].

Fig. 2. The analysis of message notification mechanism. User i and userj read one malicious message at t7 and t9 respectively, and then sendone malicious copy to neighbor k. Case 1: no message notification; Case2: message notification.


when a user visits different locations is not the same, so it influences the operating condition of computers. Thus, it not only affects the infection scale, the spread threshold and the spread speed, but also promotes the formation of hierarchical networks. The human mobility model is shown in Fig. 1.
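The exploration and preferential-return rule summarized in the Fig. 1 caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' simulator: the function name `simulate_mobility` and the default values `r=0.6` and `g=0.21` are hypothetical placeholders (the paper only says that $r$ and $g$ are determined by empirical data), and resting times are ignored so that each loop iteration is one move.

```python
import random


def simulate_mobility(steps, r=0.6, g=0.21, seed=0):
    """Simulate one user's moves; returns visit counts per location.

    r and g are illustrative placeholders, not values from the paper.
    """
    rng = random.Random(seed)
    visits = [1]  # visits[j] = number of visits to location j so far
    for _ in range(steps):
        s = len(visits)  # S: number of distinct locations visited
        p_new = min(1.0, r * s ** (-g))  # exploration probability P_new
        if rng.random() < p_new:
            visits.append(1)  # exploration: a new location, S = S + 1
        else:
            # preferential return: pick an old location with probability
            # proportional to how often it has been visited
            j = rng.choices(range(s), weights=visits)[0]
            visits[j] += 1
    return visits


if __name__ == "__main__":
    counts = simulate_mobility(steps=200)
    print(len(counts), "distinct locations;", sum(counts), "visits in total")
```

Because returns are weighted by past visit counts, a few locations end up attracting most visits, which is the behavior the circle sizes in Fig. 1 depict.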

The Problem of Structural Imperfection of Network Topology. Because humans have different social roles, they move among different locations, and the same user therefore accesses and operates a corresponding host in each location. As a result, the traditional topology assumed for social worms is inconsistent with the actual topology over which they spread. Previous models [2], [4], [11] are based on social network topologies that are abstracted from contact lists, where each node denotes a user. This abstraction implicitly assumes that every user owns only one host. However, in the real world different hosts in different locations belong to the same individual. Thus, previous models suffer from the problem of structural imperfection of the network topology. To tackle this problem, we introduce the hierarchical network topology, as shown in Fig. 1.

2.2 Problem from Theoretical Perspective

1) According to [11], [13], homogeneous and power-law heterogeneous networks have the propagation thresholds $T_{homogeneous} = \frac{1}{\langle k \rangle}$ and $T_{heterogeneous} = \frac{\langle k \rangle}{\langle k^2 \rangle}$, where $\langle k \rangle$ is the average connectivity and $\langle k^2 \rangle$ signals the connectivity divergence. Given that the degree of node $i$ is $k_i$ and the size of the network is $S$, we have $\langle k \rangle = \frac{1}{S}\sum_{i=1}^{S} k_i$. Similarly, $\langle k^2 \rangle = \frac{1}{S}\sum_{i=1}^{S} k_i^2$. However, the traditional social network topology of social worms is altered by the hierarchical network topology, which in the real world consists of the social logical layer and the actual physical layer. Let $C_u$ be the upper bound on the number of hosts owned by an individual and $N$ the size of the traditional social network. Thus we have

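$$\langle k' \rangle = \frac{1}{C_u N}\left(\sum_{j=1}^{C_u N} k_j\right) = \frac{C_u}{C_u N}\left(\sum_{i=1}^{N} C_u k_i\right) = C_u \langle k \rangle \tag{1}$$

$$\langle k'^2 \rangle = \frac{1}{C_u N}\left(\sum_{j=1}^{C_u N} k_j^2\right) = \frac{C_u}{C_u N}\left(\sum_{i=1}^{N} (C_u k_i)^2\right) = C_u^2 \langle k^2 \rangle \tag{2}$$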

where $\langle k' \rangle$ and $\langle k'^2 \rangle$ denote the average connectivity and the connectivity divergence in the actual physical layer of hierarchical networks, respectively. Substituting Equations (1) and (2) into the above conclusions, we obtain $T'_{homogeneous} = \frac{1}{C_u \langle k \rangle}$ and $T'_{heterogeneous} = \frac{\langle k \rangle}{C_u \langle k^2 \rangle}$. We see that the hierarchical topology makes the propagation threshold smaller and changes the propagation properties of worms.

2) For topology worms, [11] and [14] proved the following conclusion. Let $p$ be the probability that an infected node tries to infect its neighbors and $d$ the probability that a node is cured. $AM$ denotes the adjacency matrix of the network topology and $\lambda_{1,AM}$ denotes its maximum eigenvalue. For an epidemic to die off, we should have $\frac{p}{d} < \frac{1}{\lambda_{1,AM}}$. In this conclusion, the probability $p$ is the average infection probability, and $AM$ only indicates the adjacency relationship between nodes. However, the above conclusion is not accurate for modern social worms. Thus, we derive the spread threshold as follows:

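We introduce an $N$ by $N$ square matrix $PT$ to describe the propagation topology of the actual physical layer:

$$PT = \begin{pmatrix} p_{11} & p_{12} & \cdots & p_{1N} \\ \vdots & p_{ij} & & \vdots \\ p_{N1} & p_{N2} & \cdots & p_{NN} \end{pmatrix}, \quad p_{ij} \in [0, 1], \tag{3}$$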

where $p_{ij}$ represents the propagation probability from node $i$ to node $j$, and $p_{ij} = 0$ means there is no connection between node $j$ and node $i$. $N = C_u M$, where $M$ is the total number of users in the social logical layer. Let $P_i(t)$ be the probability that node $i$ is infected at time $t$, and let $z_i(t)$ be the probability that node $i$ does not receive infections from its neighbors at time $t$:

$$z_i(t) = \prod_{j:\,\text{neighbor of } i} \big[P_j(t-1)(1 - p_{ji}) + (1 - P_j(t-1))\big] = \prod_{j:\,\text{neighbor of } i} \big[1 - p_{ji} \cdot P_j(t-1)\big]. \tag{4}$$

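Using the approximation $(1 - a)(1 - b) \approx 1 - a - b$ for $a, b \ll 1$, we have $z_i(t) \approx 1 - \sum_{j:\,\text{neighbor of } i} p_{ji} \cdot P_j(t-1)$, and we define the healthy probability of a node $i$ at time $t$ as follows:

$$\begin{aligned} 1 - P_i(t) &= (1 - P_i(t-1))\,z_i(t) + d\,P_i(t-1)\,z_i(t) + \tfrac{1}{2} d\,P_i(t-1)(1 - z_i(t)) \\ &= \tfrac{1}{2} d\,P_i(t-1) + \Big[1 + \big(\tfrac{1}{2} d - 1\big) P_i(t-1)\Big]\Big(1 - \sum_{j:\,\text{neighbor of } i} p_{ji} \cdot P_j(t-1)\Big). \end{aligned} \tag{5}$$

Neglecting the second-order terms $P_i(t-1) \cdot P_j(t-1)$, we obtain

$$P_i(t) = (1 - d)\,P_i(t-1) + \sum_{j:\,\text{neighbor of } i} p_{ji} \cdot P_j(t-1). \tag{6}$$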

Converting Equation (6) to matrix notation with $P(t) = (P_1(t), P_2(t), \ldots, P_N(t))$, we have

$$P(t) = \big((1 - d)E + PT\big)\,P(t-1) = SM \cdot P(t-1) = SM^t P(0), \tag{7}$$

where $SM = (1 - d)E + PT$ and $E$ is the identity matrix. According to linear algebra, the matrices $SM$ and $PT$ have the same eigenvectors $U_{i,SM}$, and their eigenvalues $\lambda_{i,SM}$ and $\lambda_{i,PT}$ are closely related:

$$\lambda_{i,SM} = 1 - d + \lambda_{i,PT}. \tag{8}$$

Using the spectral decomposition,

$$SM = \sum_i \lambda_{i,SM}\, U_{i,SM}\, (U_{i,SM}^T) \tag{9}$$

$$SM^t = \sum_i \lambda_{i,SM}^t\, U_{i,SM}\, (U_{i,SM}^T). \tag{10}$$

Thus

$$P(t) = \sum_i \lambda_{i,SM}^t\, U_{i,SM}\, (U_{i,SM}^T)\, P(0). \tag{11}$$

Let $\lambda_{1,PT} \geq \lambda_{2,PT} \geq \cdots \geq \lambda_{i,PT}$, where $i$ is the number of eigenvalues. For an infection to die off and not become an epidemic, the vector $P(t)$ should tend to zero for large $t$, which happens when, for all $i$, $\lambda_{i,SM}^t$ tends to 0. This implies $\lambda_{1,SM} < 1$. So,

$$1 - d + \lambda_{1,PT} < 1, \tag{12}$$

which means $\lambda_{1,PT} < d$. Thus, we obtain a new conclusion, $\lambda_{1,PT} < d$, which is more accurate than the previous one.
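The die-off condition $\lambda_{1,PT} < d$ can be checked numerically on a toy example. The sketch below is an illustration only: the 3-node matrix `PT`, the helper names `largest_eigenvalue` and `iterate`, and the two cure probabilities are assumptions chosen for the demonstration. It estimates $\lambda_{1,PT}$ by power iteration and then applies the linearized recursion of Equation (7).

```python
def largest_eigenvalue(matrix, iterations=500):
    """Estimate the dominant eigenvalue of a non-negative matrix
    by power iteration (max-norm normalisation)."""
    n = len(matrix)
    v = [1.0] * n
    lam = 0.0
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = max(abs(x) for x in w)
        if lam == 0.0:
            return 0.0
        v = [x / lam for x in w]
    return lam


def iterate(pt, d, p0, steps):
    """Apply P(t) = ((1 - d)E + PT) P(t-1) for the given number of steps.

    This is the linearised recursion, so values may exceed 1 when the
    infection grows; only the trend (decay or growth) is meaningful."""
    n = len(pt)
    p = list(p0)
    for _ in range(steps):
        p = [(1 - d) * p[i] + sum(pt[i][j] * p[j] for j in range(n))
             for i in range(n)]
    return p


# Hypothetical symmetric propagation matrix for three hosts.
PT = [[0.0, 0.2, 0.1],
      [0.2, 0.0, 0.3],
      [0.1, 0.3, 0.0]]

if __name__ == "__main__":
    lam1 = largest_eigenvalue(PT)
    print("estimated largest eigenvalue of PT:", round(lam1, 4))
    for d in (0.6, 0.2):
        total = sum(iterate(PT, d, [0.1, 0.1, 0.1], 200))
        verdict = "dies off" if lam1 < d else "spreads"
        print("d =", d, "->", verdict, "(sum of P after 200 steps:", total, ")")
```

With the cure probability above the estimated eigenvalue the infection vector decays toward zero, and with the cure probability below it the vector grows, in line with Equation (12).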

3 SADI PROPAGATION MODEL

In this section, we incorporate the two new propagation features and model the propagation dynamics of social worms in the hierarchical network.

3.1 Introduction of Related Terms

We formalize the states of nodes and the topology information. Let the random variable $X_{i,n}^t$ denote the state of a host $n$ used by user $i$ at time $t$:

$$X_{i,n}^t = \begin{cases} \text{Hea.: healthy} & \begin{cases} \text{Sus.: susceptible} \\ \text{Imm.: immune} \end{cases} \\ \text{Inf.: infected} & \begin{cases} \text{Act.: active} \\ \text{Dor.: dormant} \end{cases} \end{cases} \tag{13}$$

We derive the state transition graph of an arbitrary node based on Equation (13), as shown in Fig. 3.

By depicting the states of nodes and the state transition probabilities, we express the infection status of hosts and provide a modeling mechanism for social worm propagation. In Fig. 3, each node denotes a state and each edge denotes a state transition probability between states. For the convenience of readers, we list all major notations of this paper in Table 1.

We introduce an $M$ by $M$ square matrix to describe the propagation topology in the social logical layer, as in Equation (3), and propose an $M$ by $S_i^t$ matrix to denote the number of locations visited by users at time $t$, as follows:

$$\begin{pmatrix} N_{11}^t & \cdots & N_{1 S_i^t}^t \\ \vdots & N_{ij}^t & \vdots \\ N_{M1}^t & \cdots & N_{M S_i^t}^t \end{pmatrix}, \tag{14}$$

where $N_{ij}^t$ represents the number of visits of user $i$ to location $j$ by time $t$, and the time user $i$ spends at location $j$ is drawn from the $P(T^i_{rest})$ distribution. $P_{i \Rightarrow j}(t) = \frac{N_{ij}^t}{\sum_{j=1}^{S_i^t} N_{ij}^t}$ decides to which location user $i$ returns with probability $P_{com}$. In addition, we employ an $M$ by $M$ square matrix to describe friend relationships between users and to find all neighbors of user $i$ whose shortest path has a $k$-hop distance:

$$\begin{pmatrix} A_{11} & \cdots & A_{1M} \\ \vdots & A_{ij} & \vdots \\ A_{M1} & \cdots & A_{MM} \end{pmatrix}, \quad A_{ij} \in [0, 1], \tag{15}$$

where $A_{ij} = 1$ means there is a friend relationship between user $i$ and user $j$. Otherwise, $A_{ij} = 0$.

On the one hand, we define a random variable $open_i(t)$. We have $open_i(t) = 1$ if user $i$ is checking messages or there is a notification at time $t$. Otherwise, $open_i(t) = 0$.

$$P(open_i(t) = 1) = \begin{cases} 0, & \text{otherwise} \\ 1, & t \bmod T^i_{check} = 0 \\ q_n, & t = t' \end{cases} \tag{16}$$

where $t = t'$ denotes that, once a message arrives at a social account, the social application pops up a notification for user $i$ at time $t$. Then let the random variable $stay_i^n(t)$ denote whether user $i$ stays in location $n$ at time $t$:

$$P(stay_i^n(t) = 1) = \begin{cases} 0, & \text{otherwise} \\ 1, & \text{user } i \text{ stays in location } n \text{ at time } t. \end{cases} \tag{17}$$

Fig. 3. The state transition graph of a node in the SADI model.

TABLE 1
Major Notations Used in This Paper

Symbol | Explanation

$P_{i \Rightarrow j}(t)$ | The probability of user $i$ staying in location $j$ at time $t$.

$T^i_{check}$ | Message checking time of user $i$.

$T^i_{rest}$ | The time user $i$ spent at a certain location.

$r(t)$ | The recovery function of hosts, which provides the probability for any host to be immunized at time $t$.

$X_{i,n}^t$ | The state of a host $n$ used by user $i$ at time $t$.

$p_{ij}$ | The propagation probability from node $i$ to node $j$.

$open_i(t)$ | The event of user $i$ checking newly arrived messages at time $t$.

$t'$ | The arbitrary time between user $i$ last checking messages and the current time $t$ (excluding $t$), which records the time when a new message comes to a social account.

$M$ | The size of the social logical layer.

$n(t)$ | The infection scale in the hierarchical network at time $t$.

$m(t)$ | The infection scale in the social logical layer at time $t$.

$i(t)$ | The infection scale in the hierarchical network at time $t$.

$v(i, n, t)$ | The infection probability of a susceptible host $n$ used by user $i$ at time $t$. If $n = 1$, $v(i, n, t) = v_a(i, 1, t)$. Otherwise, $v(i, n, t) = v_b(i, n, t)$.

$N_i$ | The set of neighboring users of user $i$.

$S_i^t$ | The total number of locations visited by user $i$ at time $t$.

$f$ | The sharing rate of a computer used by different users.

$q_n$ | The probability of user $i$ checking a message notification at time $t$.

$q_s$ | The probability of user $i$ visiting a stranger’s message at time $t$.

$q_f$ | The probability of user $i$ visiting a friend’s message at time $t$.

$deg_i$ | The degree of node $i$.

$m_i(t)$ | The number of neighbors of user $i$ that are infected in the social logical layer at time $t$.


On the other hand, users check new messages periodically. Thus, it is necessary to introduce the variable $t'$ to obtain the number of unread messages at the current time $t$.

As shown in Fig. 4, we have

$$\begin{cases} t - T^i_{check} \leq t' < t, & \text{if } open_i(t) = 1 \\ t - (t \bmod T^i_{check}) \leq t' < t, & \text{otherwise.} \end{cases} \tag{18}$$
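Equations (16) and (18) translate directly into two small helper functions. The sketch below is illustrative: the names `p_open` and `unread_window_start` and the boolean `notified` flag are hypothetical, and it assumes that a periodic check takes precedence when a check and a notification coincide, which the paper does not state.

```python
def p_open(t, t_check, notified, q_n):
    """P(open_i(t) = 1) following Eq. (16).

    t_check  -- checking period T_check of the user (integer time steps)
    notified -- True if a new message arrived at time t (the case t = t')
    q_n      -- probability of reacting to a notification
    """
    if t % t_check == 0:
        return 1.0  # periodic check (assumed to take precedence)
    if notified:
        return q_n  # notification pops up at time t
    return 0.0


def unread_window_start(t, t_check, is_open):
    """Lower bound of t' in Eq. (18); the window is [start, t)."""
    if is_open:
        return t - t_check
    return t - (t % t_check)


if __name__ == "__main__":
    print(p_open(80, 40, notified=False, q_n=0.9))    # periodic check
    print(p_open(85, 40, notified=True, q_n=0.9))     # notification only
    print(unread_window_start(80, 40, is_open=True))
    print(unread_window_start(95, 40, is_open=False))
```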

3.2 Modeling Propagation Dynamics

1) Calculating $n(t)$

We use the values 0 and 1 to represent the healthy state and the infected state, respectively. Given a hierarchical network topology with $M \cdot S_i^t$ nodes, the expected number of infected nodes at time $t$, $n(t)$, can be calculated as

$$\begin{aligned} n(t) &= E\Big[\sum_{i=1}^{M} \sum_{n=1}^{S_i^t} \big(X_{i,n}^t = \text{Inf.}\big)\Big] = \sum_{i=1}^{M} \sum_{n=1}^{S_i^t} E\big[X_{i,n}^t = \text{Inf.}\big] \\ &= \sum_{i=1}^{M} \sum_{n=1}^{S_i^t} P\big(X_{i,n}^t = 1\big) = \sum_{i=1}^{M} \sum_{n=1}^{S_i^t} P\big(X_{i,n}^t = \text{Inf.}\big) \\ &= \sum_{i=1}^{M} \Big[\underbrace{P\big(X_{i,1}^t = \text{Inf.}\big)}_{\text{private}} + (1 - f) \cdot \underbrace{\sum_{n=2}^{S_i^t} P\big(X_{i,n}^t = \text{Inf.}\big)}_{\text{public}}\Big]. \end{aligned} \tag{19}$$

To address the many-to-many relationship between users and hosts, we adopt a “divide-and-composite” idea. First, we calculate the probability that the public hosts of user $i$ are used by other people. Second, we introduce $f$ to handle the situation in which public hosts of different users are in fact the same host. From Equation (19), we can see that the hosts used by user $i$ are divided into two parts: a private host and public hosts. Public hosts are hosts used by user $i$ that can also be used by other people, and vice versa. We then introduce the factor $(1 - f)$ to correct the number of actually infected hosts by the sharing rate.

According to Fig. 3, we derive the computation of $P(X_{i,n}^t = \text{Inf.})$ by different equations as follows:

$$P(X_{i,1}^t = \text{Inf.}) = v_a(i, 1, t) \cdot P(X_{i,1}^{t-1} = \text{Sus.}) + (1 - r(t)) \cdot P(X_{i,1}^{t-1} = \text{Inf.}) \tag{20}$$

$$P(X_{i,n}^t = \text{Inf.}) = v_b(i, n, t) \cdot P(X_{i,n}^{t-1} = \text{Sus.}) + (1 - r(t)) \cdot P(X_{i,n}^{t-1} = \text{Inf.}), \quad n > 1 \tag{21}$$

$$P(X_{i,n}^t = \text{Sus.}) = 1 - P(X_{i,n}^t = \text{Inf.}) - P(X_{i,n}^t = \text{Imm.}) \tag{22}$$

$$P(X_{i,n}^t = \text{Imm.}) = r(t) \cdot \big(1 - P(X_{i,n}^{t-1} = \text{Imm.})\big) + P(X_{i,n}^{t-1} = \text{Imm.}) \tag{23}$$

Once we obtain the values of $r(t)$, $v_a(i, 1, t)$ and $v_b(i, n, t)$, the value of $P(X_{i,n}^t = \text{Inf.})$ can be calculated by iterating Equations (20), (21), (22), and (23).
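The iteration of Equations (20) to (23) for a single host can be sketched as follows. This is a simplified illustration rather than the full model: it assumes a constant infection probability `v` and a constant recovery probability `r` (in the paper both vary with time and depend on the neighbors), and the function names `step` and `run` are hypothetical.

```python
def step(state, v, r):
    """One update of (P_sus, P_inf, P_imm) for a single host.

    v -- infection probability of a susceptible host (v_a or v_b)
    r -- recovery probability r(t), assumed constant here
    """
    sus, inf, imm = state
    new_inf = v * sus + (1 - r) * inf   # Eqs. (20) and (21)
    new_imm = r * (1 - imm) + imm       # Eq. (23)
    new_sus = 1 - new_inf - new_imm     # Eq. (22)
    return new_sus, new_inf, new_imm


def run(v, r, steps, state=(1.0, 0.0, 0.0)):
    """Iterate the update and return the whole trajectory."""
    history = [state]
    for _ in range(steps):
        state = step(state, v, r)
        history.append(state)
    return history


if __name__ == "__main__":
    for t, (sus, inf, imm) in enumerate(run(v=0.1, r=0.05, steps=5)):
        print(t, round(sus, 4), round(inf, 4), round(imm, 4))
```

Under these updates the susceptible probability shrinks by the factor $(1 - v - r)$ per step, so the three probabilities stay non-negative only while $v + r \leq 1$; Equation (24) contains the factor $(1 - r(t))$, which keeps $v$ within that bound.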

2) Calculating $v_a(i, 1, t)$ and $v_b(i, n, t)$

$v_a(i, 1, t)$ and $v_b(i, n, t)$ represent the infection probability when user $i$ operates the private host or the public host in location $n$ at time $t$, respectively. There are four preconditions: 1) the host is not recovered and immunized; 2) the user is checking the social account for new messages; 3) the susceptible user reads malicious messages from friends with probability $q_f$, or malicious messages from strangers with probability $q_s$; 4) the user is in location $n$.

$$v_a(i, 1, t) = \Big[q_f \cdot s(i, 1, t) + q_s \cdot \frac{m(t-1) - m_i(t-1)}{M - deg_i}\Big] \cdot (1 - r(t)) \cdot P(open_i(t) = 1) \cdot P(stay_i^1(t) = 1) \tag{24}$$

$$v_b(i, n, t) = \sum_{k=0}^{5} \sum_{\substack{j:\,\text{neighbor of } i, \\ d^{ij}_{min} = k}} \Big[q_f \cdot s(j, n, t) + q_s \cdot \frac{m(t-1) - m_j(t-1)}{M - deg_j}\Big] \cdot (1 - r(t)) \cdot P(open_j(t) = 1) \cdot P(stay^n_{j \to i}(t) = 1) \cdot P_{j \to i}(k) \tag{25}$$

$$P(stay^n_{j \to i}(t) = 1) = \begin{cases} P(stay_i^n(t) = 1), & k = 0 \\ \eta \cdot \dfrac{S_j^t}{\sum_{n=1}^{S_j^t} T^j_{rest}(n)} \cdot \dfrac{1}{k}, & k \neq 0 \end{cases} \tag{26}$$

where $P(stay^n_{j \to i}(t) = 1)$ denotes the probability that user $j$ and user $i$ stay in the same location $n$, which is proportional to $S_j^t$ and inversely proportional to both the total time user $j$ stays in different locations $n$ ($T^j_{rest}(n)$) and the $k$-hop distance between user $j$ and user $i$. The normalization constant is $\eta = 1 \Big/ \Big[\sum_{k=1}^{5} \sum_{j:\,\text{neighbor of } i,\ d^{ij}_{min} = k} \frac{S_j^t}{\sum_{n=1}^{S_j^t} T^j_{rest}(n)} \cdot \frac{1}{k}\Big]$. $d^{ij}_{min} = k$ denotes that user $j$ is a neighbor of user $i$ whose shortest path has a $k$-hop distance. $P_{j \to i}(k)$ denotes the probability that user $j$, at a $k$-hop distance, operates the host of user $i$ in location $n$ at time $t$. As shown in Fig. 5, the value of $P_{j \to i}(k)$ is the product of all weights on the shortest path $(j, k-1, \ldots, 1, i)$. The choice of the upper bound of the $k$-hop distance in Equation (25) is discussed in Section 5.1.

Fig. 4. Two cases of variable t’: (a) User checks messages at currenttime t; (b) User does not check messages at current time t.

Fig. 5. Illustration of computing $P_{j \to i}(k)$. The weight of edge $(i, j)$ is the probability $\frac{1}{deg_j}$ that user $j$ chooses a host of neighbor $i$ multiplied by the probability $\frac{1}{deg_i}$ that a host of user $i$ is able to be used by neighbor $j$.


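$$P_{j \to i}(k) = \begin{cases} \dfrac{1}{deg_j} \cdot \dfrac{1}{deg_{k-1}^2} \cdots \dfrac{1}{deg_1^2} \cdot \dfrac{1}{deg_i}, & j \neq i \\ 1, & j = i. \end{cases} \tag{27}$$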
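Equation (27) is a product of edge weights along the shortest path, which can be sketched as a short helper. The function name `path_probability` and the convention of passing the node degrees in path order are assumptions made for this illustration.

```python
def path_probability(degrees):
    """P_{j->i}(k) following Eq. (27).

    degrees -- node degrees along the shortest path, ordered as
               [deg_j, deg_{k-1}, ..., deg_1, deg_i];
               a single entry means j == i.
    """
    if len(degrees) == 1:
        return 1.0  # j = i: the user operates their own host
    # the end points contribute 1/deg once, intermediate nodes 1/deg^2
    prob = 1.0 / (degrees[0] * degrees[-1])
    for deg in degrees[1:-1]:
        prob /= deg ** 2
    return prob


if __name__ == "__main__":
    print(path_probability([4]))        # j = i
    print(path_probability([2, 5]))     # direct neighbours (k = 1)
    print(path_probability([2, 3, 4]))  # two hops via a degree-3 node
```

Each intermediate node appears in two consecutive edge weights, once as the chooser and once as the owner of the shared host, which is why its degree enters squared.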

Let $G^t_{j,m|i,n}$ be the probability that, at time $t$, host $m$ operated by neighbor $j$ is in the active state under the condition that, at time $(t-1)$, host $n$ operated by user $i$ is in the susceptible state:

$$G^t_{j,m|i,n} = P(X^t_{j,m} = \text{Act.} \mid X^{t-1}_{i,n} = \text{Sus.}). \tag{28}$$

The probability that user $i$, on a susceptible host $n$, reads messages from an infected host $m$ used by user $j$ at time $t$ is $p_{ji} \cdot G^{t'}_{j,m|i,n}$. We can compute $s(i, n, t)$ as

$$s(i, n, t) = 1 - \prod_{j \in N_i} \prod_{m=1}^{S_j^t} \big(1 - p_{ji} \cdot G^{t'}_{j,m|i,n}\big). \tag{29}$$

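We decompose Equation (29) by separating $t - 1$ from the range of values of $t'$. There are two cases. In Fig. 6a, if the user does not check new messages at time $t - 1$, we have

$$\begin{aligned} \prod_{j \in N_i} \prod_{m=1}^{S_j^t} \big[1 - p_{ji} \cdot G^{t'}_{j,m|i,n}\big] &= \prod_{\substack{j \in N_i \\ (t-1) \notin t'}} \prod_{m=1}^{S_j^t} \big[1 - p_{ji} \cdot G^{t'}_{j,m|i,n}\big] \cdot \prod_{j \in N_i} \prod_{m=1}^{S_j^t} \big[1 - p_{ji} \cdot G^{t-1}_{j,m|i,n}\big] \\ &= (1 - s(i, n, t-1)) \cdot \prod_{j \in N_i} \prod_{m=1}^{S_j^t} \big[1 - p_{ji} \cdot G^{t-1}_{j,m|i,n}\big]. \end{aligned} \tag{30}$$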

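In Fig. 6b, if the user checks new messages at time $t - 1$, the new messages received at time $t$ are those sent at time $t - 1$ by the infected neighbors. Thus, we have

$$\prod_{j \in N_i} \prod_{m=1}^{S_j^t} \big[1 - p_{ji} \cdot G^{t'}_{j,m|i,n}\big] = \prod_{j \in N_i} \prod_{m=1}^{S_j^t} \big[1 - p_{ji} \cdot G^{t-1}_{j,m|i,n}\big]. \tag{31}$$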

We can unify the computation of $s(i, n, t)$ from Equations (30) and (31) as follows:

$$s(i, n, t) = 1 - \big(1 - s(i, n, t-1)\,[1 - P(open_i(t-1) = 1)]\big) \times \prod_{j \in N_i} \prod_{m=1}^{S_j^t} \big[1 - p_{ji} \cdot G^{t-1}_{j,m|i,n}\big]. \tag{32}$$
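The unified update of Equation (32) can be sketched as a single function. The name `update_s` and the flat list `infect_probs`, which holds one value $p_{ji} \cdot G^{t-1}_{j,m|i,n}$ per neighboring host, are assumptions for this illustration; computing those values is the subject of the next subsection.

```python
def update_s(s_prev, p_open_prev, infect_probs):
    """s(i, n, t) following Eq. (32).

    s_prev       -- s(i, n, t-1)
    p_open_prev  -- P(open_i(t-1) = 1)
    infect_probs -- one p_ji * G^{t-1}_{j,m|i,n} per neighbouring host
    """
    survive = 1.0  # probability that no message sent at t-1 infects
    for q in infect_probs:
        survive *= 1.0 - q
    # unread messages carry over only if the user did not check at t-1
    return 1.0 - (1.0 - s_prev * (1.0 - p_open_prev)) * survive


if __name__ == "__main__":
    probs = [0.2, 0.5]
    print(update_s(0.5, 1.0, probs))  # user checked at t-1, Eq. (31)
    print(update_s(0.5, 0.0, probs))  # user did not check, Eq. (30)
```

Setting `p_open_prev` to 1 recovers the case of Equation (31), and setting it to 0 recovers the case of Equation (30).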

3) Eliminating the Propagation Error [4]

In order to eliminate the error caused by the spreading cycles of $G^{t-1}_{j,m|i,n}$ in Equation (32), we provide an example as shown in Fig. 7.

We let the variables $C_{i,n \to j,m}$ and $C_{ij}$ represent an arbitrary propagation path on the actual physical layer and the social logical layer, respectively. $d(C_{i,n \to j,m})$ denotes the dependence probability of the propagation path. We then introduce $t(t, C_{ij})$ as the beginning time of the propagation path in the social logical layer. Thus, the probability that node $(j, m)$ is infected by node $(i, n)$ through path $C_{i,n \to j,m}$ can be computed as

$$C(t) = d(C_{i,n \to j,m}) \cdot P\big(X^{t(t, C_{ij})}_{i,n} = \text{Act.}\big). \tag{33}$$

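There are a total of $NUM(k)$, $(1 \leq k \leq L)$, $k$-hop propagation paths from node $(i, n)$ to node $(j, m)$, and $L$ is the maximum length of a propagation path. We can eliminate the spatial dependence as follows:

$$\begin{aligned} G^{t-1}_{j,m|i,n} &= \left[1 - \frac{1 - v(j, m, t-1)}{\prod_{k=1}^{L} \prod_{m=1}^{NUM(k)} \big(1 - C_m(t-1)\big)}\right] \cdot P\big(X^{t-2}_{j,m} = \text{Sus.}\big) \\ &= \left[1 - \frac{1 - v(j, m, t-1)}{\prod_{k=1}^{L} \prod_{m=1}^{NUM(k)} \big[1 - d(C_{i,n \to j,m})\, P\big(X^{t(t, C_{ij})}_{i,n} = \text{Act.}\big)\big]}\right] \cdot P\big(X^{t-2}_{j,m} = \text{Sus.}\big). \end{aligned} \tag{34}$$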

How do we compute the values of $t(t, C_{ij})$ and $d(C_{i,n \to j,m})$? Following the computation of the earliest time ($t^e$) and latest time ($t^l$) of nodes $j$ and $a$ in Fig. 8, $t^e$ and $t^l$ of the other users are computed by the same reasoning. We label the nodes on an arbitrary propagation path $C_{ij}$ with the numbers 0 to $k$ from user $i$ to user $j$. We calculate the earliest ($t^e$) and the latest ($t^l$) time of each node as follows:

$$\begin{cases} t^e_k = t - T^k_{check} - 1 \\ t^l_k = t - 2 \end{cases} \tag{35}$$

$$\begin{cases} t^e_{k-x} = t^e_{k-x+1} - \big(t^e_{k-x+1} \bmod T^{k-x}_{check}\big) \\ t^l_{k-x} = t^l_{k-x+1} - \big(t^l_{k-x+1} \bmod T^{k-x}_{check}\big) - 1 \end{cases}, \quad 0 < x \leq k, \tag{36}$$

Fig. 6. Two cases for the computation of $s(i, n, t)$. The user in (b) checks new emails at time $t - 1$, but the user in (a) does not.

Fig. 7. An example of the propagation cycle.


when $x$ is equal to $k$, it means $t(t, C_{ij})$ falls in the time range $[t^e_0, t^l_0]$ and finally receives them. In the social logical layer, the time at which each node is infected can be computed from $t(t, C_{ij})$ as follows:

$$\begin{cases} t^h_1 = t(t, C_{ij}) - \big[t(t, C_{ij}) \bmod T^1_{check}\big] + T^1_{check} \\ t^h_x = t^h_{x-1} - \big(t^h_{x-1} \bmod T^{x-1}_{check}\big) + T^{x-1}_{check}, \quad 1 < x \leq k. \end{cases} \tag{37}$$

$V$ denotes the set of nodes on the propagation path $C_{ij}$, including node $i$ and node $j$. $Nodeset_i$ denotes the set of hosts used by user $i$. We can calculate the probabilistic effect $d(C_{i,n \to j,m})$ as follows:

$$d(C_{i,n \to j,m}) = \prod_{\substack{z \in V,\ z = 1 \\ w \in Nodeset_z}}^{k} p_{z-1,z} \cdot P\big(X^{t^h_{z-1}}_{z,w} = \text{Sus.}\big) \cdot P\big(clean^{t^h_z}_{z,w} = 0\big) \cdot P\big(stay^w_z(t) = 1\big), \tag{38}$$

where $clean^{t^h_z}_{z,w} = 0$ if the host $w$ used by user $z$ is not recovered at time $t^h_z$. Otherwise, $clean^{t^h_z}_{z,w} = 1$.
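The backward recursion of Equations (35) and (36) for the earliest and latest times can be sketched as follows. This is an illustrative reading of the two equations, assuming integer time steps and a list `t_check` in which `t_check[x]` is the checking period of the node labelled $x$ on the path (0 for user $i$, $k$ for user $j$); the function name `time_bounds` is hypothetical.

```python
def time_bounds(t, t_check):
    """Earliest (te) and latest (tl) times for nodes 0..k on a path.

    t       -- current time
    t_check -- t_check[x] is the checking period of node x
    """
    k = len(t_check) - 1
    te = [0] * (k + 1)
    tl = [0] * (k + 1)
    # Eq. (35): bounds for the last node on the path
    te[k] = t - t_check[k] - 1
    tl[k] = t - 2
    # Eq. (36): walk back towards node 0, aligning to each node's
    # most recent checking time
    for x in range(1, k + 1):
        node = k - x
        te[node] = te[node + 1] - (te[node + 1] % t_check[node])
        tl[node] = tl[node + 1] - (tl[node + 1] % t_check[node]) - 1
    return te, tl


if __name__ == "__main__":
    te, tl = time_bounds(t=100, t_check=[10, 20, 30])
    print("earliest:", te)
    print("latest:  ", tl)
```

The entries `te[0]` and `tl[0]` give the range $[t^e_0, t^l_0]$ referred to in the text above.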

4 MODEL VALIDATION

In this field, all existing works adopt simulations to evaluate analytical models, such as [3], [4], and [14]. In order to validate the correctness of the SADI model, we derive an SADI-compatible propagation simulator from existing simulation models [2], [4], [15], [16]. The implementation is in C++ and Matlab R2014a. The random numbers in the experiments are produced by the C++ TR1 library extensions. The hierarchical network topology adopted for evaluation includes the social logical layer (10,000 nodes) and the actual physical layer (30,000+ nodes). The experimental results are averaged over 100 runs, and the number of runs comes from the discussion of “how many simulation runs are needed before we obtain a steady curve” [2]. Each run of the spread begins with two infected nodes, which are randomly chosen from the network. We place the two nodes at a distance of six (the number of edges between them) in the topology, which reflects the impact of the clustering coefficient [4].

4.1 Real-World Samples of Network

To examine the rationality of the SADI model, we use the data of the Nyxem email worm outbreak collected by CAIDA [17]. The Nyxem virus is a 95 KB Visual Basic executable that infects a computer when an unwary user runs an executable email attachment. After infecting a computer, it attempts to disable a variety of antivirus products and then looks for email addresses in order to spread itself automatically using a variety of Subject fields and attachment names. To generate an estimate of the total number of infections, two values are examined for each IP address: the number of unique, vulnerable browser types and the total number of probes received from that IP address. The former represents a lower bound on the number of infections, while the latter represents an upper bound. The total victim count is estimated to be between 469,507 and 946,835 in more than 200 countries between January 15 23:40:54 UTC 2006 and Wednesday February 1 05:00:12 UTC; the length of time is 365 hours. This range represents between 3.2 and 6.4 percent of all log entries the researchers examined. However, the researchers do not give a concrete structure or the scale of the social network. Thus, according to the above statements, we set the size of the social network to 500,000 and generate the topology with the simulator.

Fig. 9 shows that the infection scale of the SADI model is approximately equal to the average number of hosts infected by the Nyxem email worm, with a time interval of 1 hour. This indicates that the presented model can characterize the propagation dynamics of realistic worms accurately and reflect their propagation characteristics faithfully. If real values of the topology parameters for the observed regions were available, the SADI model could describe the social worm propagation more accurately.

4.2 Comparison with Previous Models

To evaluate the accuracy of our model, we conduct experiments with different parameter settings. First, in order to exclude the impact of different recovery processes, the experiments are carried out without the recovery process

Fig. 8. Illustration of computing te and tl.

Fig. 9. The comparison with the real-world worm samples. Checking time: $T^i_{check} \sim Exp(1/40)$; Resting time: $P(T^i_{rest}) \propto (T^i_{rest})^{-1.8}$; $q_s = 0.1$, $q_n = 0.9$, $q_f = 1$, $f = 0.5$; Topology: $a = 2.5$, $\langle k \rangle = 5.6$, reciprocity rate 0.23, $p_{ij} \sim N(0.5, 0.2^2)$.


($r(t) = 0$) in this section. The topology has a power-law exponent $a = 2.5$, an average degree $\langle k \rangle = 5.6$ and a reciprocity rate of 0.23. The infection probability $p_{ij}$ follows the Gaussian distribution $N(0.5, 0.2^2)$. We let $T^i_{check}$ and $T^i_{rest}$ follow the exponential distribution $Exp(1/40)$ and the power-law distribution $P(T^i_{rest}) \propto (T^i_{rest})^{-1.8}$, respectively. These parameter settings come from the previous works [2], [4], and [7].

As shown in Fig. 10, all models except the SADI model underestimate the infection scale. The main reason is that their propagation is modeled in the social logical layer only, and previous researchers do not consider the impact of hierarchical networks. We show the differences $D$ in the inset of Fig. 10 and can see that the results of previous models deviate from the simulation by up to ten thousand fewer infections. There is also a minor divergence between the SADI model and the simulation. Nevertheless, the SADI model is far more accurate than the other models.

Moreover, because the epidemic model [12] and the spatial-temporal model [3] do not implement message checking time, their spread is actually determined by hops. When $T^i_{\mathrm{check}} = 40$, the spreading of one hop costs 40 time intervals. As a result, their curves show stairway shapes, which deviate largely from the simulation results. Finally, the infection spreads at a faster rate in both the independent and Markov models than in the simulation, because they assume spatial independence and conditional independence, respectively. They also ignore propagation cycles in the propagation path. In addition, the spread speed of the XSS model is faster than that of the SII model, because users can visit strangers' profiles with a certain probability.

4.3 Evaluate the Performance of the SADI Model

We also evaluate the impact of various parameters on the performance of the modeling.

4.3.1 Impact of Different Parameters and Distributions of Resting Time $T^i_{\mathrm{rest}}$ in the Modeling

Impact of Parameter $x_{\min}$ and Power-Law Exponent $b$. On a larger scale, such as inter-state and inter-urban, $T^i_{\mathrm{rest}}$ follows a power-law distribution (i.e., $P(T^i_{\mathrm{rest}}) \propto (T^i_{\mathrm{rest}})^{-b}$) [7], [18]. Random samples can be generated using inverse transform sampling. Given a random variate $U$ drawn from the uniform distribution on the unit interval $(0, 1]$, the variate $T^i_{\mathrm{rest}}$ is given by $T^i_{\mathrm{rest}} = x_{\min} / U^{1/(b-1)}$ [19]. We evaluate the accuracy with different values of $x_{\min}$ and $b$: $x_{\min}$ is 5, 15 and 45, and $b$ is 1.9, 1.8 and 1.6, respectively, as shown in Figs. 11a and 11b.
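The inverse transform step above can be sketched in a few lines of Python. This is a minimal illustration under assumptions: the function name `sample_resting_time` and the chosen sample size are hypothetical, and the values $x_{\min} = 5$ and $b = 1.8$ are simply one of the settings listed in the text.

```python
import random
import statistics


def sample_resting_time(x_min: float, b: float, rng: random.Random) -> float:
    """Draw one resting time from P(T) proportional to T^(-b), T >= x_min."""
    if b <= 1.0:
        raise ValueError("b must be greater than 1 for a normalizable power law")
    # rng.random() lies in [0, 1), so 1 - rng.random() lies in (0, 1],
    # matching the unit interval used in the text.
    u = 1.0 - rng.random()
    return x_min / u ** (1.0 / (b - 1.0))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
samples = [sample_resting_time(x_min=5.0, b=1.8, rng=rng) for _ in range(10_000)]

# Every sample is at least x_min; the theoretical median is x_min * 2^(1/(b-1)).
sample_median = statistics.median(samples)
theoretical_median = 5.0 * 2.0 ** (1.0 / (1.8 - 1.0))
```

Because the power law is heavy-tailed for these exponents, the median is a more stable sanity check than the sample mean.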

The final infection scale becomes larger with the decrease of $x_{\min}$ and the increase of $b$. This parameter variation implies that the resting time is mainly distributed over small values of the time axis, and users have more frequent activities in different locations. Thus, the probability that a host is used in a certain location increases, which raises the probability that hosts are infected.

Impact of Parameter $u$ of the Exponential Distribution. On a smaller scale, such as inner-urban, $T^i_{\mathrm{rest}}$ follows the exponential distribution $\mathrm{Exp}(u)$ [20], [21], [22]. $u$ is 1/20, 1/40 and 1/80, respectively, as shown in Fig. 11c.

The final scale of infected nodes becomes larger with the increase of parameter $u$. A growing $u$ implies that the mean of the exponential distribution decreases, which means users spend less time in the same location. Thus, the number of times that users operate hosts in different locations increases, which raises the probability that hosts are infected.
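As a quick check of this trend, and assuming $u$ is the rate parameter of the exponential distribution (an assumption, though it is consistent with the statement that the mean decreases as $u$ grows), the mean resting time for the three settings works out as follows:

```latex
\mathbb{E}\left[T^i_{\mathrm{rest}}\right] = \frac{1}{u}
\quad\Longrightarrow\quad
u = \tfrac{1}{20},\ \tfrac{1}{40},\ \tfrac{1}{80}
\ \text{ give mean resting times of } 20,\ 40,\ 80 \text{ time intervals.}
```

Under this reading, the largest value $u = 1/20$ corresponds to the shortest average stay in one location.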

Impact of Different Distributions of Resting Time $T^i_{\mathrm{rest}}$. For the power-law distribution, the power-law exponent is $b = 1.8$ and $x_{\min} = 45$. For the exponential distribution, the parameter is $u = 1/80$.

Fig. 11d indicates that the more similar the distribution of resting time is to the distribution of checking time, the more nodes the social worm infects. The main reason is that when the similarity between the two distributions is low, human mobility and message-checking behaviors are very difficult to keep synchronized, which weakens the spread of the social worm.

4.3.2 Impact of Checking Time $T^i_{\mathrm{check}}$ and Message Notification in the Modeling

The above subsection mainly analyzes how different distributions and parameter variations of resting time influence the spread. In this section, we analyze the impact of parameter $s$ of checking time and the probability $q_n$ of checking message notification. $T^i_{\mathrm{check}}$ follows the exponential distribution $\mathrm{Exp}(s)$, where $s$ is 1/20, 1/40 and 1/80, and $q_n$ is 0.9, 0.5 and 0, respectively.

Fig. 10. The comparison of different models.

Fig. 11. Impacts of different parameters and distribution of resting time. Checking time: $T^i_{\mathrm{check}} \sim \mathrm{Exp}(1/40)$; Topology: $a = 2.5$, $\langle k \rangle = 5.6$, reciprocity rate 0.23, $p_{ij} \sim N(0.5, 0.2^2)$.

As shown in Figs. 12a and 12b, larger $s$ and $q_n$ indicate that users check their email boxes or log in to their sites more frequently in the social network. As a result, social worms infect their hosts with higher probability, which improves the spread speed of social worms. However, the increase of the two parameters makes the infection scale smaller. As shown in Fig. 12c, in the interval $[t_0, t_3)$, suppose that user $i$ operates a host in location A and receives two malicious emails. Because the checking period is much shorter than the resting time, and the probability of checking a message is large once a new message comes to a social account, the two malicious emails infect only one host before user $i$ leaves location A. Thus, a shorter checking time narrows the infection scale of worms.

4.3.3 Impact of Parameter qs in the Modeling

Assuming that all users in the network have the same probability $q_s$ of visiting strangers, we set $q_s$ to 0.1, 0.5 and 0.9, respectively. Fig. 13 shows that as people visit strangers' profiles more often, the social worm spreads faster and infects more hosts. The reason is that the worm can reach out to other parts of the network and does not circulate for a long time within a group of friends (a community).

4.3.4 Impact of Sharing Rate f in the Modeling

The sharing rate $f$ denotes the degree to which public hosts are used by people. As shown in Fig. 14, if a public host is used intensively by more people in a certain location, the social worm propagation is slower. This indicates that the more dispersed the public hosts are, the more helpful it is for the spread of social worms.

4.3.5 Impact of Different Topologies in the Modeling

We investigate the accuracy of the SADI model in different topologies. Compared with the topology settings used in Fig. 15 ($a = 2.5$, reciprocity rate 0.23, $\langle k \rangle = 4.5$, $p_{ij} \sim N(0.5, 0.2^2)$), we have Case 1 (what if users have more friends in the social network, which means $a$ becomes smaller and $\langle k \rangle$ becomes larger): $a = 2$, reciprocity rate 0.23, $\langle k \rangle = 6$, $p_{ij} \sim N(0.5, 0.2^2)$; and Case 2 (what if the network is more vulnerable, which means $p_{ij}$ becomes larger): $a = 2.5$, reciprocity rate 0.23, $\langle k \rangle = 4.5$, $p_{ij} \sim N(0.8, 0.1^2)$.

As shown in Fig. 15, the above two cases promote the spread of worms, and our model fits the simulation results very well. This indicates that the structure and properties of the topology directly affect the infection effect.

4.4 Impact of Recovery Processes in the Modeling

Fig. 12. The accuracy with different values of $s$ and $q_n$. Resting time: $P(T^i_{\mathrm{rest}}) \propto (T^i_{\mathrm{rest}})^{-1.8}$; Topology: $a = 2.5$, $\langle k \rangle = 5.6$, reciprocity rate 0.23, $p_{ij} \sim N(0.5, 0.2^2)$.

Fig. 13. The accuracy with different values of $q_s$. Checking time: $T^i_{\mathrm{check}} \sim \mathrm{Exp}(1/40)$; Resting time: $P(T^i_{\mathrm{rest}}) \propto (T^i_{\mathrm{rest}})^{-1.8}$; Topology: $a = 2.5$, $\langle k \rangle = 5.6$, reciprocity rate 0.23, $p_{ij} \sim N(0.5, 0.2^2)$.

Fig. 14. The accuracy with different values of $f$. Checking time: $T^i_{\mathrm{check}} \sim \mathrm{Exp}(1/40)$; Resting time: $P(T^i_{\mathrm{rest}}) \propto (T^i_{\mathrm{rest}})^{-1.8}$; Topology: $a = 2.5$, $\langle k \rangle = 5.6$, reciprocity rate 0.23, $p_{ij} \sim N(0.5, 0.2^2)$.

We evaluate the accuracy under defensive strategies. The network defense is described by different recovery functions $r(t)$ as follows: 1) 'Constant recovery rate' [3], [14]: $r(t) = r$; 2) 'Ratio': $r(t) = n(t)/V$, where $V$ is the size of the hierarchical network, that is, $r(t)$ is proportional to the number of infected nodes $n(t)$ [12], [23]; 3) 'Composite': $r(t) = r_0 [1 + i(t)/V]^{n}$, where $r_0$ is the initial recovery rate and the exponent $n$ adjusts the sensitivity of the recovery rate to the number of immune hosts $i(t)$ [24]; 4) 'Qualys': derived from the statistics of Qualys Inc. [25],

$$ r(t) = \begin{cases} 0, & t < d \\ 1 - 0.5^{(t-d)/D}, & t \ge d, \end{cases} \tag{39} $$

where $d$ is the temporal span from the malware starting to spread to scientists finding this malware on the Internet, and $D$ is the half-life period, which denotes the time for a 50 percent decrease [4].
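The four recovery functions can be written down directly. The following Python sketch is illustrative only: the function names and argument names are hypothetical, the placement of the exponent in the 'Composite' form follows the reading $r_0 [1 + i(t)/V]^{n}$, and the numbers in the usage lines are made-up inputs rather than values from the experiments.

```python
def constant_recovery(r: float) -> float:
    """'Constant recovery rate': r(t) = r for every t."""
    return r


def ratio_recovery(n_t: float, V: float) -> float:
    """'Ratio': r(t) = n(t) / V, proportional to the number of infected nodes."""
    return n_t / V


def composite_recovery(r0: float, i_t: float, V: float, n: float) -> float:
    """'Composite': r(t) = r0 * [1 + i(t) / V]^n (exponent placement assumed)."""
    return r0 * (1.0 + i_t / V) ** n


def qualys_recovery(t: float, d: float, D: float) -> float:
    """'Qualys', Eq. (39): no recovery before discovery time d, then half-life D."""
    if t < d:
        return 0.0
    return 1.0 - 0.5 ** ((t - d) / D)


# Hypothetical inputs, chosen only to show the shape of each function.
before_discovery = qualys_recovery(t=5, d=10, D=30)     # still zero
one_half_life = qualys_recovery(t=40, d=10, D=30)       # one half-life after d
ratio_example = ratio_recovery(n_t=100, V=1000)
composite_start = composite_recovery(r0=0.01, i_t=0, V=1000, n=2)
```

One half-life after discovery the 'Qualys' rate reaches 0.5, which matches the description of $D$ as the time for a 50 percent decrease.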

As shown in Fig. 16, even if the differences among the recovery functions are very big, the SADI model fits the simulation results very well. Our model characterizes the process of hosts being recovered by users in real-world scenarios.

5 THEORETICAL ANALYSIS

In this section, we study the parameter setting and the superiority of the SADI model theoretically.

5.1 Setting Upper Bound of k-Hop Distance

Because the setting of the $k$ value directly affects the computational efficiency of $v_b(i, n, t)$, we need to set an upper bound on the $k$-hop distance. As shown in Fig. 17a, searching neighboring nodes within a 5-hop distance basically traverses the entire social network. In order to calculate the impact of $P_{j \to i}(k)$, we assume that the probability $E(j, n, t) = \left[ q_f \cdot s(j, n, t) + q_s \cdot \frac{m(t-1) - m_j(t-1)}{M - \deg_j} \right] \cdot (1 - r(t)) \cdot P(\mathrm{open}_j(t) = 1) \cdot P(\mathrm{stay}^n_{j \to i}(t) = 1)$ is 1. As shown in Fig. 17b, when $k$ increases, the effect of the $k$-hop distance increases and gradually tends to a constant value. However, the effect of paths is noticeable from 1-hop to 5-hop and cannot be neglected.

We assume that $v_b(i, n, t) = \Theta(k)$, where $\Theta(k)$ denotes the approximation of $\Theta(\infty)$ obtained by considering neighbors from 1-hop to $k$-hop. We have

$$ D(k) = \Theta(k) - \Theta(k-1) = \sum_{j:\ \text{neighbor of } i,\ d^{ij}_{\min} = k} E(j, n, t) \cdot P_{j \to i}(k). \tag{40} $$

Because $E(j, n, t) > 0$ and $P_{j \to i}(k) > 0$, we have $D(k) > 0$. Thus

$$ \Theta(k) > \Theta(k-1) > \cdots > \Theta(0). \tag{41} $$

This indicates that the SADI model becomes more accurate if we compute more hops of distance. When $k \to \infty$, we have $P_{j \to i}(k) \to 0$. Thus, we obtain

$$ \lim_{k \to \infty} \frac{D(k)}{k} = 0. \tag{42} $$

We can see that $D(k)$ is a higher-order infinitesimal of $k$. The incremental values decrease rapidly with the increase of $k$. Thus, we do not need to calculate $P_{j \to i}(k)$ for all $k$-hop distances. Indeed, Fig. 17b shows that paths from 1-hop to 5-hop have a significant effect. Therefore, we mainly focus on $k$-hop ($k \le 5$) distance paths, which saves a lot of calculation time.
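The claim that a small hop limit already covers most of a network can be checked on any graph with a breadth-first search. The following is a minimal sketch under assumptions: the adjacency-list representation, the function name `cumulative_k_hop_counts`, and the five-node toy graph are illustrative and are not taken from the paper's simulator.

```python
from collections import deque
from typing import Dict, List


def cumulative_k_hop_counts(adj: Dict[int, List[int]], source: int, k_max: int) -> List[int]:
    """Return, for k = 0..k_max, how many nodes lie within k hops of source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == k_max:
            continue  # do not expand beyond the hop limit
        for neighbor in adj.get(node, []):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    counts = [0] * (k_max + 1)
    for hops in dist.values():
        counts[hops] += 1
    for k in range(1, k_max + 1):
        counts[k] += counts[k - 1]  # turn per-hop counts into cumulative counts
    return counts


# Hypothetical toy graph: 0 is linked to 1 and 2, 1 to 3, and 3 to 4.
toy_graph = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1, 4], 4: [3]}
coverage = cumulative_k_hop_counts(toy_graph, source=0, k_max=3)
```

On a real topology, comparing the last entry of the returned list with the total number of nodes shows what fraction of the network a given hop limit reaches.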

Fig. 15. The accuracy of the SADI model in different topologies.

Fig. 16. The accuracy of the SADI model affected by different recovery functions.

Fig. 17. The cumulative number of k-hop distance for traversing social network layer and the cumulative probabilistic effect of k-hop neighbors.


5.2 Superiority in Modeling Propagation Dynamics

The empirical study has shown that our SADI model is superior to previous models [2], [3], [4]. We further provide a theoretical analysis of the propagation dynamics.

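Superiority in the Spreading Mechanism. Message notification indicates that a user will get a reminder once a new message comes to a social account. In previous models, we have

$$ P'(\mathrm{open}_i(t) = 1) = \begin{cases} 1, & t \bmod T^i_{\mathrm{check}} = 0 \\ 0, & \text{otherwise.} \end{cases} \tag{43} $$

Thus, the SADI model is revised as

$$ v'_a(i, 1, t) = \left[ q_f \cdot s(i, 1, t) + q_s \cdot \frac{m(t-1) - m_i(t-1)}{M - \deg_i} \right] \cdot P(\mathrm{stay}^1_i(t) = 1) \cdot P'(\mathrm{open}_i(t) = 1) \tag{44} $$

$$ v'_b(i, n, t) = \sum_{k=0}^{5} \ \sum_{j:\ \text{neighbor of } i,\ d^{ij}_{\min} = k} \left[ q_f \cdot s(j, n, t) + q_s \cdot \frac{m(t-1) - m_j(t-1)}{M - \deg_j} \right] \cdot (1 - r(t)) \cdot P(\mathrm{stay}^n_{j \to i}(t) = 1) \cdot p_{j \to i}(k) \cdot P'(\mathrm{open}_j(t) = 1), \tag{45} $$

and in our SADI model we have

$$ v_a(i, 1, t) = \left[ q_f \cdot s(i, 1, t) + q_s \cdot \frac{m(t-1) - m_i(t-1)}{M - \deg_i} \right] \cdot P(\mathrm{stay}^1_i(t) = 1) \cdot \left[ P'(\mathrm{open}_i(t) = 1) + f(t, t_0) \cdot q_n \right] \tag{46} $$

$$ v_b(i, n, t) = \sum_{k=0}^{5} \ \sum_{j:\ \text{neighbor of } i,\ d^{ij}_{\min} = k} \left[ q_f \cdot s(j, n, t) + q_s \cdot \frac{m(t-1) - m_j(t-1)}{M - \deg_j} \right] \cdot (1 - r(t)) \cdot P(\mathrm{stay}^n_{j \to i}(t) = 1) \cdot p_{j \to i}(k) \cdot \left[ P'(\mathrm{open}_j(t) = 1) + f(t, t_0) \cdot q_n \right], \tag{47} $$

where $f(t, t_0) = 1$ denotes that a new message comes to a social account at time $t$. Otherwise, $f(t, t_0) = 0$.

Because $f(t, t_0) > 0$ when a new message arrives and $q_n > 0$,

$$ v_a(i, 1, t) - v'_a(i, 1, t) > 0 \quad \text{and} \quad v_b(i, n, t) - v'_b(i, n, t) > 0, \quad n > 1. \tag{48} $$

We can see that previous models cannot characterize the spreading mechanism of message notification, and they underestimate the spread speed of social worms.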
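The difference between the two opening probabilities can be made concrete with a small Python sketch. It is a simplified illustration under assumptions: the function names are hypothetical, the checking period is treated as a fixed integer, and the cap at 1 for the case where a scheduled check and a notification coincide is an assumption of this sketch rather than something stated in the text.

```python
def open_prob_previous(t: int, t_check: int) -> float:
    """Eq. (43): a user opens messages only at multiples of the checking period."""
    return 1.0 if t % t_check == 0 else 0.0


def open_prob_with_notification(t: int, t_check: int, new_message: bool, q_n: float) -> float:
    """Bracketed factor of Eqs. (46)-(47): scheduled check plus notification term."""
    f = 1.0 if new_message else 0.0  # f(t, t0) from the text
    # Capping at 1.0 is an assumption made here so the result stays a probability.
    return min(1.0, open_prob_previous(t, t_check) + f * q_n)


# Hypothetical scenario: checking period 40, a malicious message arrives at t = 41.
without_notification = open_prob_previous(41, 40)
with_notification = open_prob_with_notification(41, 40, new_message=True, q_n=0.9)
```

In this scenario the earlier models wait until the next scheduled check, while the notification term lets the message be opened right away with probability $q_n$.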

Superiority in Topology Structure and Human Behaviors. The hierarchical network is a network that describes the interdependency between human behaviors and network devices from the perspective of both the social logical layer and the actual physical layer. The temporal characteristic of human mobility indicates that the different resting times of a user visiting different locations affect the operation time of hosts.

For the two characteristics above, we have Equations (19), (24) and (25) in the SADI model, and use $n_1$ and $v_1$ to denote them, respectively. Meanwhile, we use $n_2$ and $v_2$ to describe the condition in which social worms spread in the social network layer without the temporal characteristic of human mobility.

$$ n_1 = n(t) = \sum_{i=1}^{M} \sum_{n=1}^{S^t_i} P(X^t_{i,n} = \mathrm{Inf.}) \tag{49} $$

$$ n_2 = n(t) = \sum_{i=1}^{M} P(X^t_i = \mathrm{Inf.}) \tag{50} $$

$$ v_1 = v(i, n, t) = \begin{cases} v_a(i, n, t), & n = 1 \\ v_b(i, n, t), & n > 1 \end{cases} \tag{51} $$

$$ v_2 = v(i, t) = \left[ q_f \cdot s(i, t) + q_s \cdot \frac{m(t-1) - m_i(t-1)}{M - \deg_i} \right] \cdot (1 - r(t)) \cdot P(\mathrm{open}_i(t) = 1). \tag{52} $$

According to Fig. 3, we have

$$ \sum_{n=1}^{S^t_i} P(X^t_{i,n} = \mathrm{Inf.}) = \sum_{n=1}^{S^t_i} v_1 \cdot P(X^{t-1}_{i,n} = \mathrm{Sus.}) + (1 - r(t)) \cdot \sum_{n=1}^{S^t_i} P(X^{t-1}_{i,n} = \mathrm{Inf.}) \tag{53} $$

$$ P(X^t_i = \mathrm{Inf.}) = v_2 \cdot P(X^{t-1}_i = \mathrm{Sus.}) + (1 - r(t)) \cdot P(X^{t-1}_i = \mathrm{Inf.}). \tag{54} $$

Topologies of previous models denote the relationship between users. This is equivalent to the social logical layer in our SADI model. Thus, if the state of user $i$ is susceptible, it implies that the states of all hosts used by user $i$ are susceptible. We have

$$ P(X^{t-1}_{i,n} = \mathrm{Sus.}) > P(X^{t-1}_i = \mathrm{Sus.}). \tag{55} $$

Moreover, at the initial time, we have $P(X^0_{i,1} = \mathrm{Inf.}) = P(X^0_i = \mathrm{Inf.})$, $S^0_i = 1$. According to Equation (55) and by iterating Equations (53) and (54),

$$ \sum_{n=1}^{S^t_i} P(X^t_{i,n} = \mathrm{Inf.}) > P(X^t_i = \mathrm{Inf.}). \tag{56} $$

Thus,

$$ n_1 - n_2 > 0. \tag{57} $$

To illustrate the superiority of our SADI model, we plot the related curves in Fig. 18. The divergence between the actual infection scale and its estimation is larger in previous models [2], [3], [4], [26]. On the basis of the above analysis, we can see that our SADI model characterizes the propagation of the social worm better.

6 FURTHER DISCUSSION

There is still some work that needs to be done in the future. The most difficult problem for propagation models of social worms in social networks is that real worm spreading data, which could be used to evaluate our propagation model, is very difficult to obtain. This is particularly so for social network and human mobility information, because it involves the privacy of users, communication traffic and location information, which are hard to share. Currently, we use valuable data and conclusions provided by authorities and high-level articles to implement the simulation for evaluating our SADI model. However, we would like to develop software for collecting spreading incidents to solve this problem in the future.

7 RELATED WORK

In the last decade, there have been many research achievements and findings that have focused on two different types of worms: the scanning-based worms and the topology-based worms. The former rely on various scanning strategies to infect victims. The latter rely on a topology structure to infect their neighbors. In recent years, researchers have given more attention to detecting or estimating behaviors of scanning-based worms by modeling their propagation. The work in [27] investigates a new class of active worms and designs a novel spectrum-based scheme to detect a camouflaging worm. The work in [28] introduces a game-theoretical formulation of the spread of a self-disciplinary worm, and gives an effective integration of worm detection and forensics analysis to defend against it. Then there are the state-of-the-art scanning techniques studied by scientists. Manna et al. [29] propose a propagation model of the permutation-scanning worms that precisely characterizes their spread behaviors.

With the development of information techniques, social software such as Facebook, Twitter, and email has become the main communication tool on the Internet. Social worms on these platforms spread fast by exploiting the trust between friends. In this paper, we mainly discuss this kind of worm, which belongs to the topology-based class. Existing works are classified into two classes: propagation simulations and analytic models of propagation dynamics. The former is discussed in [15], [16], and [30]. These simulation models can describe worm propagation behavior in social networks very well; however, they cannot provide an analytical study of the nature of the propagation.

There are many improved works on the analytic model of propagation dynamics. Chen et al. [3] propose that the Markov model can incorporate both detailed topology information and simple spatial dependence to achieve a greater accuracy than previous models. The work in [2] presents an email worm model that accounts for behaviors of checking emails. However, the above models have two problems: temporal dynamics and spatial dependence. In order to solve these two problems, the SII model is presented in [4], and the results show that the SII model is more suitable for modeling the propagation of social worms. In addition, Livshits et al. [31] propose an automatic detection and containment solution called Spectator, which uses a distributed data training and tagging method to detect the spread of JavaScript worms. The work in [32] proposes a client-side solution to detect XSS worms by comparing self-replicating payloads with currently embedded scripts in a cross-platform Firefox extension. Cao et al. [5] design PathCutter as an XSS detection tool to detect current forms of XSS worms. They propose two integral mechanisms: view separation and request authentication. Faghani et al. [6] present an analytical model to characterize the propagation of XSS worms in OSNs, and propose a study of selective monitoring schemes. However, these existing works focus on worm detection and lack an analysis of the propagation dynamics of social worms. Meanwhile, they all neglect the influence of message notification, human mobility and hierarchical networks on the spread of worms. They also have a "social layer topology" assumption which does not accord with the topology in the real world. Wang et al. [33] present a THM model to address the above assumption, but it only studies the spread ability of social worms theoretically and quantitatively. It cannot characterize the message notification mechanism and the process of hosts being shared by different users. Finally, some researchers characterize the propagation dynamics of isomorphic worms, such as P2P worms [14], [34], [35], and mobile worms [36], [37], [38]. However, in these models all authors have a "homogeneous mixing" assumption that is not suitable for the analysis in the hierarchical network.

8 CONCLUSION

In the current era of the Internet, the spread of social worms is interdisciplinary and multi-dimensional. Thus, we incorporate human mobility into the research of propagation modeling, and propose a novel SADI model for the propagation of social worms. This model addresses two core processes and one critical problem in previous models: message notification, the temporal characteristic of human mobility, and structural imperfection of network topology. We then conducted a number of experiments to analyze how these factors affect the spread of worms. Moreover, the experiments show that our SADI model fits the simulation very well, which implies that it models the propagation dynamics accurately. Finally, we believe the work presented in this paper is of significance to network defense.

ACKNOWLEDGMENTS

This work was supported by the National Natural Science Foundation of China (Grant No. 61170295), the Project of National Ministries Foundation of China (Grant No. A2120110006), the Co-Funding Project of Beijing Municipal Education Commission (Grant No. JD100060630) and the Research Project of Aviation Industry of China (Grant No. CXY2011BH07).

Fig. 18. Illustration of modeling various mechanisms. $T^i_{\mathrm{check}} \sim \mathrm{Exp}(1/40)$; $P(T^i_{\mathrm{rest}}) \propto (T^i_{\mathrm{rest}})^{-1.8}$; $a = 2.5$, $\langle k \rangle = 5.6$, reciprocity rate 0.23, $p_{ij} \sim N(0.5, 0.2^2)$, $r(t) = 0$.


REFERENCES

[1] M. Fossi and J. Blackbird, “Symantec internet security threat report 2013,” Symantec Corporation, Mountain View, CA, USA, Tech. Rep. 11, Apr. 2014.

[2] C. C. Zou, D. Towsley, and W. Gong, “Modeling and simulation study of the propagation and defense of internet e-mail worms,” IEEE Trans. Dependable Secure Comput., vol. 4, no. 2, pp. 105–118, Apr.–Jun. 2007.

[3] Z. Chen and C. Ji, “Spatial-temporal modeling of malware propagation in networks,” IEEE Trans. Neural Netw., vol. 16, no. 5, pp. 1291–1303, Sep. 2005.

[4] S. Wen, W. Zhou, J. Zhang, Y. Xiang, W. Zhou, and W. Jia, “Modeling propagation dynamics of social network worms,” IEEE Trans. Parallel Distrib. Syst., vol. 24, no. 8, pp. 1633–1643, Aug. 2013.

[5] Y. Cao, V. Yegneswaran, P. A. Porras, and Y. Chen, “Pathcutter: Severing the self-propagation path of XSS javascript worms in social web networks,” in Proc. 19th Netw. Distrib. Syst. Security Symp., 2012, pp. 1–14.

[6] M. R. Faghani and U. T. Nguyen, “A study of XSS worm propagation and detection mechanisms in online social networks,” IEEE Trans. Inf. Forensics Secur., vol. 8, no. 11, pp. 1815–1826, Nov. 2013.

[7] C. Song, T. Koren, P. Wang, and A.-L. Barabási, “Modelling the scaling properties of human mobility,” Nature Phys., vol. 6, no. 10, pp. 818–823, 2010.

[8] A. Mislove, M. Marcon, K. P. Gummadi, P. Druschel, and B. Bhattacharjee, “Measurement and analysis of online social networks,” in Proc. 7th ACM SIGCOMM Conf. Int. Meas., 2007, pp. 29–42.

[9] M. E. Newman, S. Forrest, and J. Balthrop, “Email networks and the spread of computer viruses,” Phys. Rev. E, vol. 66, no. 3, 2002, Art. no. 035101.

[10] Y.-Y. Ahn, S. Han, H. Kwak, S. Moon, and H. Jeong, “Analysis of topological characteristics of huge online social networking services,” in Proc. 16th Int. Conf. World Wide Web, 2007, pp. 835–844.

[11] R. Pastor-Satorras and A. Vespignani, “Epidemic dynamics in finite size scale-free networks,” Phys. Rev. E, vol. 65, no. 3, 2002, Art. no. 035108.

[12] M. Boguñá, R. Pastor-Satorras, and A. Vespignani, “Epidemic spreading in complex networks with degree correlations,” in Proc. XVIII Sitges Conf. Statistical Mech. Lecture Notes Phys., 2003, pp. 127–147.

[13] J. O. Kephart and S. R. White, “Directed-graph epidemiological models of computer viruses,” in Proc. IEEE Comput. Soc. Symp. Res. Security Privacy, 1991, pp. 343–359.

[14] D. Chakrabarti, J. Leskovec, C. Faloutsos, S. Madden, C. Guestrin, and M. Faloutsos, “Information survival threshold in sensor and P2P networks,” in Proc. 26th IEEE Int. Conf. Comput. Commun., 2007, pp. 1316–1324.

[15] G. Yan, G. Chen, S. Eidenbenz, and N. Li, “Malware propagation in online social networks: Nature, dynamics, and defense implications,” in Proc. 6th ACM Symp. Inf. Comput. Commun. Security, 2011, pp. 196–206.

[16] C. Gao, J. Liu, and N. Zhong, “Network immunization and virus propagation in email networks: Experimental evaluation and analysis,” Knowl. Inf. Syst., vol. 27, no. 2, pp. 253–279, 2011.

[17] D. Moore and C. Shannon, “The nyxem email virus: Analysis and inferences,” The Center for Applied Internet Data Analysis (CAIDA), La Jolla, CA, USA, Tech. Rep. CME-24, Feb. 2006.

[18] D. Brockmann, L. Hufnagel, and T. Geisel, “The scaling laws of human travel,” Nature, vol. 439, no. 7075, pp. 462–465, 2006.

[19] H. Tanizaki, Computational Methods in Statistics and Econometrics. Boca Raton, FL, USA: CRC Press, 2004.

[20] X. Liang, X. Zheng, W. Lv, T. Zhu, and K. Xu, “The scaling of human mobility by taxis is exponential,” Physica A: Statist. Mech. Appl., vol. 391, no. 5, pp. 2135–2144, 2012.

[21] L. Sun, K. W. Axhausen, D.-H. Lee, and X. Huang, “Understanding metropolitan patterns of daily encounters,” Proc. Nat. Academy Sci. United States America, vol. 110, no. 34, pp. 13 774–13 779, 2013.

[22] X. Liang, J. Zhao, L. Dong, and K. Xu, “Unraveling the origin of exponential law in intra-urban human mobility,” Sci. Rep., vol. 3, pp. 1–7, 2013.

[23] Y. Moreno, J. B. Gómez, and A. F. Pacheco, “Epidemic incidence in correlated complex networks,” Phys. Rev. E, vol. 68, no. 3, 2003, Art. no. 035103.

[24] D. Zhang and Y. Wang, “SIRS: Internet worm propagation model and application,” in Proc. Int. Conf. Elect. Control Eng., 2010, pp. 3029–3032.

[25] W. Kandek, “The laws of vulnerabilities,” presented at the BlackHat Conf., Tokyo, Japan, 2009.

[26] S. Wen, W. Zhou, Y. Wang, W. Zhou, and Y. Xiang, “Locating defense positions for thwarting the propagation of topological worms,” IEEE Commun. Lett., vol. 16, no. 4, pp. 560–563, Apr. 2012.

[27] W. Yu, X. Wang, P. Calyam, D. Xuan, and W. Zhao, “Modeling and detection of camouflaging worm,” IEEE Trans. Dependable Secure Comput., vol. 8, no. 3, pp. 377–390, May/Jun. 2011.

[28] W. Yu, N. Zhang, X. Fu, and W. Zhao, “Self-disciplinary worms and countermeasures: Modeling and analysis,” IEEE Trans. Parallel Distrib. Syst., vol. 21, no. 10, pp. 1501–1514, Oct. 2010.

[29] P. K. Manna, S. Chen, and S. Ranka, “Inside the permutation-scanning worms: Propagation modeling and analysis,” IEEE/ACM Trans. Netw., vol. 18, no. 3, pp. 858–870, Jun. 2010.

[30] W. Fan, K. Yeung, and K. Wong, “Assembly effect of groups in online social networks,” Physica A: Statist. Mech. Appl., vol. 392, no. 5, pp. 1090–1099, 2013.

[31] V. B. Livshits and W. Cui, “Spectator: Detection and containment of javascript worms,” in Proc. USENIX Annu. Tech. Conf., 2008, pp. 335–348.

[32] F. Sun, L. Xu, and Z. Su, “Client-side detection of XSS worms by monitoring payload propagation,” in Computer Security. Berlin, Germany: Springer, 2009, pp. 539–554.

[33] T. Wang, C. Xia, and Q. Jia, “The temporal characteristic of human mobility: Modeling and analysis of social worm propagation,” IEEE Commun. Lett., vol. 19, no. 7, pp. 1169–1172, Jul. 2015.

[34] S. Hatahet, A. Bouabdallah, and Y. Challal, “A new worm propagation threat in bittorrent: Modeling and analysis,” Telecommun. Syst., vol. 45, no. 2/3, pp. 95–109, 2010.

[35] V. Karyotis, “Markov random fields for malware propagation: The case of chain networks,” IEEE Commun. Lett., vol. 14, no. 9, pp. 875–877, Sep. 2010.

[36] G. Yan and S. Eidenbenz, “Modeling propagation dynamics of bluetooth worms (extended version),” IEEE Trans. Mobile Comput., vol. 8, no. 3, pp. 353–368, Mar. 2009.

[37] Z. Zhu, G. Cao, S. Zhu, S. Ranjan, and A. Nucci, “A social network based patching scheme for worm containment in cellular networks,” in Handbook of Optimization in Complex Networks. Berlin, Germany: Springer, 2012, pp. 505–533.

[38] G. Yan and S. Eidenbenz, “Bluetooth worms: Models, dynamics, and defense implications,” in Proc. 22nd Annu. Comput. Secur. Appl. Conf., 2006, pp. 245–256.

Tianbo Wang is currently working toward the PhD degree at Beihang University, under the supervision of Prof. Chunhe Xia. He has participated in several National Natural Science Foundations and other research projects as a contributor. His research interests include network and information security, intrusion detection technology, and information countermeasure.

Chunhe Xia received the PhD degree in computer application from Beihang University, Beijing, China, in 2003. He is currently a supervisor and professor with Beihang University, a director of Beijing Key Laboratory of Network Technology. He has participated in different national major research projects and published more than 70 research papers in important international conferences and journals. His current research focuses on network and information security, information countermeasure, cloud security, and network measurement.

Sheng Wen received the graduate degree in computer science from Central South University of China, in 2012 and the PhD degree from the School of Information Technology, Deakin University, Melbourne, Australia, in 2014, under the supervision of Prof. Wanlei Zhou and Yang Xiang. His focus is on modelling of virus spread, information dissemination, and defence strategies of the Internet threats.

Hui Xue received the BS degree in computer science and technology from Shandong University, in 2007. He is currently working toward the MS degree at Beihang University. Then he worked for several companies in software development area for more than 5 years. His current research focuses on network and information security.

Yang Xiang received his PhD in Computer Science from Deakin University, Australia. He is the Director of Centre for Cyber Security Research, Deakin University. His research interests include network and system security, data analytics, distributed systems, and networking. He has published more than 200 research papers in many international journals and conferences, such as IEEE Transactions on Computers, IEEE Transactions on Parallel and Distributed Systems, IEEE Transactions on Information Security and Forensics, and IEEE Journal on Selected Areas in Communications. He has served as the Program/General Chair for many international conferences such as SocialSec 15, IEEE DASC 15/14, IEEE UbiSafe 15/14, IEEE TrustCom 13, ICA3PP 12/11, IEEE/IFIP EUC 11, IEEE TrustCom 13/11, IEEE HPCC 10/09, IEEE ICPADS 08, NSS 11/10/09/08/07. He serves as the Associate Editor of IEEE Transactions on Computers, IEEE Transactions on Parallel and Distributed Systems, Security and Communication Networks (Wiley), and the Editor of Journal of Network and Computer Applications. He is the Coordinator, Asia for IEEE Computer Society Technical Committee on Distributed Processing (TCDP). He is a Senior Member of the IEEE.

Shouzhong Tu received the MS degree in network information security from Beihang University, Beijing, China. Currently, he is with the Beijing Institute of Electronics Technology and Application. His research interests include network security and artificial intelligence.

