
Composite Behavioral Modeling for Identity Theft Detection in Online Social Networks

Cheng Wang, Senior Member, IEEE, and Bo Yang

Abstract—In this work, we aim at building a bridge from poor behavioral data to an effective, quick-response, and robust behavior model for online identity theft detection. We concentrate on this issue in online social networks (OSNs) where users usually have composite behavioral records, consisting of multi-dimensional low-quality data, e.g., offline check-ins and online user generated content (UGC). As an insightful result, we find that there is a complementary effect among different dimensions of records for modeling users’ behavioral patterns. To deeply exploit such a complementary effect, we propose a joint model to capture both online and offline features of a user’s composite behavior. We evaluate the proposed joint model by comparing with some typical models on two real-world datasets: Foursquare and Yelp. In the widely-used setting of theft simulation (simulating thefts via behavioral replacement), the experimental results show that our model outperforms the existing ones, with the AUC values 0.956 in Foursquare and 0.947 in Yelp, respectively. Particularly, the recall (True Positive Rate) can reach up to 65.3% in Foursquare and 72.2% in Yelp with the corresponding disturbance rate (False Positive Rate) below 1%. It is worth mentioning that these performances can be achieved by examining only one composite behavior (visiting a place and posting a tip online simultaneously) per authentication, which guarantees the low response latency of our method. This study would give the cybersecurity community new insights into whether and how real-time online identity authentication can be improved via modeling users’ composite behavioral patterns.

Index Terms—Online Social Networks, Identity Theft Detection, Composite Behavioral Modeling.


1 INTRODUCTION

With the rapid development of the Internet, more and more affairs, e.g., mailing [1], health caring [2], shopping [3], booking hotels and purchasing tickets are handled online [4]. Meanwhile, the Internet brings sundry potential risks of invasions [5], such as losing financial information [6], identity theft [7] and privacy leakage [3]. Online accounts serve as the agents of users in the cyber world. Online identity theft is a typical online crime, namely the deliberate use of another person’s account [8], usually as a method to gain a financial advantage or obtain credit and other benefits in the other person’s name. In fact, compromised accounts are usually the portals of most cybercrimes [1], [9], such as blackmail [6], fraud [10] and spam [11]. Thus, identity theft detection is essential to guarantee users’ security in the cyber world.

Traditional identity authentication methods [12], [13] are mostly based on access control schemes, e.g., passwords and tokens [14]. But users have to bear the overhead of managing dedicated passwords or tokens. Accordingly, biometric identification [15]–[18] has been introduced to start the password-free era. However, some disadvantages still make these access control schemes ineffective in real-time online services [19]: (1) They are intrusive. Users have to spend extra time on authentication. (2) They are not continuous. The defending system fails to provide further protection once the access control is broken.

Behavior-based suspicious account detection [19] is a highly anticipated solution for non-intrusive and continuous identity authentication in online services. It depends on capturing users’ suspicious behavior patterns to discriminate the suspicious accounts. The problem can be divided into two categories: fake/sybil account detection and compromised account detection [20]. A fake/sybil account’s behavior usually does not conform to the behavioral pattern of the majority. In contrast, a compromised account usually behaves in a pattern that does not conform to its owner’s previous one, and may even behave like a fake/sybil account. Compromised accounts can therefore be detected by capturing mutations of users’ behavioral patterns.

• Wang and Yang are with the Department of Computer Science and Technology, Tongji University, and with the Key Laboratory of Embedded System and Service Computing, Ministry of Education, China. (E-mail: [email protected], [email protected])

Compared with detecting compromised accounts, detecting fake/sybil accounts is relatively easy, since the behaviors of fake/sybil accounts are generally more detectable than those of compromised accounts. Fake/sybil account detection has been extensively studied, and can be realized by various population-level approaches, e.g., clustering [21], [22], classification [6], [23] and statistical or empirical rules [10], [24]–[26]. Thus, we only focus on compromised account detection, commonly called identity theft detection, based on individual-level behavior models.

Recently, researchers have proposed different individual-level identity theft detection methods based on suspicious behavior detection [11], [27]–[33]. However, the efficacy of these methods depends significantly on the sufficiency of behavior records, and suffers from the low quality of behavior records caused by data collection limitations or privacy issues [3]. In particular, when a method only utilizes a specific dimension of behavioral data, the damage that poor data does to its efficacy is possibly enlarged. Unfortunately, most existing works concentrate on just one specific dimension of users’ behavior, such as keystroke [27], clickstream [30], and user generated content (UGC) [11], [31], [32].

In this paper, we propose an approach to detect identity theft by jointly using multi-dimensional behavior records which are possibly insufficient in each dimension. According to such characteristics, we choose the online social network (OSN) as a typical scenario where most users’ behaviors are coarsely recorded [34]. In the Internet era, users’ behaviors are composed of offline behaviors, online behaviors, social behaviors, and perceptual/cognitive behaviors, as illustrated in Fig. 1. For OSN users, the behavioral data are collectable in many daily life applications, such as offline check-ins in location-based services, online tips-posting in UGC sites, and social relationship-making in OSN sites [33], [35], [36]. Accordingly, we design our method based on users’ composite behaviors in the categories shown in Fig. 1.

Fig. 1: An illustration of composite behavior space.

arXiv:1801.06825v1 [cs.SI] 21 Jan 2018

We aim to show that a high-quality (effective, quick-response, and robust) behavior model can be obtained by jointly using multi-dimensional behavioral data, even though the data are extremely insufficient in each dimension. For this challenging objective, a precondition is to solve users’ data insufficiency problem. The majority of users have only several behavioral records, which are too few to build qualified behavior models. For this issue, we adopt a tensor decomposition-based method [37] that combines the similarity among users (in terms of interests in both tips and places) with the social ties among them. Then, to fully utilize potential information in composite behaviors for user profiling, we propose a joint probabilistic generative model based on Bayesian networks, called Composite Behavioral Model (CBM). It offers a composition of the typical features in two different behavior spaces: check-in location in offline behavior space and UGC in online behavior space. Considering a composite behavior of a user, we assume that its generative mechanism is as follows: When a user plans to visit a venue and simultaneously post tips online, he/she subconsciously selects a specific behavioral pattern according to his/her behavior distribution. Then, he/she comes up with a topic and a targeted venue based on the present pattern’s topic and venue distributions, respectively. Finally, his/her comment words are generated following the corresponding topic-word distribution. To estimate the parameters of the mentioned distributions, we adopt collapsed Gibbs sampling [36].

Based on the joint model CBM, for each composite behavior, denoted by a triple (u, v, D), we can calculate the chance of user u visiting venue v and posting a tip online with a set of words D. Taking into account the different activity levels of different users, we devise a relative anomalous score S_r to measure the occurrence rate of each composite behavior (u, v, D). By these approaches, we finally realize real-time detection (i.e., judging by only one composite behavior) of identity theft suspects.

We evaluate the proposed joint model by comparing with the typical models on two real-world OSN datasets: Foursquare [35] and Yelp [38]. We adopt the area under the receiver operating characteristic curve (AUC) as the measure of detection efficacy. In the widely-used setting of theft simulation (simulating thefts via behavioral replacement) [39], the AUC value reaches 0.956 in Foursquare and 0.947 in Yelp, respectively. Particularly, the recall (True Positive Rate) reaches up to 65.3% in Foursquare and 72.2% in Yelp with the corresponding disturbance rate (False Positive Rate) below 1%. Note that these performances can be achieved by examining only one composite behavior per authentication, which guarantees the low response latency of our detection method. As an insightful result, we learn that the complementary effect does exist among different dimensions of low-quality records for modeling users’ behaviors. To the best of our knowledge, this is the first work jointly leveraging online and offline behaviors to detect identity theft in OSNs.

The rest of this paper is organized as follows. We give an overview of our solution in Section 2. Then, we present our method in Section 3, and validate it in Section 4. We provide a literature review in Section 5. Finally, we draw conclusions in Section 6.

2 OVERVIEW OF OUR SOLUTION

Online identity theft occurs when a thief steals a user’s personal data and impersonates the user’s account. Generally, a thief first gathers information about a targeted user to steal his/her identity and then uses the stolen identity to interact with other people to get further benefits [9]. Criminals in different online services usually have different motivations. In this work, we focus on online social networks (OSNs). In some OSNs, one may know their online friends’ real-life identities. Thieves usually exploit the strong trust between friends to obtain benefits [40]. Their behavioral records usually contain specific sensitive terms. In other scenarios, friends may not be familiar with each other and lack direct interactions in the real world, so thieves cannot obtain direct benefits from cheating users’ friends. Instead, they turn to spreading malicious messages in these OSNs. Among these malicious messages, some have explicit features such as URLs and contact numbers, while others only contain deceptive comments on a place, a star or an event. The latter messages look like normal ones, which makes them harder to detect. Thus, we apply a widely-used setting of theft simulation, i.e., simulating thefts via behavioral replacement, to represent this kind of thieves.

An OSN user’s behavior is usually a composite of online and offline behaviors occurring in different behavioral spaces, as illustrated in Fig. 1. Based on this fact, we propose a joint model that embraces them in a unified model to extract information deeply.

Before introducing our joint model, named Composite Behavioral Model (CBM), we provide some concepts as preparation. The relevant notations are listed in Table 1.


TABLE 1: Notations of Parameters

Variable      Description
w             the word in UGC
v             the venue or place
π_u           the community memberships of user u, expressed by a multinomial distribution over communities
θ_c           the interests of community c, expressed by a multinomial distribution over topics
ϑ_c           a multinomial distribution over spatial items specific to community c
φ_z           a multinomial distribution over words specific to topic z
α, β, γ, η    Dirichlet priors to multinomial distributions θ_c, φ_z, π_u and ϑ_c, respectively

Definition 1 (Composite Behavior). A composite behavior, denoted by a four-tuple (u, v, D, t), indicates that at time t, user u visits venue v and simultaneously posts online a tip consisting of a set of words D.

We remark that for a composite behavior, the occurring time t is a significant factor. Two types of time attributes play important roles in mining potential information for improving the identification. The first is the sequential correlation of behaviors. However, in real-life OSNs, the time intervals between adjacent recorded behaviors are mostly unknown or overlong, so the temporal correlations cannot be captured effectively. The second is the temporal property of behaviors, e.g., periodicity and preference variance over time. However, in some real-life OSN datasets, the occurring time is recorded with a low resolution, e.g., by day, which hides the possible dependency of a user’s behavior on the occurring time. Thus, it is difficult to obtain reliable time-related features of users’ behaviors. Since we aim to propose a practical method based on uncustomized datasets of user behaviors, we only concentrate on the dependency between a user’s check-in location and tip-posting content of each behavior, taking no account of the impact of specific occurring time in this work. Thus, the representation of a composite behavior can be simplified into a triple (u, v, D) without confusion in this paper.

Our model depends on the following assumptions: (1) Each user behaves in multiple patterns with different probabilities; (2) Some users have similar behavioral patterns, e.g., similar interests in topics and places.

To describe the features of users’ behaviors, we first introduce the topic of tips.

Definition 2 (Topic, [41]). Given a set of words W, a topic z is represented by a multinomial distribution over words, denoted by φ_z, each component φ_{z,w} of which denotes the probability of word w occurring in topic z.

Next, we formulate a specific behavioral pattern of users by a concept called community.

Definition 3 (Community). A community is a set of users with a similar behavioral pattern. Let C denote the set of all communities. A community c ∈ C has two critical parameters:


(1) A topic distribution θ_c whose component, say θ_{c,z}, indicates the probability that users in community c send a message with topic z. (2) A spatial distribution ϑ_c whose component, say ϑ_{c,v}, represents the chance that users in community c visit venue v.

Fig. 2: Graphical representation of joint model. The parameters are explained in Table 1.

More specifically, we assume that a community is formed by the following procedure: Each user u is included in communities according to a multinomial distribution, denoted by π_u. That is, each component of π_u, say π_{u,c}, denotes u’s affiliation degree to community c. Similarly, we allocate each community c a topic distribution θ_c to represent its online topic preference and a spatial distribution ϑ_c to represent its offline mobility pattern.

The graphical representation of our joint model CBM is shown in Fig. 2.

3 METHOD

3.1 Composite Behavioral Model

Generally, users take actions according to their regular behavioral patterns, which are represented by the corresponding communities (Definition 3). We present the behavioral generative process in Algorithm 1: When a user u is going to visit a venue and post online tips there, he/she subconsciously selects a specific behavioral pattern, denoted by community c, according to his/her community distribution π_u (Line 11). Then, he/she comes up with a topic z and a targeted venue v based on the present community’s topic and venue distributions (θ_c and ϑ_c, respectively) (Lines 12-13). Finally, the words of his/her tips in D are generated following the topic-word distribution φ_z (Line 15).
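The generative story of Algorithm 1 can be made concrete with a small simulation. The following is a minimal sketch rather than the authors' implementation: all function and argument names are hypothetical, users, venues and words are represented by integer ids, and a fixed number of behaviors per user and words per tip is assumed purely for illustration.

```python
import random


def sample_dirichlet(rng, concentration, size):
    """Draw from a symmetric Dirichlet by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    if total == 0.0:  # extremely small concentrations can underflow to zero
        return [1.0 / size] * size
    return [d / total for d in draws]


def sample_index(rng, probs):
    """Draw one index from a categorical (single-trial multinomial) distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_behaviors(num_users, num_comms, num_topics, num_venues, vocab_size,
                       behaviors_per_user, words_per_tip,
                       alpha, beta, gamma, eta, seed=0):
    """Follow Algorithm 1 and return a list of (u, v, D) triples with integer ids."""
    rng = random.Random(seed)
    # Lines 1-4: per-community topic and venue distributions.
    theta = [sample_dirichlet(rng, alpha, num_topics) for _ in range(num_comms)]
    vartheta = [sample_dirichlet(rng, eta, num_venues) for _ in range(num_comms)]
    # Lines 5-7: per-topic word distributions.
    phi = [sample_dirichlet(rng, beta, vocab_size) for _ in range(num_topics)]
    behaviors = []
    for u in range(num_users):
        pi_u = sample_dirichlet(rng, gamma, num_comms)  # Line 9
        for _ in range(behaviors_per_user):
            c = sample_index(rng, pi_u)                 # Line 11: community
            z = sample_index(rng, theta[c])             # Line 12: topic
            v = sample_index(rng, vartheta[c])          # Line 13: venue
            # Lines 14-16: every word of the tip comes from the same topic z.
            words = [sample_index(rng, phi[z]) for _ in range(words_per_tip)]
            behaviors.append((u, v, words))
    return behaviors
```

For instance, `generate_behaviors(3, 2, 4, 5, 10, 2, 3, alpha=0.5, beta=0.5, gamma=0.5, eta=0.5)` yields six synthetic triples. Note that a single topic is drawn per composite behavior, so the venue and all words of a tip are coupled only through the community c.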

Exact inference of our joint model CBM is difficult due to the intractable normalizing constant of the posterior distribution [36]. We adopt collapsed Gibbs sampling to approximately estimate the distributions (i.e., θ, ϑ, φ and π). As for the hyperparameters, we take fixed values, i.e., α = 50/Z, γ = 50/C and β = η = 0.01, following the study in [42], where Z and C are the numbers of topics and communities, respectively.

In each iteration, for each composite behavior (u, v, D), we first sample community c according to Eq. (1):

\[
P(c \mid \mathbf{c}^{\neg}, \mathbf{z}, \mathbf{v}, u) \propto (n^{\neg}_{u,c} + \gamma)\, \frac{n^{\neg}_{c,z} + \alpha}{\sum_{z'} (n^{\neg}_{c,z'} + \alpha)}\, \frac{n^{\neg}_{c,v} + \eta}{\sum_{v'} (n^{\neg}_{c,v'} + \eta)}, \tag{1}
\]

where c^¬ denotes the community allocation for all composite behaviors except the current one; z denotes the topic allocation for all composite behaviors; n_{u,c} denotes the number of times that community c is generated by user u; n_{c,z} denotes the number of times that topic z is generated by community c; n_{c,v} denotes the number of times that venue v is visited by users in community c; a superscript ¬ denotes a quantity computed with the current behavior excluded.

Algorithm 1 Joint Probabilistic Generative Process

 1: for each community c ∈ C do
 2:   Sample the distribution over topics θ_c ∼ Dirichlet(·|α)
 3:   Sample the distribution over venues ϑ_c ∼ Dirichlet(·|η)
 4: end for
 5: for each topic z ∈ Z do
 6:   Sample the distribution over words φ_z ∼ Dirichlet(·|β)
 7: end for
 8: for each user u ∈ U do
 9:   Sample the distribution over communities π_u ∼ Dirichlet(·|γ)
10:   for each composite behavior (u, v, D) ∈ B_u do
11:     Sample a community indicator c ∼ Multi(π_u)
12:     Sample a topic indicator z ∼ Multi(θ_c)
13:     Sample a venue v ∼ Multi(ϑ_c)
14:     for each word w ∈ D do
15:       Sample a word w ∼ Multi(φ_z)
16:     end for
17:   end for
18: end for

Then, given a community c, we sample topic z according to Eq. (2):

\[
P(z \mid \mathbf{z}^{\neg}, c, D) \propto (n^{\neg}_{c,z} + \alpha) \prod_{w \in D} \frac{n^{\neg}_{z,w} + \beta}{\sum_{w'} (n^{\neg}_{z,w'} + \beta)}, \tag{2}
\]

where n_{z,w} denotes the number of times that word w is generated by topic z.
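One sweep of the sampler defined by Eqs. (1) and (2) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function names and the nested-list count tables are hypothetical, ids are integers, and Eq. (2) is evaluated in log space (an implementation choice made here to avoid underflow on long tips).

```python
import math
import random


def update_counts(counts, u, v, words, c, z, delta):
    """Add (delta=+1) or remove (delta=-1) one behavior from all count tables."""
    n_uc, n_cz, n_cv, n_zw = counts
    n_uc[u][c] += delta
    n_cz[c][z] += delta
    n_cv[c][v] += delta
    for w in words:
        n_zw[z][w] += delta


def init_state(behaviors, num_users, C, Z, V, W, rng):
    """Randomly assign a community and a topic to every behavior."""
    counts = ([[0] * C for _ in range(num_users)],  # n_uc
              [[0] * Z for _ in range(C)],          # n_cz
              [[0] * V for _ in range(C)],          # n_cv
              [[0] * W for _ in range(Z)])          # n_zw
    c_assign, z_assign = [], []
    for u, v, words in behaviors:
        c, z = rng.randrange(C), rng.randrange(Z)
        c_assign.append(c)
        z_assign.append(z)
        update_counts(counts, u, v, words, c, z, +1)
    return c_assign, z_assign, counts


def gibbs_sweep(behaviors, c_assign, z_assign, counts, hyper, rng):
    """Resample (c, z) for every behavior once, following Eqs. (1) and (2)."""
    alpha, beta, gamma, eta = hyper
    n_uc, n_cz, n_cv, n_zw = counts
    C, Z, V, W = len(n_cz), len(n_zw), len(n_cv[0]), len(n_zw[0])
    for i, (u, v, words) in enumerate(behaviors):
        # Exclude the current behavior, giving the "¬" counts.
        update_counts(counts, u, v, words, c_assign[i], z_assign[i], -1)
        z = z_assign[i]
        # Eq. (1): community given the current topic and the venue.
        weights = [(n_uc[u][c] + gamma)
                   * (n_cz[c][z] + alpha) / (sum(n_cz[c]) + Z * alpha)
                   * (n_cv[c][v] + eta) / (sum(n_cv[c]) + V * eta)
                   for c in range(C)]
        c = rng.choices(range(C), weights=weights, k=1)[0]
        # Eq. (2): topic given the new community and the words, in log space.
        logs = []
        for t in range(Z):
            denom = sum(n_zw[t]) + W * beta
            logs.append(math.log(n_cz[c][t] + alpha)
                        + sum(math.log((n_zw[t][w] + beta) / denom) for w in words))
        top = max(logs)
        z = rng.choices(range(Z), weights=[math.exp(x - top) for x in logs], k=1)[0]
        c_assign[i], z_assign[i] = c, z
        update_counts(counts, u, v, words, c, z, +1)
```

Recomputing the row sums inside the loop keeps the sketch short; a practical implementation would more likely cache them and update them incrementally.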

The inference algorithm is presented in Algorithm 2. We first randomly initialize the topic and community assignments for each composite behavior (Lines 2-4). Then, we update the community and topic assignments for each composite behavior based on Eqs. (1) and (2) in each iteration (Lines 6-9). Finally, we estimate the parameters, test the incoming cases and update the training set every I_s iterations after the I_b-th iteration (Lines 10-13) to address concept drift.
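The parameter read-out in Line 11 of Algorithm 2 amounts to normalizing each row of a count table after adding the corresponding Dirichlet prior. A minimal sketch, assuming the count tables are nested lists indexed as in the notation above (the function names are hypothetical):

```python
def smoothed_rows(count_table, prior):
    """Normalize each row of counts after adding a symmetric Dirichlet prior."""
    estimates = []
    for row in count_table:
        denom = sum(row) + prior * len(row)
        estimates.append([(n + prior) / denom for n in row])
    return estimates


def estimate_parameters(n_uc, n_cz, n_cv, n_zw, alpha, beta, gamma, eta):
    """Point estimates of theta, vartheta, pi and phi from the Gibbs counts."""
    return {
        "theta": smoothed_rows(n_cz, alpha),    # theta[c][z]
        "vartheta": smoothed_rows(n_cv, eta),   # vartheta[c][v]
        "pi": smoothed_rows(n_uc, gamma),       # pi[u][c]
        "phi": smoothed_rows(n_zw, beta),       # phi[z][w]
    }
```

For example, a community with topic counts [3, 1] and α = 0.5 gets θ_c = [3.5/5, 1.5/5] = [0.7, 0.3]. Because every prior is positive, all estimated probabilities are strictly positive, which the anomaly scores in Section 3.2 rely on.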

Algorithm 2 Inference Algorithm of the Joint Model CBM

Require: user composite behavior collection B, number of iterations I, start saving step I_b, saving lag I_s, start training sequence number N_b, end training sequence number N_e, hyperparameters α, β, γ and η
Ensure: estimated parameters θ, ϑ, φ, π
 1: Create temporary variables θ^sum, ϑ^sum, φ^sum and π^sum, initialize them with zero, set testing sequence number N_t = 0 and let B(N_t) denote the corresponding training collection for the testing behaviors whose sequence number is N_t
 2: for each composite behavior (u, v, D) ∈ B(N_t) do
 3:   Sample community and topic randomly
 4: end for
 5: for iteration = 1 to I do
 6:   for each behavior (u, v, D) ∈ B(N_t) do
 7:     Sample community c according to Eq. (1)
 8:     Sample topic z according to Eq. (2)
 9:   end for
10:   if (iteration > I_b) and (iteration mod I_s == 0) then
11:     return model parameters as follows:
          θ_{c,z} = (n_{c,z} + α) / Σ_{z′} (n_{c,z′} + α);   ϑ_{c,v} = (n_{c,v} + η) / Σ_{v′} (n_{c,v′} + η);
          π_{u,c} = (n_{u,c} + γ) / Σ_{c′} (n_{u,c′} + γ);   φ_{z,w} = (n_{z,w} + β) / Σ_{w′} (n_{z,w′} + β)
12:     Evaluate corresponding test cases and update N_t++; N_b++; N_e++
13:   end if
14: end for

To overcome the problem of data insufficiency, we adopt tensor decomposition [37] to discover users’ potential behaviors. In our experiment, we use Twitter-LDA [43] to obtain each UGC’s topic and construct a tensor A ∈ R^{N×M×L}, with three dimensions standing for users, venues and topics. Then, A(u, v, z) denotes the frequency with which user u posts a message on topic z in venue v. Using a Tucker decomposition model, we can decompose A into the multiplication of a core tensor S ∈ R^{d_U×d_V×d_Z} and three matrices, U ∈ R^{N×d_U}, V ∈ R^{M×d_V}, and Z ∈ R^{L×d_Z}, where d_U, d_V and d_Z denote the numbers of latent factors, and N, M and L denote the numbers of users, venues and topics, respectively. An objective function to control the errors is defined as:

\[
\mathcal{L}(\mathcal{S}, \mathbf{U}, \mathbf{V}, \mathbf{Z}) = \frac{1}{2} \left\| \mathcal{A} - \mathcal{S} \times_U \mathbf{U} \times_V \mathbf{V} \times_Z \mathbf{Z} \right\|^2 + \frac{\lambda}{2} \Big( \|\mathcal{S}\|^2 + \|\mathbf{U}\|^2 + \|\mathbf{V}\|^2 + \|\mathbf{Z}\|^2 + \sum_{(i,j) \in F} \mathbf{u}_i^{T} \mathbf{u}_j \Big),
\]

where F is a set of friend pairs (i, j). A* = S ×_U U ×_V V ×_Z Z is the potential frequency tensor, and A*(u, v, z) denotes the frequency with which user u may post a message on topic z in venue v. A higher A*(u, v, z) indicates that user u has a higher chance to do this kind of behavior in the future. We limit the competition space to the behavior space of u’s friends, i.e.,

\[
\left\{ (u, v, z) \mid \mathcal{A}(u', v, z) > 0,\ (u, u') \in F \right\},
\]

and select the top 20 behaviors as his/her latent behaviors to improve data quality.
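The reconstruction A*, the objective above and the friend-restricted top-k selection can be sketched in a few lines. This is a hedged illustration with hypothetical function names: tensors are plain nested lists, the objective is only evaluated (no optimizer is shown, since the source does not specify one), and the social term is taken literally as the inner product u_i^T u_j over friend pairs.

```python
def tucker_reconstruct(S, U, V, Z):
    """A*[u][v][z] = sum over (a, b, c) of S[a][b][c] * U[u][a] * V[v][b] * Z[z][c]."""
    dU, dV, dZ = len(S), len(S[0]), len(S[0][0])
    return [[[sum(S[a][b][c] * U[u][a] * V[v][b] * Z[z][c]
                  for a in range(dU) for b in range(dV) for c in range(dZ))
              for z in range(len(Z))]
             for v in range(len(V))]
            for u in range(len(U))]


def squared_norm(x):
    """Sum of squared entries of an arbitrarily nested list of numbers."""
    if isinstance(x, list):
        return sum(squared_norm(item) for item in x)
    return x * x


def objective(A, S, U, V, Z, friend_pairs, lam):
    """Fitting error plus the regularizer, including the friend-pair term."""
    A_star = tucker_reconstruct(S, U, V, Z)
    fit = sum((A[u][v][z] - A_star[u][v][z]) ** 2
              for u in range(len(U)) for v in range(len(V)) for z in range(len(Z)))
    social = sum(sum(a * b for a, b in zip(U[i], U[j])) for i, j in friend_pairs)
    reg = squared_norm(S) + squared_norm(U) + squared_norm(V) + squared_norm(Z) + social
    return 0.5 * fit + 0.5 * lam * reg


def latent_behaviors(u, A, A_star, friends_of_u, k=20):
    """Top-k (venue, topic) pairs for user u among behaviors observed for u's friends."""
    candidates = {(v, z) for f in friends_of_u
                  for v, row in enumerate(A[f])
                  for z, freq in enumerate(row) if freq > 0}
    ranked = sorted(candidates, key=lambda vz: A_star[u][vz[0]][vz[1]], reverse=True)
    return ranked[:k]
```

With a 1×1×1 core S = [[[2.0]]], U = [[1.0], [2.0]], V = [[1.0]] and Z = [[3.0]], the reconstruction is [[[6.0]], [[12.0]]]; evaluating the objective at A = A* with λ = 2 and the single friend pair (0, 1) leaves only the regularizer, 4 + 5 + 1 + 9 + 2 = 21.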

3.2 Identity Theft Detection Scheme

By the parameters Ψ = {θ, ϑ, φ, π} learnt from the inference algorithm (Algorithm 2), we can estimate the logarithmic anomalous score (S_l) of a composite behavior (u, v, D) by Eq. (3):

\[
S_l(u, v, D) = -\lg P(v, D \mid u) = -\lg \left( \sum_{c} \pi_{u,c}\, \vartheta_{c,v} \sum_{z} \theta_{c,z} \Big( \prod_{w \in D} \varphi_{z,w} \Big)^{\frac{1}{|D|}} \right). \tag{3}
\]
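Eq. (3) translates directly into code. The sketch below is illustrative only: the function name and the nested-list parameter layout (pi[u][c], vartheta[c][v], theta[c][z], phi[z][w]) are assumptions, "lg" is read as the base-10 logarithm, and D is assumed to be non-empty with strictly positive word probabilities (which holds for smoothed estimates).

```python
import math


def log_anomaly_score(u, v, words, pi, vartheta, theta, phi):
    """S_l(u, v, D) = -lg P(v, D | u), with lg taken as log base 10 (an assumption)."""
    likelihood = 0.0
    for c, p_c in enumerate(pi[u]):
        inner = 0.0
        for z, p_z in enumerate(theta[c]):
            # Geometric mean of the word probabilities: (prod phi[z][w]) ** (1 / |D|).
            log_prod = sum(math.log(phi[z][w]) for w in words)
            inner += p_z * math.exp(log_prod / len(words))
        likelihood += p_c * vartheta[c][v] * inner
    return -math.log10(likelihood)
```

As a worked case with one community and one topic, π_u = [1], ϑ_c = [0.1, 0.9], θ_c = [1] and φ_z = [0.5, 0.5] give P(v = 0, D = {0, 1} | u) = 0.1 · (0.5 · 0.5)^{1/2} = 0.05, so S_l = -lg 0.05 ≈ 1.301. The 1/|D| exponent normalizes for tip length, so longer tips are not penalized merely for containing more words.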


TABLE 2: Statistics of Foursquare and Yelp Datasets

                  Foursquare   Yelp
# of users        31,493       80,592
# of venues       143,923      42,051
# of check-ins    267,319      491,393

[Figure: histograms of the number of users (y-axis) versus the number of records per user (x-axis, 0 to 100). Panels: (a) Foursquare; (b) Yelp.]

Fig. 3: The distribution of user record counts.

However, we may mistake some normal behaviors occurring with low probability, e.g., the normal behaviors of users whose behavioral diversity and entropy are both high, for suspicious behaviors. Thus, we propose a relative anomalous score (S_r) to indicate the trust level of each behavior by Eq. (4):

\[
S_r(u, v, D) = 1 - P(u \mid v, D) = 1 - \frac{P(v, D \mid u)\, P(u)}{\sum_{u'} P(v, D \mid u')\, P(u')}. \tag{4}
\]

We randomly select 40 users to estimate the relative anomalous score for each composite behavior. Our experimental results in Section 4 show that the approach based on S_r outperforms the approach based on S_l.
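Eq. (4) is Bayes' rule over a reference set of users. The following sketch makes two assumptions that the text does not spell out: the sum in the denominator runs over the account owner plus the randomly sampled users, and the prior P(u) is supplied by the caller (e.g., uniform, or proportional to each user's activity). The function name and dictionary layout are hypothetical.

```python
def relative_anomaly_score(target, likelihoods, priors):
    """S_r = 1 - P(target | v, D).

    likelihoods[u] holds P(v, D | u) and priors[u] holds P(u) for the target
    user and every sampled reference user.
    """
    evidence = sum(likelihoods[u] * priors[u] for u in likelihoods)
    return 1.0 - likelihoods[target] * priors[target] / evidence
```

For example, with two users under a uniform prior and likelihoods 0.03 (owner) and 0.01 (reference), the posterior of the owner is 0.015 / 0.02 = 0.75 and S_r = 0.25. If the same behavior were far more typical of the reference user than of the owner, S_r would approach 1, regardless of how rare the behavior is in absolute terms.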

4 EVALUATION

In this section, we present the experimental results to evaluate the proposed joint model CBM, and validate the efficacy of the joint model for identity theft detection on real-world OSN datasets.

4.1 Datasets

Our experiments are conducted on two real-life large OSN datasets: Foursquare [35] and Yelp [38]. They are two well-known online social networking service providers. In both datasets, there are no URLs or other sensitive terms. Both datasets contain users’ social ties and behavioral records. Each social tie contains a user-ID and a friend-ID. Each behavior record contains a user-ID, venue-ID, timestamp and UGC. Their basic statistics are shown in Table 2.

We count each user’s records, and present the results in Fig. 3. It shows that most users have fewer than 5 records in both datasets. The quality of these datasets is too poor to model individual-level behavioral patterns for the majority of users, which poses a big challenge to our method.

4.2 Experiment Settings

4.2.1 Suspicious Behavior Simulation

Many works [8], [29], [44] aimed at discovering thieves’ behavioral patterns. Bursztein et al. [40] pointed out that identity thieves usually behave in two possible suspicious patterns, i.e., (1) behaving unlike the majority of users; (2) behaving only unlike the victim. Many existing outlier detection techniques, e.g., i-Forest [45], LOF [46] and GSDPMM [47], can deal with the former cases. Besides, we notice that the former can be regarded as a special case of the latter. It follows that an effective detection method for the latter also applies effectively to the former. If the experiments validate that our model performs well even for detecting such crafty thieves, this provides strong evidence of the capability of our model. This is the reason why we focus on the latter cases, where thieves tend to hide themselves among ordinary users.

[Figure: histograms of the number of records (y-axis) versus S_l (x-axis, 5 to 12), shown separately for normal and anomalous behaviors. Panels: (a) Foursquare; (b) Yelp.]

Fig. 4: The histogram of logarithmic anomalous score S_l (defined in Eq. (3)) for each behavior.

In the experiments, we use two real-life datasets, and assume that all records are normal behaviors. We simulate suspicious behaviors by exchanging some users’ behavioral records and setting them as positive instances [40]. This theft simulation process imitates one of the craftiest kinds of thieves, who behave just like normal users. More specifically, we first rank behavior records according to their timestamps. Then, we select the top 80% of behavior records for training and the rest for testing. To simulate suspicious behaviors, we randomly exchange 5% of all behavior records in the test set as anomalous behaviors. In total, we have 56,236 test behaviors in Foursquare and 71,667 in Yelp, and create 2,884 anomalous behaviors in Foursquare and 3,583 in Yelp, respectively.
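The split-and-exchange procedure can be sketched as below. This is one possible reading of "exchanging behavioral records", stated here as an assumption: a selected test record keeps its account owner but receives the venue and words of a randomly chosen test record from a different user. The function name, the tuple layout and the default fractions (taken from the text) are illustrative.

```python
import random


def simulate_thefts(records, train_fraction=0.8, swap_fraction=0.05, seed=0):
    """records: list of (user, venue, words, timestamp) tuples.

    Returns (train, test); test holds (user, venue, words, label) tuples,
    where label 1 marks a simulated theft and label 0 a normal behavior.
    """
    rng = random.Random(seed)
    ordered = sorted(records, key=lambda r: r[3])  # rank by timestamp
    cut = int(len(ordered) * train_fraction)
    train, held_out = ordered[:cut], ordered[cut:]
    test = [(u, v, words, 0) for u, v, words, _ in held_out]
    num_swaps = int(len(held_out) * swap_fraction)
    for i in rng.sample(range(len(held_out)), num_swaps):
        victim = held_out[i][0]
        donors = [r for r in held_out if r[0] != victim]
        if not donors:  # no other user available to borrow a behavior from
            continue
        _, venue, words, _ = rng.choice(donors)
        test[i] = (victim, venue, words, 1)
    return train, test
```

Because the replacement behaviors are genuine behaviors of other users, the positives contain no obviously malicious content; they are anomalous only relative to the victim's own pattern, which is the harder case the text targets.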

4.2.2 Metrics

For the convenience of description, we first give a confusion matrix in Table 3.

TABLE 3: Confusion Matrix for Binary Classification.

                      True Condition
Predicted Condition   Positive              Negative
Positive              True Positive (TP)    False Positive (FP)
Negative              False Negative (FN)   True Negative (TN)

[Figure: histograms of the number of records (y-axis) versus S_r (x-axis, 0.0 to 1.0), shown separately for normal and anomalous behaviors.]

Fig. 5: The histogram of relative anomalous score S_r (defined in Eq. (4)) for each behavior.

In the experiments, we set anomalous behaviors as positive instances, and focus on the following four metrics, since identity theft detection is essentially an imbalanced binary classification problem [48].

True Positive Rate (TPR): TPR is computed by TP / (TP + FN), and indicates the proportion of true positive instances in all positive instances (i.e., the proportion of anomalous behaviors that are detected in all anomalous behaviors). It is also known as recall. Specifically, we name it the detection rate.

False Positive Rate (FPR): FPR is computed by FP / (FP + TN), and indicates the proportion of false positive instances in all negative instances (i.e., the proportion of normal behaviors that are mistaken for anomalous behaviors in all normal behaviors). Specifically, we name it the disturbance rate.

Precision: The precision is computed by TP / (TP + FP), and indicates the proportion of true positive instances in all predicted positive instances (i.e., the proportion of anomalous behaviors that are detected in all suspected cases).

AUC: Given a ranking of all test behaviors, the AUC value can be interpreted as the probability that a classifier/predictor will rank a randomly chosen positive instance higher than a randomly chosen negative one.
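The four metrics can be computed from anomaly scores and labels as sketched below. The function names are hypothetical; the sketch assumes that a behavior is flagged when its score is at least the threshold, that both classes are present, and it computes the AUC by direct pairwise comparison (ties counted as one half), which is quadratic and meant only to mirror the probabilistic interpretation given above.

```python
def confusion_rates(scores, labels, threshold):
    """TPR, FPR and precision when behaviors with score >= threshold are flagged."""
    tp = fp = fn = tn = 0
    for s, y in zip(scores, labels):
        flagged = s >= threshold
        if flagged and y == 1:
            tp += 1
        elif flagged and y == 0:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return {
        "TPR": tp / (tp + fn),
        "FPR": fp / (fp + tn),
        "precision": tp / (tp + fp) if tp + fp else float("nan"),
    }


def auc(scores, labels):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For scores [0.9, 0.8, 0.3, 0.1] with labels [1, 0, 1, 0] and threshold 0.5, one of two positives and one of two negatives are flagged, so TPR = FPR = precision = 0.5; three of the four positive-negative pairs are ordered correctly, so the AUC is 0.75.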

4.2.3 Threshold Selection

Threshold selection is an important issue in classification tasks. Recall that for the hyper-parameters α, β, γ and η, we adopt fixed values, i.e., α = 50/Z, γ = 50/C and β = η = 0.01, following the study [36]. Specifically, we take the case that C = 30 and Z = 20 as an example to present the threshold selection strategy. The parameter sensitivity analysis will be conducted in Section 4.2.4. We compare the distribution of the logarithmic anomalous score S_l (or relative anomalous score S_r) for normal behaviors with that for anomalous behaviors. Figs. 4 and 5 present the differences between normal and anomalous behaviors in terms of the distributions of S_l and S_r, respectively. They show that the differences are both significant, and the difference in terms of S_r is much more obvious.

To obtain a reasonable threshold, we focus on the perfor-mance where the threshold changes from 0.975 to 1, since this

0.98 0.99 1.000

50

100

150

200

# of

reco

rds

Sr

Normal Behavior

0.98 0.99 1.000

100

200

300

400

# of

reco

rds

Anomalous Behavior

0.98 0.99 1.000

75

150

225

300

# of

reco

rds

Sr

Normal Behavior

0.98 0.99 1.000

150

300

450

600

# of

reco

rds

Anomalous Behavior


Fig. 6: A partial of the distribution of relative anomalous scoreSr (defined in Eq. (4)) for each behavior.

0.00 0.01 0.02 0.03 0.040.0

0.2

0.4

0.6

0.8

1.0

TPR

FPR0.00 0.01 0.02 0.03 0.04 0.05

0.0

0.2

0.4

0.6

0.8

1.0

FPR

TPR


Fig. 7: A partial of ROC (receiver operating characteristic)curve of identity theft detection.

range contains 81.5% (81.4%) of all anomalous behaviors and3.9% (4.8%) of all normal behaviors in Foursquare (Yelp).The detailed trade-offs are demonstrated in Figs. 6 and 7from different aspects. To optimize the trade-offs of detectionperformance, we define the detection Cost in Eq. (5):

$$\mathrm{Cost} = \frac{\#\ \text{of newly mistaken normal behaviors}}{\#\ \text{of newly identified anomalous behaviors}}. \tag{5}$$

We present the threshold-cost curve in Fig. 8. It shows that a smaller threshold usually corresponds to a larger cost.

We select the minimum threshold for which the corresponding cost is less than 1. Thus, we choose 0.989 and 0.992 as the thresholds for Foursquare and Yelp, respectively. With these thresholds, our joint model CBM reaches 62.32% (68.75%) in TPR and 0.85% (0.71%) in FPR on Foursquare (Yelp). Please refer to Table 4 for details.
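The selection rule can be sketched as a sweep over a decreasing grid of thresholds, evaluating the cost of Eq. (5) at each step. This is an illustrative sketch under stated assumptions rather than the procedure actually implemented: we assume a behavior is flagged when its score is at least the threshold, that "newly" refers to the change between two consecutive grid points, and that a step identifying no new anomalous behavior has infinite cost. The function name `select_threshold`, the grid, and the toy scores are hypothetical.

```python
def select_threshold(normal_scores, anomalous_scores, grid):
    """Return the smallest threshold in `grid` whose incremental cost
    (Eq. (5)) is below 1, or None if no step qualifies.

    `grid` must be sorted in decreasing order, e.g. 1.000, 0.995, ..., 0.975.
    Assumption: a behavior is flagged when score >= threshold.
    """
    def flagged(scores, t):
        return sum(s >= t for s in scores)

    best = None
    prev_fp = flagged(normal_scores, grid[0])
    prev_tp = flagged(anomalous_scores, grid[0])
    for t in grid[1:]:
        fp = flagged(normal_scores, t)
        tp = flagged(anomalous_scores, t)
        new_fp, new_tp = fp - prev_fp, tp - prev_tp
        # Cost = newly mistaken normal / newly identified anomalous.
        cost = new_fp / new_tp if new_tp > 0 else float("inf")
        if cost < 1:
            best = t  # the grid is decreasing, so later hits are smaller
        prev_fp, prev_tp = fp, tp
    return best

# Hypothetical relative anomalous scores.
normal = [0.20, 0.50, 0.981, 0.982, 0.983, 0.991]
anomalous = [0.999, 0.998, 0.996, 0.994, 0.992, 0.986]
grid = [1.0, 0.995, 0.99, 0.985, 0.98]
chosen = select_threshold(normal, anomalous, grid)
```

On this toy input the last grid step (0.98) adds three false alarms and no new detections, so the sweep settles on 0.985.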

4.2.4 Parameter Sensitivity Analysis

Parameter tuning is another important part of our work. The performance of our model is indeed sensitive to the number of communities (C) and topics (Z). Therefore, we study the impact of varying these parameters. We select the relative anomalous score $S_r$ as the test variable, and evaluate the performance of our model by changing the values of C and Z. The experimental results are summarized in Table 5.

Fig. 8: Detection costs with different thresholds. [x-axis: Threshold from 0.975 to 1.000; y-axis: Cost.]

TABLE 4: A Summary of Different Metrics [49] with the Threshold 0.989 for Foursquare and 0.992 for Yelp.

Metric          Foursquare    Yelp
AUC             0.956         0.947
Precision       79.91%        83.55%
Recall (TPR)    62.32%        68.75%
FPR             0.85%         0.71%
TNR             99.15%        99.29%
FNR             37.68%        31.25%
Accuracy        97.26%        97.76%
F1              0.700         0.754

From the results on both datasets, the detection efficacy becomes stable when the number of topics reaches 20, and the number of communities has a larger impact on the efficacy. Thus, we set C = 30 and Z = 20 in our joint model, and present the receiver operating characteristic (ROC) and Precision-Recall curves in Figs. 9 and 10, respectively. Specifically, we present the detection rate (TPR) in Table 6, where the disturbance rate (FPR) reaches 1% and 0.1%, respectively.
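Reporting the detection rate at a fixed disturbance rate amounts to scanning thresholds and keeping the best TPR whose FPR does not exceed the budget. The sketch below illustrates this on toy data; it is not the evaluation script behind Table 6. The function name `tpr_at_fpr`, the strict-inequality flagging rule, and the scores are assumptions made for illustration.

```python
def tpr_at_fpr(normal_scores, anomalous_scores, max_fpr):
    """Highest TPR achievable while keeping FPR <= max_fpr.

    Assumption: a behavior is flagged when its score is strictly
    above the threshold; every observed score is tried as a threshold.
    """
    best_tpr = 0.0
    for t in sorted(set(normal_scores) | set(anomalous_scores)):
        fpr = sum(s > t for s in normal_scores) / len(normal_scores)
        tpr = sum(s > t for s in anomalous_scores) / len(anomalous_scores)
        if fpr <= max_fpr:
            best_tpr = max(best_tpr, tpr)
    return best_tpr

# Hypothetical anomaly scores.
normal = [0.1, 0.2, 0.3, 0.4, 0.9]
anomalous = [0.95, 0.85, 0.35, 0.97]
rate_loose = tpr_at_fpr(normal, anomalous, max_fpr=0.2)   # one false alarm allowed
rate_strict = tpr_at_fpr(normal, anomalous, max_fpr=0.0)  # no false alarm allowed
```

Tightening the budget from 20% to 0% lowers the toy detection rate from 0.75 to 0.5, the same qualitative trade-off that Table 6 reports between disturbance rates of 1.0% and 0.1%.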

4.3 Performance Comparison

4.3.1 Representative Models

We compare our joint model CBM to some representative models in OSNs. In Table 7, we list the features of these models.

CF-KDE. Before presenting the CF-KDE model, we introduce the Mixture Kernel Density Estimate (MKDE) [39] to give brief background on the Kernel Density Estimate (KDE). MKDE is an individual-level spatial distribution model based on KDE, and a typical spatial model describing a user’s offline behavioral pattern. It mainly utilizes the bivariate density function in the following equations to capture the spatial distribution for each user:

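$$f_{KDE}(e \mid E, h) = \frac{1}{n}\sum_{j=1}^{n} K_h\left(e - e_j\right), \tag{6}$$

$$K_h(x) = \frac{1}{2\pi h}\exp\left(-\frac{1}{2}x^{T}H^{-1}x\right), \quad H = \begin{pmatrix} h & 0 \\ 0 & h \end{pmatrix}, \tag{7}$$

$$f_{MKDE}(e \mid E, h) = \alpha f_{KDE}(e \mid E_1) + (1 - \alpha) f_{KDE}(e \mid E_2). \tag{8}$$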

In Eq. (6), $E = \{e_1, \ldots, e_n\}$ is the set of historical behavioral records of a user and $e_j = \langle x, y \rangle$ is a two-dimensional spatial location (i.e., an offline behavior). Eq. (7) is the kernel function and $H$ is the bandwidth matrix. MKDE adopts Eq. (8), where $E_1$ is the set of an individual’s historical behavioral records (individual component), $E_2$ is the set of his/her friends’ historical behavioral records (social component), and α is the weight variable for the individual component.

TABLE 5: AUC on Foursquare (Yelp) Dataset

        C=10             C=20             C=30
Z=10    0.876 (0.910)    0.945 (0.936)    0.953 (0.945)
Z=20    0.917 (0.915)    0.946 (0.938)    0.956 (0.947)
Z=30    0.922 (0.917)    0.947 (0.938)    0.957 (0.947)

TABLE 6: Detection Rates with Different Disturbance Rates.

                         Foursquare    Yelp
Disturbance Rate=0.1%    30.8%         31.7%
Disturbance Rate=1.0%    65.3%         72.2%

To detect identity thieves, we compute a surprise index $S_e$ in Eq. (9) for each behavior $e$, defined as the negative log-probability of individual $u$ conducting behavior $e$:

$$S_e = -\log f_{MKDE}(e \mid E_u, h_u). \tag{9}$$

Furthermore, we can select the top-N behaviors with the highest $S_e$ as suspicious behaviors.
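Eqs. (6) to (9) can be traced with a few lines of NumPy. The following is a minimal sketch assuming an isotropic bandwidth matrix $H = hI$ as in Eq. (7); the function names, the bandwidth, the mixture weight, and the coordinates are all hypothetical, and the sketch is not the MKDE implementation of [39].

```python
import numpy as np

def kde(e, records, h):
    """Eqs. (6)-(7): Gaussian KDE with bandwidth matrix H = h * I."""
    diffs = np.asarray(records, dtype=float) - np.asarray(e, dtype=float)
    sq_dist = np.sum(diffs ** 2, axis=1)   # x^T x for each record
    kernels = np.exp(-0.5 * sq_dist / h) / (2.0 * np.pi * h)
    return kernels.mean()

def mkde(e, own_records, friend_records, h, alpha):
    """Eq. (8): mixture of the individual and the social component."""
    return (alpha * kde(e, own_records, h)
            + (1.0 - alpha) * kde(e, friend_records, h))

def surprise(e, own_records, friend_records, h, alpha):
    """Eq. (9): negative log-density of behavior e."""
    return -np.log(mkde(e, own_records, friend_records, h, alpha))

# Hypothetical check-in coordinates.
own = [(0.0, 0.0), (0.1, 0.0)]
friends = [(1.0, 1.0)]
s_near = surprise((0.0, 0.0), own, friends, h=0.5, alpha=0.8)
s_far = surprise((5.0, 5.0), own, friends, h=0.5, alpha=0.8)
```

A check-in far from both the user's and the friends' historical locations receives a larger surprise index, which is what the top-N selection relies on.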

The MKDE model assumes that a user is equally likely to behave like any of his/her friends; it does not quantify the potentially different influence of different friends. Thus, we introduce a collaborative filtering method to improve the performance. Based on the historical behavioral records, it establishes a user-venue matrix $R_{|\mathcal{U}| \times |\mathcal{V}|}$, where $|\mathcal{U}|$ and $|\mathcal{V}|$ are the numbers of users and venues, respectively; $R_{ij} = 1$ if user $i$ has visited venue $j$ in the training set, and $R_{ij} = 0$ otherwise. We adopt a matrix factorization method with the objective function in Eq. (10) to obtain feature vectors for each user and venue:

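$$L = \min_{U,V}\ \frac{1}{2}\sum_{i=1}^{|\mathcal{U}|}\sum_{j=1}^{|\mathcal{V}|}\left(R_{ij} - u_i^{T}v_j\right)^2 + \frac{\lambda_1}{2}\sum_{i=1}^{|\mathcal{U}|} u_i^{T}u_i + \frac{\lambda_2}{2}\sum_{j=1}^{|\mathcal{V}|} v_j^{T}v_j. \tag{10}$$

Specifically, we let

$$u_i = \left(u_i^{(1)}, u_i^{(2)}, \ldots, u_i^{(k)}\right)^{T} \quad \text{and} \quad v_j = \left(v_j^{(1)}, v_j^{(2)}, \ldots, v_j^{(k)}\right)^{T}.$$

We adopt a stochastic gradient descent algorithm in Eqs. (11) and (12) in the optimization process, where α here denotes the learning rate and the residual sign follows from differentiating Eq. (10):

$$u_i^{(k)} \leftarrow u_i^{(k)} - \alpha\left(\sum_{j=1}^{|\mathcal{V}|}\left(u_i^{T}v_j - R_{ij}\right)v_j^{(k)} + \lambda_1 u_i^{(k)}\right), \tag{11}$$

$$v_j^{(k)} \leftarrow v_j^{(k)} - \alpha\left(\sum_{i=1}^{|\mathcal{U}|}\left(u_i^{T}v_j - R_{ij}\right)u_i^{(k)} + \lambda_2 v_j^{(k)}\right). \tag{12}$$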
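The factorization step can be sketched in NumPy as follows. This is a hedged illustration, not the implementation used in the experiments: it performs full-batch gradient descent on the objective in Eq. (10), using the residual $u_i^{T}v_j - R_{ij}$ obtained by differentiating that objective. The function names, the latent dimension, the learning rate, the regularization weights, and the toy user-venue matrix are hypothetical.

```python
import numpy as np

def factorize(R, k=2, lr=0.05, lam1=0.01, lam2=0.01, epochs=500, seed=0):
    """Full-batch gradient descent on the objective in Eq. (10).

    Columns of U and V hold the k-dimensional vectors u_i and v_j,
    so the reconstruction is U.T @ V.
    """
    rng = np.random.default_rng(seed)
    n_users, n_venues = R.shape
    U = 0.1 * rng.standard_normal((k, n_users))
    V = 0.1 * rng.standard_normal((k, n_venues))
    for _ in range(epochs):
        err = U.T @ V - R                # (u_i^T v_j - R_ij) for all i, j
        grad_U = V @ err.T + lam1 * U    # column i is dL/du_i
        grad_V = U @ err + lam2 * V      # column j is dL/dv_j
        U -= lr * grad_U
        V -= lr * grad_V
    return U, V

def loss(R, U, V, lam1=0.01, lam2=0.01):
    """Value of the objective in Eq. (10)."""
    return (0.5 * np.sum((R - U.T @ V) ** 2)
            + 0.5 * lam1 * np.sum(U ** 2)
            + 0.5 * lam2 * np.sum(V ** 2))

# Hypothetical user-venue matrix with two groups of users.
R = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]], dtype=float)
U0, V0 = factorize(R, epochs=0)   # initial factors (same seed)
U, V = factorize(R)               # trained factors
```

The entries of `U.T @ V` then play the role of the weights $r_{ij}$ used by the CF-KDE surprise index.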

Consequently, we obtain the reconstructed matrix $R = U^{T}V$, and use $r_{ij} = u_i^{T}v_j$ as the weight variable for the KDE model. To detect anomalous behaviors, we use Eq. (13) to measure the surprise index for each behavior $e$:

$$S_e = -\log\frac{\sum_{j=1}^{n} r_{uj}\,K_h\left(e - e_j\right)}{\sum_{j=1}^{n} r_{uj}}. \tag{13}$$

Furthermore, we can select the top-N behaviors with the highest $S_e$ as suspicious behaviors.
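Eq. (13) replaces the uniform average of the plain KDE with a weighted one. A minimal sketch, assuming the isotropic kernel of Eq. (7) and treating `weights[j]` as a stand-in for $r_{uj}$; the function name and the numbers are hypothetical.

```python
import numpy as np

def cf_kde_surprise(e, records, weights, h):
    """Eq. (13): negative log of the weighted average of K_h(e - e_j)."""
    diffs = np.asarray(records, dtype=float) - np.asarray(e, dtype=float)
    kernels = np.exp(-0.5 * np.sum(diffs ** 2, axis=1) / h) / (2.0 * np.pi * h)
    w = np.asarray(weights, dtype=float)
    return -np.log(np.sum(w * kernels) / np.sum(w))

# Hypothetical records: one close to the new behavior, one far away.
records = [(0.0, 0.0), (3.0, 3.0)]
e = (0.1, 0.0)
s_uniform = cf_kde_surprise(e, records, [1.0, 1.0], h=0.5)
s_weighted = cf_kde_surprise(e, records, [5.0, 1.0], h=0.5)
```

Giving more weight to the nearby record lowers the surprise index of the new behavior, which is the intended effect of the collaborative filtering weights.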

Fig. 9: The ROC curves of identity theft detection via the joint model CBM. [x-axis: FPR; y-axis: TPR; both from 0 to 1.]

Fig. 10: The Precision-Recall curves of identity theft detection via the joint model CBM. [Panels: (a) Foursquare, (b) Yelp; x-axis: Recall; y-axis: Precision.]

LDA. Latent Dirichlet Allocation (LDA) [41] is a classic topic model. A user’s online behavior pattern can be denoted as the mixing proportions for topics. We aggregate the UGC of each user and his/her friends in the training set as a document, then use LDA to obtain each user’s historical topic distribution $\theta_{his}$. To obtain the present behavioral topic distribution $\theta_{new}$ in the test set, for each behavior we count the number of words assigned to the $k$th topic, denoted as $n(k)$. The $k$th component of the topic proportion vector is computed in Eq. (14):

$$\theta_{new}^{(k)} = \frac{n(k) + \alpha}{\sum_{i=1}^{K}\left(n(i) + \alpha\right)}, \tag{14}$$

where $K$ is the number of topics, and α is a hyperparameter. Specifically, we set α = 50/K.

To detect anomalous behaviors, we measure the distance between a user’s historical and present topic distributions by using the Jensen-Shannon (JS) divergence in Eqs. (15) and (16):

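$$D_{KL}(\theta_{his}, \theta_{new}) = \sum_{i=1}^{K}\theta_{his}^{(i)} \cdot \ln\left(\theta_{his}^{(i)} / \theta_{new}^{(i)}\right), \tag{15}$$

$$D_{JS}(\theta_{his}, \theta_{new}) = \frac{1}{2}\left[D_{KL}(\theta_{his}, M) + D_{KL}(\theta_{new}, M)\right], \tag{16}$$

where $M = \frac{\theta_{his} + \theta_{new}}{2}$. We can select the top-N behaviors with the highest $D_{JS}(\theta_{his}, \theta_{new})$ as suspicious behaviors.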
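The LDA-based score of Eqs. (14) to (16) can be sketched with the standard library alone. This is an illustration under assumptions, not the experimental code: the per-topic word counts are hypothetical, and the historical distribution is produced here with the same smoothing purely to obtain a second distribution to compare against.

```python
import math

def topic_proportions(counts, alpha):
    """Eq. (14): smoothed topic proportions from per-topic word counts."""
    total = sum(n + alpha for n in counts)
    return [(n + alpha) / total for n in counts]

def kl(p, q):
    """Eq. (15): KL divergence with natural log (all entries assumed > 0)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def js(p, q):
    """Eq. (16): Jensen-Shannon divergence."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * (kl(p, m) + kl(q, m))

K = 4
alpha = 50 / K
theta_his = topic_proportions([30, 2, 10, 1], alpha)  # hypothetical history
theta_new = topic_proportions([0, 4, 0, 0], alpha)    # hypothetical new tip
distance = js(theta_his, theta_new)
```

Because the smoothing keeps every component positive, the logarithms in Eq. (15) are always defined; the JS divergence is symmetric and bounded above by ln 2.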

TABLE 7: Behaviors Adopted in Different Models

Model     Online Behavior (UGC)    Offline Behavior (Check-in)
CF-KDE    NO                       YES
LDA       YES                      NO
FUSED     YES                      YES
JOINT     YES                      YES

Fig. 11: Identity theft detection efficacy. [Bar chart of AUC (y-axis, 0.65 to 1.00) on Foursquare and Yelp for CF-KDE, LDA, FUSED, JOINT-SL, and JOINT-SR. Bar labels in the extracted text: 0.894, 0.783, 0.741, 0.708, 0.902, 0.909, 0.896, 0.854, 0.956, 0.947.]

Fused Model. Egele et al. [8] proposed COMPA, which directly combines users’ explicit behavior features, e.g., language, links, and message source. In our case, we set up a fused model that deeply combines users’ implicit behavior features discovered by CF-KDE and LDA to detect identity theft. We try different thresholds for the CF-KDE model and the LDA model (i.e., different classifiers). For each pair (i.e., a CF-KDE model and an LDA model), we treat any behavior that fails to pass either identification model as a suspicious behavior, and compute the true positive rate and false positive rate to draw the ROC curve and estimate the AUC value.
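The fusion rule described above, in which a behavior is suspicious if it fails either model, can be sketched as an OR over two thresholded scores. The sketch below only produces the (FPR, TPR) point for each threshold pair; how the ROC curve and AUC are derived from these points is not specified here and is left out. The function name, the threshold grids, the scores, and the assumption that higher scores are more suspicious are all hypothetical.

```python
def fused_roc_points(kde_scores, lda_scores, labels, kde_grid, lda_grid):
    """(FPR, TPR) for every threshold pair under the OR rule.

    labels[i] is True for an anomalous behavior. A behavior is flagged
    when either score exceeds its threshold (assumed direction:
    higher score = more suspicious).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for t_kde in kde_grid:
        for t_lda in lda_grid:
            flags = [k > t_kde or l > t_lda
                     for k, l in zip(kde_scores, lda_scores)]
            tp = sum(f and y for f, y in zip(flags, labels))
            fp = sum(f and not y for f, y in zip(flags, labels))
            points.append((fp / n_neg, tp / n_pos))
    return points

# Hypothetical surprise indices, JS divergences, and ground truth.
kde_scores = [5.0, 1.0, 0.5, 4.0]
lda_scores = [0.10, 0.60, 0.05, 0.02]
labels = [True, True, False, False]
points = fused_roc_points(kde_scores, lda_scores, labels, [3.0], [0.5])
```

In this toy case the first anomalous behavior is caught by the spatial score and the second by the topic score, while one normal behavior is flagged by the spatial score alone.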

4.3.2 Performance Comparison

We compare the performance of our method with the typical ones in terms of detection efficacy (AUC) and response latency. The latter denotes the number of behaviors in the test set that need to be accumulated to detect a specific identity theft case.

Detection Efficacy Analysis. In Fig. 11, we present the results of all comparison methods. Our joint model outperforms all other methods on the two datasets. The AUC value reaches 0.956 and 0.947 in the Foursquare and Yelp datasets, respectively.

There are three reasons for its outstanding performance. Firstly, it embraces different types of behaviors and exploits them in a unified model. Secondly, it takes advantage of the community members’ and friends’ group-level behavior patterns to overcome the data insufficiency and concept drift [40] in individual-level behavioral patterns. Finally, it utilizes correlations among different behavioral spaces.

From the results, we have several other interesting observations: (1) The LDA model performs poorly on both datasets, which may indicate that its performance is strongly sensitive to data quality. (2) The CF-KDE and LDA models perform worse on the Yelp dataset than on the Foursquare dataset, whereas the fused model shows a surprising reversal. (3) The joint model based on the relative anomalous score $S_r$ outperforms the one based on the logarithmic anomalous score $S_l$. (4) The joint model (i.e., JOINT-SR; in the following sections, the joint model always refers to the one based on $S_r$) is indeed superior to the fused model.

Response Latency Analysis. For each model, we also evaluate the relationship between the efficacy and the response latency (i.e., a response latency k means that the identity theft is detected based on k recent continuous behaviors). Figs. 12 and 13 demonstrate the AUC values and TPRs via different response latency in each model on both datasets.

Fig. 12: Identity theft detection efficacy via different response latency (i.e., the number of behaviors in the test set we accumulated). [x-axis: Response Latency from 1 to 5; y-axis: AUC; curves: CF-KDE, LDA, FUSED, JOINT.]

Fig. 13: The detection rates (TPR) via different response latency with disturbance rate=0.01 (FPR=0.01). [x-axis: Response Latency from 1 to 5; y-axis: TPR.]

The experimental results indicate that our joint model CBM is superior to all other methods. The AUC values of our joint model can reach 0.998 in both Foursquare and Yelp with 5 test behavioral records. The detection rates (TPR) of our joint model can reach 93.8% in Foursquare and 97.0% in Yelp with 5 test behavioral records and a disturbance rate (FPR) of 1.0%.

4.3.3 Robustness Analysis

Generally, there are two kinds of mutations in individual-level (IL) suspicious behavioral patterns:

Complete Behavioral Mutation. Some thieves tear off their masks as soon as they intrude into a victim’s account. They usually show totally different interests in venues and topics.

Partial Behavioral Mutation. Some extremely cunning thieves maintain part of the victim’s behavioral pattern to gain further benefits from the victim’s friends. They may show only a partial behavioral mutation, which makes these anomalous behaviors harder to detect.

In the previous experiments, we evaluated the performance of our model in a scenario where thieves act like normal users, simulated by exchanging normal users’ behavioral records (exchanging both venue and UGC). Furthermore, we consider harder scenarios where thieves know part of the victim’s habits and imitate the victim accordingly. We apply our model to these scenarios, and present the experimental results in Fig. 14. The results validate that our method is robust in coping with various suspicious behaviors.

Fig. 14: The efficacy (AUC) and detection rate (TPR) of identity theft detection via the joint model CBM in different scenarios with disturbance rate=0.01 (FPR=0.01). Painted bars denote AUC and shaded bars denote TPR. [Bar chart for Foursquare and Yelp; scenarios: No Guise, Guise in Venue, Guise in Content. Bar labels in the extracted text: 0.956, 0.947, 0.782, 0.753, 0.881, 0.821, 0.653, 0.722, 0.524, 0.568, 0.564, 0.604.]

4.4 Explanations on Advantages of Joint Model

4.4.1 Intuitive Explanations

Generally, there are two paradigms to integrate behavioral data: the fused and joint manners [8]. Fused models [8] are a relatively simple and straightforward kind of composite behavior model. They first capture features in each behavior space separately, and then build a comprehensive metric based on these features in different dimensions. With the possible complementary effect among different behavior spaces, they can act as a feasible solution for integration. However, the identification efficacy can be further improved, since fused models neglect potential links among different spaces of behaviors. Take an example where a person posted a picture in an OSN when he/she visited a park. If this composite behavior is simply separated into two independent parts (he/she once posted a picture, and he/she once visited a park), the difficulty of re-identifying him/her within a group of users possibly increases, since more users satisfy these two simple conditions than satisfy the original joint condition.

In contrast, our joint model CBM sufficiently exploits the correlations between behaviors in different dimensions and thereby increases the certainty of users’ behavior patterns, which contributes to a better identification efficacy.

4.4.2 Theoretical Explanation

We provide an underlying information-theoretic explanation for the gain of joint models. A well-known consequence of the Chain Rule for Entropy [50],

$$H(X_1, X_2, \ldots, X_N) \le \sum_{i=1}^{N} H(X_i),$$

indicates that the entropy of $N$ simultaneous events, denoted by $X_i$, $i = 1, 2, \cdots, N$, is no more than the sum of the entropies of the individual events, with equality if the events are independent.

This inequality shows that the joint behavior has no more uncertainty than the sum of the uncertainties of its components, and strictly less when the components are correlated [51]. This can serve as a theoretical explanation of the advantages of our joint model.
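A small numerical example makes the entropy argument concrete. The joint distribution below over a venue type and a topic is hypothetical and chosen so that the two components are correlated; the helper `entropy` computes Shannon entropy in bits.

```python
import math

def entropy(probs):
    """Shannon entropy in bits, ignoring zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical joint distribution over (venue type, topic).
joint = {
    ("park", "photo"): 0.4,
    ("park", "food"): 0.1,
    ("cafe", "photo"): 0.1,
    ("cafe", "food"): 0.4,
}

# Marginal distributions of each component.
venue, topic = {}, {}
for (v, t), p in joint.items():
    venue[v] = venue.get(v, 0.0) + p
    topic[t] = topic.get(t, 0.0) + p

h_joint = entropy(joint.values())
h_sum = entropy(venue.values()) + entropy(topic.values())
```

Each marginal is uniform over two outcomes, so the separate entropies sum to 2 bits, whereas the joint entropy is about 1.72 bits: modeling the composite behavior jointly leaves less uncertainty than modeling the two dimensions independently.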

5 LITERATURE REVIEW

Recently, researchers found that users’ behavior can identify their identities [3], [28], [52]. Typically, behavior-based user identification includes two phases: user profiling and user identifying.

User profiling is a process to characterize a user with his/her historical behavioral data. Some works focus on statistical characteristics to establish the user profile. Naini et al. [53] studied the task of identifying users by matching the histograms of their data in an anonymous dataset with the histograms from the original dataset. Egele et al. [8] proposed a behavior-based method to identify compromises of high-profile accounts. Ruan et al. [30] conducted a study on online user behavior by collecting and analyzing user clickstreams of a well-known OSN. Lesaege et al. [29] developed a topic model extending the Latent Dirichlet Allocation (LDA) to identify the active users. Viswanath et al. [44] presented a technique based on Principal Component Analysis (PCA) that accurately modeled the “like” behavior of normal users in Facebook and identified significant deviations from it as anomalous behaviors. Tsikerdekis and Zeadally [54] presented a detection method based on nonverbal behavior for identity deception, which can be applied to many types of social media. The methods above mainly concentrated on a specific dimension of the composite behavior without utilizing the correlations among multi-dimensional behavior data.

Sekara et al. [55] explored the complex interaction between social and geospatial behavior and demonstrated that social behavior can be predicted with high precision. Yin et al. [36] proposed a probabilistic generative model combining spatiotemporal data and semantic information to predict users’ behavior. These studies implied that composite behavior features are possibly helpful for user identification.

User identifying is a process to match the same user in two datasets or to distinguish anomalous users/behaviors. It can be applied to a variety of tasks, such as detecting anomalous users or matching users across different data sources. Mazzawi et al. [56] presented a novel approach for detecting malicious user activity in databases by checking a user’s self-consistency and global consistency. Lee and Kim [32] proposed a suspicious URL detection system for Twitter to detect users’ anomalous behaviors. Cao et al. [22] designed and implemented a malicious account detection system for detecting both fake and compromised real user accounts. Zhou et al. [57] proposed an FRUI algorithm to match users among multiple OSNs. These works mainly detected population-level anomalous behaviors that differ strongly from other behaviors. However, they did not consider that the individual-level coherence of users’ behavioral patterns can be utilized to detect online identity thieves.

6 CONCLUSION AND FUTURE WORK

We investigate the feasibility of building a ladder from low-quality behavioral data to a high-quality behavioral model for user identification in online social networks (OSNs). By exploiting the complementary effect among OSN users’ multi-dimensional behaviors, we propose a joint probabilistic generative model that integrates online and offline behaviors. When the designed model is applied to identity theft detection in OSNs, its comprehensive performance, in terms of detection efficacy, response latency and robustness, is validated by extensive evaluations on real-life OSN datasets. This study gives new insights into whether and how modeling users’ composite behavioral patterns can improve online identity authentication.

Our behavior-based module mainly aims at detecting identity thieves after the access control of an account is broken. It does not exclude the traditional methods for preventing identity theft. On the contrary, it is easy to incorporate our module into traditional methods to better solve the identity theft problem, since our method is non-intrusive and continuous. We leave the study of combining prevention and detection methods from a systematic perspective as future work.

REFERENCES

[1] J. Onaolapo, E. Mariconti, and G. Stringhini, “What happens after you are pwnd: Understanding the use of leaked webmail credentials in the wild,” in Proc. IMC 2016, 2016, pp. 65–79.
[2] A. Mohan, “A medical domain collaborative anomaly detection framework for identifying medical identity theft,” in Proc. CTS 2014, 2014, pp. 428–435.
[3] Y.-A. De Montjoye, L. Radaelli, V. K. Singh et al., “Unique in the shopping mall: On the reidentifiability of credit card metadata,” Science, vol. 347, no. 6221, pp. 536–539, 2015.
[4] P. Hyman, “Cybercrime: it’s serious, but exactly how serious?” Commun. ACM, vol. 56, no. 3, pp. 18–20, 2013.
[5] J. Lynch, “Identity theft in cyberspace: Crime control methods and their effectiveness in combating phishing attacks,” Berkeley Technology Law Journal, pp. 259–300, 2005.
[6] L. Bilge, T. Strufe, D. Balzarotti, and E. Kirda, “All your contacts are belong to us: automated identity theft attacks on social networks,” in Proc. WWW 2009, 2009, pp. 551–560.
[7] G. Newman and M. M. McNally, “Identity theft literature review,” 2005.
[8] M. Egele, G. Stringhini, C. Kruegel, and G. Vigna, “Towards detecting compromised accounts on social networks,” IEEE Trans. Dependable Sec. Comput., vol. 14, no. 4, pp. 447–460, 2017.
[9] K. Thomas, F. Li, A. Zand, J. Barrett, J. Ranieri, L. Invernizzi, Y. Markov, O. Comanescu, V. Eranti, A. Moscicki, D. Margolis, V. Paxson, and E. Bursztein, “Data breaches, phishing, or malware?: Understanding the risks of stolen credentials,” in Proc. CCS 2017, 2017, pp. 1421–1434.
[10] T. C. Pratt, K. Holtfreter, and M. D. Reisig, “Routine online activity and internet fraud targeting: Extending the generality of routine activity theory,” Journal of Research in Crime and Delinquency, vol. 47, no. 3, pp. 267–296, 2010.
[11] K. Thomas, C. Grier, J. Ma, V. Paxson, and D. Song, “Design and evaluation of a real-time URL spam filtering service,” in Proc. IEEE S&P 2011, 2011, pp. 447–462.
[12] A. M. Marshall and B. C. Tompsett, “Identity theft in an online world,” Computer Law & Security Review, vol. 21, no. 2, pp. 128–137, 2005.
[13] B. Schneier, “Two-factor authentication: Too little, too late.” Commun. ACM, vol. 48, no. 4, p. 136, 2005.
[14] S. Díaz-Santiago, L. M. Rodríguez-Henríquez, and D. Chakraborty, “A cryptographic study of tokenization systems,” International Journal of Information Security, vol. 15, no. 4, pp. 413–432, 2016.
[15] M. V. Ruiz-Blondet, Z. Jin, and S. Laszlo, “CEREBRE: A novel method for very high accuracy event-related potential biometric identification,” IEEE Trans. Information Forensics and Security, vol. 11, no. 7, pp. 1618–1629, 2016.
[16] R. D. Labati, A. Genovese, E. Munoz, V. Piuri, F. Scotti, and G. Sforza, “Biometric recognition in automated border control: A survey,” ACM Computing Surveys (CSUR), vol. 49, no. 2, p. 24, 2016.
[17] Z. Sitova, J. Sedenka, Q. Yang, G. Peng, G. Zhou, P. Gasti, and K. S. Balagani, “HMOG: new behavioral biometric features for continuous authentication of smartphone users,” IEEE Trans. Information Forensics and Security, vol. 11, no. 5, pp. 877–892, 2016.
[18] B. A. Rajoub and R. Zwiggelaar, “Thermal facial analysis for deception detection,” IEEE Trans. Information Forensics and Security, vol. 9, no. 6, pp. 1015–1023, 2014.
[19] M. M. Waldrop, “How to hack the hackers: The human side of cybercrime,” Nature, vol. 533, no. 7602, 2016.
[20] R. T. Mercuri, “Scoping identity theft,” Commun. ACM, vol. 49, no. 5, pp. 17–21, 2006.
[21] G. Stringhini, P. Mourlanne, G. Jacob, M. Egele, C. Kruegel, and G. Vigna, “EVILCOHORT: detecting communities of malicious accounts on online services,” in Proc. USENIX Security 2015, 2015, pp. 563–578.
[22] Q. Cao, X. Yang, J. Yu, and C. Palow, “Uncovering large groups of active malicious accounts in online social networks,” in Proc. ACM SIGSAC 2014, 2014, pp. 477–488.
[23] G. Stringhini, C. Kruegel, and G. Vigna, “Detecting spammers on social networks,” in Proc. ACSAC 2010, 2010, pp. 1–9.
[24] F. Ahmed and M. Abulaish, “A generic statistical approach for spam detection in online social networks,” Computer Communications, vol. 36, no. 10, pp. 1120–1129, 2013.
[25] G. R. Milne, L. I. Labrecque, and C. Cromer, “Toward an understanding of the online consumer’s risky behavior and protection practices,” Journal of Consumer Affairs, vol. 43, no. 3, pp. 449–473, 2009.
[26] A. Abbasi, Z. Zhang, D. Zimbra, H. Chen, and J. F. Nunamaker Jr, “Detecting fake websites: the contribution of statistical learning theory,” MIS Quarterly, pp. 435–461, 2010.
[27] A. Abo-Alian, N. L. Badr, and M. F. Tolba, “Keystroke dynamics-based user authentication service for cloud computing,” Concurrency and Computation: Practice and Experience, vol. 28, no. 9, pp. 2567–2585, 2016.
[28] M. Abouelenien, V. Perez-Rosas, R. Mihalcea, and M. Burzo, “Detecting deceptive behavior via integration of discriminative features from multiple modalities,” IEEE Trans. Information Forensics and Security, vol. 12, no. 5, pp. 1042–1055, 2017.
[29] C. Lesaege, F. Schnitzler, A. Lambert, and J. Vigouroux, “Time-aware user identification with topic models,” in Proc. IEEE ICDM 2016, 2016, pp. 997–1002.
[30] X. Ruan, Z. Wu, H. Wang, and S. Jajodia, “Profiling online social behaviors for compromised account detection,” IEEE Trans. Information Forensics and Security, vol. 11, no. 1, pp. 176–187, 2016.
[31] R. N. Zaeem, M. Manoharan, Y. Yang, and K. S. Barber, “Modeling and analysis of identity threat behaviors through text mining of identity theft stories,” Computers & Security, vol. 65, pp. 50–63, 2017.
[32] S. Lee and J. Kim, “Warningbird: Detecting suspicious urls in twitter stream,” in Proc. IEEE S&P 2012, 2012.
[33] H. Li, Y. Ge, R. Hong, and H. Zhu, “Point-of-interest recommendations: Learning potential check-ins from friends,” in Proc. ACM SIGKDD 2016, 2016, pp. 975–984.
[34] C. Wang, J. Zhou, and B. Yang, “From footprint to friendship: Modeling user followership in mobile social networks from check-in data,” in Proc. ACM SIGIR 2017, 2017, pp. 825–828.
[35] J. Bao, Y. Zheng, and M. F. Mokbel, “Location-based and preference-aware recommendation using sparse geo-social networking data,” in Proc. ACM SIGSPATIAL 2012, 2012, pp. 199–208.
[36] H. Yin, Z. Hu, X. Zhou, H. Wang, K. Zheng, N. Q. V. Hung, and S. W. Sadiq, “Discovering interpretable geo-social communities for user behavior prediction,” in Proc. IEEE ICDE 2016, 2016, pp. 942–953.
[37] Y. Wang, Y. Zheng, and Y. Xue, “Travel time estimation of a path using sparse trajectories,” in Proc. ACM SIGKDD 2014, 2014, pp. 25–34.
[38] “Yelp dataset,” http://www.yelp.com/dataset_challenge, 2014.
[39] M. Lichman and P. Smyth, “Modeling human location data with mixtures of kernel densities,” in Proc. ACM SIGKDD 2014, 2014, pp. 35–44.
[40] E. Bursztein, B. Benko, D. Margolis, T. Pietraszek, A. Archer, A. Aquino, A. Pitsillidis, and S. Savage, “Handcrafted fraud and extortion: Manual account hijacking in the wild,” in Proc. IMC 2014, 2014, pp. 347–358.
[41] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent dirichlet allocation,” Journal of Machine Learning Research, vol. 3, pp. 993–1022, 2003.
[42] Z. Hu, J. Yao, B. Cui, and E. P. Xing, “Community level diffusion extraction,” in Proc. ACM SIGMOD 2015, 2015, pp. 1555–1569.
[43] W. X. Zhao, J. Jiang, J. Weng, J. He, E. Lim, H. Yan, and X. Li, “Comparing twitter and traditional media using topic models,” in Proc. ECIR 2011, 2011, pp. 338–349.
[44] B. Viswanath, M. A. Bashir, M. Crovella, S. Guha, K. P. Gummadi, B. Krishnamurthy, and A. Mislove, “Towards detecting anomalous user behavior in online social networks,” in Proc. USENIX Security 2014, 2014, pp. 223–238.
[45] F. T. Liu, K. M. Ting, and Z. Zhou, “Isolation forest,” in Proc. ICDM 2008, 2008, pp. 413–422.
[46] Y. Yan, L. Cao, C. Kuhlman, and E. A. Rundensteiner, “Distributed local outlier detection in big data,” in Proc. ACM SIGKDD 2017, 2017, pp. 1225–1234.
[47] J. Yin and J. Wang, “A model-based approach for text clustering with outlier detection,” in Proc. ICDE 2016, 2016, pp. 625–636.
[48] H. He and E. A. Garcia, “Learning from imbalanced data,” IEEE Trans. Knowl. Data Eng., vol. 21, no. 9, pp. 1263–1284, 2009.
[49] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2008.
[50] T. M. Cover and J. A. Thomas, “Entropy, relative entropy and mutual information,” Elements of Information Theory, vol. 2, pp. 1–55, 1991.
[51] C. Song and A. L. Barabási, “Limits of predictability in human mobility,” Science, vol. 327, no. 5968, p. 1018, 2010.
[52] W. Youyou, M. Kosinski, and D. Stillwell, “Computer-based personality judgments are more accurate than those made by humans,” PNAS, vol. 112, no. 4, pp. 1036–1040, 2015.
[53] F. M. Naini, J. Unnikrishnan, P. Thiran, and M. Vetterli, “Where you are is who you are: User identification by matching statistics,” IEEE Trans. Information Forensics and Security, vol. 11, no. 2, pp. 358–372, 2016.
[54] M. Tsikerdekis and S. Zeadally, “Multiple account identity deception detection in social media using nonverbal behavior,” IEEE Trans. Information Forensics and Security, vol. 9, no. 8, pp. 1311–1321, 2014.
[55] V. Sekara, A. Stopczynski, and S. Lehmann, “Fundamental structures of dynamic social networks,” PNAS, vol. 113, no. 36, pp. 9977–9982, 2016.
[56] H. Mazzawi, G. Dalaly, D. Rozenblatz, L. Ein-Dor, M. Ninio, and O. Lavi, “Anomaly detection in large databases using behavioral patterning,” in Proc. IEEE ICDE 2017, 2017, pp. 1140–1149.
[57] X. Zhou, X. Liang, H. Zhang, and Y. Ma, “Cross-platform identification of anonymous identical users in multiple social media networks,” IEEE Trans. Knowl. Data Eng., vol. 28, no. 2, pp. 411–424, 2016.
