


Computer Networks 51 (2007) 632–654

www.elsevier.com/locate/comnet

Improving the quality of alerts and predicting intruder's next goal with Hidden Colored Petri-Net

Dong Yu *, Deborah Frincke

Department of Computer Science, University of Idaho, Moscow, ID 83844-1010, United States

Received 3 August 2005; received in revised form 27 January 2006; accepted 19 May 2006
Available online 22 June 2006

Responsible Editor: Christos Douligeris

Abstract

Intrusion detection systems (IDS) often provide poor quality alerts, which are insufficient to support rapid identification of ongoing attacks or predict an intruder's next likely goal. In this paper, we propose a novel approach to alert postprocessing and correlation, the Hidden Colored Petri-Net (HCPN). Different from most other alert correlation methods, our approach treats the alert correlation problem as an inference problem rather than a filter problem. Our approach assumes that the intruder's actions are unknown to the IDS and can be inferred only from the alerts generated by the IDS sensors. HCPN can describe the relationship between different steps carried out by intruders, model observations (alerts) and transitions (actions) separately, and associate each token element (system state) with a probability (or confidence). The model is an extension to the Colored Petri-Net (CPN). It is called "hidden" because the transitions (actions) are not directly observable but can be inferred by looking through the observations (alerts). These features make HCPN especially suitable for discovering intruders' actions from their partial observations (alerts) and predicting intruders' next goal. Our experiments on DARPA evaluation datasets and the attack scenarios from the Grand Challenge Problem (GCP) show that HCPN has promise as a way to reduce false positives and negatives, predict an intruder's next possible action, uncover intruders' intrusion strategies after an attack scenario has happened, and provide confidence scores.
© 2006 Elsevier B.V. All rights reserved.

Keywords: Intrusion detection; Alert correlation; Hidden Colored Petri-Net

1389-1286/$ - see front matter © 2006 Elsevier B.V. All rights reserved.

doi:10.1016/j.comnet.2006.05.008

This is an extended and enhanced version of the work published in the International Conference on Applied Cryptography and Network Security, Yellow Mountain, China, 2004 [39].

* Corresponding author. Tel.: +1 425 707 9282; fax: +1 425 706 7329.

E-mail addresses: [email protected] (D. Yu), [email protected] (D. Frincke).

1. Introduction

One of the most important requirements of a good intrusion detection system (IDS) is the generation of high quality alerts. Unfortunately, IDS sensors usually generate massive amounts of alerts [1], especially if those sensors have high sensitivity to potential misuse, as can be the case with tightly tuned anomaly-based sensors. In the distributed case, the situation is compounded because there are more sensors (and hence more data) and a greater chance of time delay before sensor alerts are consolidated, with commensurate risk that some information will be inaccurate or stale. To make things worse, only 1% of the enormous amount of alerts generated by most IDS corresponds to unique attacks [1–3]. The rest are false positives (i.e., alerts on non-intrusive actions), repeated warnings for the same attack, or alert notifications arising from erroneous activity or configuration artifacts.

Many ways are available to improve the quality of alerts. For example, we may achieve the goal by using better sensors, signatures, or analysis algorithms. In this paper, we focus on algorithms, and propose a novel alert correlation approach. In other words, our approach aims to reduce the false positives and false negatives by postprocessing (i.e., correlating) the alerts in a novel way: inferring an intruder's actions with a model named Hidden Colored Petri-Net (HCPN). The architecture of our system is depicted in Fig. 1. Raw audit data collected by sensors are first analyzed to generate alerts. These alerts, which contain large amounts of false positives, are then fed into our HCPN-based alert correlators to remove most of the false positives and repetitions. Confidence scores of alerts from the HCPN components installed at different sites are then fused with our extended Dempster–Shafer theory of evidence to further improve the quality of alerts before they are sent to the active responder to make the reaction decision.

Alert correlation [1,4–20] is used to (a) reduce the number of alerts that an IDS would generate to more manageable levels while still retaining strong detection capacities, (b) improve IDS correctness by reducing the false positives and negatives in the alerts generated by the IDS sensors, and (c) unveil an intruder's intrusion strategy after the attack has happened.

Fig. 1. Raw audit data collected by sensors are first analyzed to generate alerts. These alerts, which contain a large amount of false positives, are then fed into the alert correlators. Confidence scores provided by the alert correlators located at different sites are fused before being sent to the active responder to make the reaction decision.

One way to look at the alert correlation problem is to extract "true" alerts (or filter out the false alerts) from the raw alerts generated by the IDS sensors by utilizing relationships (e.g., similarities, sequential relationships, etc.) among alerts. This filter view of alert correlation, which will be discussed further in Section 7, has been taken by Cuppens et al. [13,14] and Ning et al. [17–19], to name a few. Approaches based on this filter view can remove (or filter out) a large percentage of false positives. However, this formulation of the problem works directly upon the alerts. It does not distinguish between alerts and intruders' actions in the correlation process, and usually does not use information such as the false negative rate and false positive rate to improve the correlation results.

In this paper, we take a different view and consider alert correlation as the problem of progressively inferring an intruder's actions based on partial observations (alerts). Based on this perspective, we propose a novel methodology for analyzing alerts. Our approach is based on a theoretical model named Hidden Colored Petri-Net (HCPN). Based on this inference view, we assume an intruder's (either an actual intruder or an insider who is misusing the system) actions are unknown to the IDS and can be inferred only from the enormous amount of low quality alerts generated by the IDS sensors. We demonstrate that HCPN, as an extension to the Colored Petri-Net [22,33,34], can describe the relationship between different steps carried out by intruders, model observations (alerts) and transitions (actions) separately, and associate each token element (system state) with a probability (or confidence). The model is called "hidden" because the transitions (actions) are modeled as hidden variables (i.e., not directly observable by the analyzer) in HCPN. However, the analyzer can look through the observations (alerts) to infer the transitions (actions). When no training data are available, HCPN behaves in a similar way to other alert correlation approaches that depend solely on the precondition–postcondition relationship. When training data are available, it can learn from the data and further improve the alert correlation result.


We report experimental results on the DARPA 2000 DDoS evaluation datasets and attack scenarios from the Grand Challenge Problem (GCP) provided by DARPA's Cyber Panel Program [21]. Our experiments¹ show that HCPN can significantly reduce false positives and negatives, effectively predict an intruder's next possible action, uncover intruders' intrusion strategies after the attack scenario has happened, and provide confidence scores.

The remainder of the paper is organized as follows. In Section 2, we propose the HCPN framework for predicting attacks and understanding alerts. In Section 3, we discuss the basic computations underlying the operation of the HCPN model. We describe the model parameter estimation and action inference algorithms of HCPN in Section 4, describe the confidence fusion algorithm in Section 5, and report experimental results in Section 6. We discuss related work in Section 7, and conclude the paper in Section 8.

2. Hidden Colored Petri-Net: our framework

Our alert correlator is built upon a theoretical framework named Hidden Colored Petri-Net (HCPN). The model evolved through several stages. At the very beginning, we used CPN to model the preconditions and postconditions for misuse detection [25], since CPN allows unordered and/or concurrent events to be encoded in the network structure. Later on, we applied CPN at a higher level to improve the alert quality for scenario attacks by using the preconditions and postconditions built into the CPN. This is when we noticed that what is encoded in the CPN are the preconditions and postconditions of ACTIONS instead of alerts. Although actions and alerts are highly correlated, they are different (i.e., an alert is just an observation of an action). This suggests the separation of alerts and actions in the model: the actions are hidden and the alerts are observable by the correlator. This separation of actions and alerts evolved CPN into HCPN.

1 The DARPA datasets are the best labeled datasets we have access to, and have been used by many other researchers [17,18,37]. However, we would like to warn readers about the limitations of the DARPA datasets, as indicated by McHugh [42] and others, when interpreting our results.

2.1. Definition of HCPN

Definition 1. An HCPN is an 11-tuple HCPN = (R, Q, D, A, O, G, E, P0, Z, C, N), where:

• R = {c_i | i = 1, ..., Nc} (color set) is a non-empty finite set of agents (normal users and/or intruders);

• Q = {q_i | i = 1, ..., Nq} (place set) is a finite set of states (e.g., a resource has been taken over);

• D = {d_i | i = 1, ..., Nd} (transition set) is a finite set of actions agents might take;

• A (arc set) is a finite set such that A = A1 ∪ A2, where A1 ⊆ (Q × D) is the set of precondition links and A2 ⊆ (D × Q) is the set of postcondition links. We use I(d) to denote the set of places q for which the arc ⟨q, d⟩ ∈ A1, and O(d) to represent the set of places q for which the arc ⟨d, q⟩ ∈ A2;

• O = {o_i | i = 1, ..., No} (observation set) is a set of observations. Observations can be alerts or raw audit/traffic data; in this paper, observations are alerts;

• G (precondition function set) is a set of precondition functions associated with the arcs in A1, such that G = {g : A1 → S_M(R)}, where S_M(R) is the superset of the multiset M(R) [22]. Precondition functions represent the conditions to be met before an action can be conducted by the agents;

• E (postcondition function set) is a set of postcondition functions associated with the arcs in A2, such that E = {e : A2 → S_M(R)}. Postcondition functions represent the agent–resource ownership change due to an action;

• P0 (initial marking distribution) is the initial (or default) agent–resource ownership probability distribution P0 = P0(Q, S_M(R)) = {p : (Q, S_M(R)) → [0, 1]};

• Z (transition probability) is the probability that an action d will be conducted given that the preconditions of the transition are satisfied (or, in other words, given that the transition d is enabled):

    Z = {P(d ∈ D will fire next | d is enabled)} = {z : D → [0, 1]}.

  Note that if the preconditions are not satisfied, the probability that an action will be taken is 0;

• C (observation probability) is the probability that O is observed given action D, defined as C = P(O | D) = {c : (D, O) → [0, 1]};

• N (tolerance) is the tolerance function used to determine whether two states are indistinguishable.
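To make the 11-tuple concrete, the following is a minimal illustrative sketch of it as a data structure. All field and method names here are our own choices, not prescribed by the paper; the function sets G and E are omitted because, as discussed below, all of them take the same fixed form in this paper.

```python
from dataclasses import dataclass, field

# A minimal, illustrative encoding of the HCPN 11-tuple from Definition 1.
@dataclass
class HCPN:
    colors: set            # R: agents
    places: set            # Q: states/resources
    transitions: set       # D: actions
    pre_arcs: set          # A1 subset of Q x D, as (place, action) pairs
    post_arcs: set         # A2 subset of D x Q, as (action, place) pairs
    observations: set      # O: alert types
    p0: dict = field(default_factory=dict)   # P0: (place, agent) -> probability
    z: dict = field(default_factory=dict)    # Z: action -> firing prob. when enabled
    c: dict = field(default_factory=dict)    # C: (action, alert) -> observation prob.
    tolerance: float = 0.0                   # N: state-distinguishability threshold

    def I(self, d):
        """Places q whose arc (q, d) is a precondition link of action d."""
        return {q for (q, a) in self.pre_arcs if a == d}

    def O_(self, d):
        """Places q whose arc (d, q) is a postcondition link of action d."""
        return {q for (a, q) in self.post_arcs if a == d}
```

For instance, a one-action net with pre_arcs = {("q1", "copy")} and post_arcs = {("copy", "q2")} yields I("copy") = {"q1"} and O("copy") = {"q2"}.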


HCPN is an extension to the Colored Petri-Net (CPN) first introduced by Jensen [23] and hence inherits many concepts and notations from CPN. Besides the key elements inherited from CPN (e.g., places, tokens, colors, and transitions), HCPN introduces an additional element called observations. The advantage of HCPN compared to CPN is the ability to provide probabilities, and to model transitions and observations separately.

Fig. 2 illustrates how HCPN can be used in the alert correlation task: colors represent agents; places represent resources; observations represent alerts; transitions represent actions; input arcs represent preconditions of the action; and output arcs represent postconditions of the action. Note that when an action is known to have been taken (from other information), we can deduce that the preconditions of the action are satisfied. In other words, there should be an output arc from the action to each of its input places in the HCPN to indicate this effect. For this reason, there is always a matched implicit output arc for each input arc in HCPN. With these implicit arcs, HCPN has the potential to recover false negatives (i.e., missing alerts).

In HCPN, a token element (q, c) stands for the fact that agent c has access to resource q. An action is enabled if the preconditions of the corresponding action are satisfied. The marking M represents the agent–resource ownership. The marking distribution P = P(Q, S_M(R)) = {p : (Q, S_M(R)) → [0, 1]} represents the agent–resource ownership probability; p_t(q, c) represents the probability that at time t resource q has been taken over by agent c. The transition firing probability D represents the probability that each action will be taken next; d_t(d, c) represents the probability that at time t the action d will be taken next by agent c. The progress of an intrusion is represented by the change of the marking distribution over time. The input of HCPN is the alerts, and the output of HCPN is the resources compromised by intruders. If the change of marking distribution between the current state and the initial state exceeds the tolerance, the agent is considered to have made remarkable progress.

Fig. 2. Hidden Colored Petri-Net. In HCPN, colors represent agents; places represent resources; observations represent alerts; transitions represent actions; input arcs represent preconditions of the action; and output arcs represent postconditions of the action.

If the token element (q, c) is in a marking M (which is a multiset), agent c has access to (or owns) resource q. Multiple copies of the token element (q, c) do not affect the agent–resource ownership. For this reason, we need to consider only the probability that the multiset M is greater than (or contains) the multiset {(q, c)}, i.e., {(q, c)} ≤ M [22], and we do not distinguish between a single token element and multiple ones. All precondition functions used in this paper thus have the same form, {(q, c)} ≤ M, and the postconditions of any action d have the same form, M = M + Σ_{q ∈ O(d)} {(q, c)}.
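The multiset tests and updates above can be sketched directly with counted multisets. This is our own illustrative encoding (not the paper's implementation), using Python's `collections.Counter`:

```python
from collections import Counter

# Markings as multisets of token elements (place, agent).
# {(q, c)} <= M is the multiset-containment test used by all preconditions.
M = Counter({("q3", "intruder"): 2, ("q4", "intruder"): 1})

def holds(M, q, c):
    """True iff {(q, c)} <= M, i.e., agent c owns resource q."""
    return M[(q, c)] >= 1

def apply_postconditions(M, O_d, c):
    """M = M + sum over q in O(d) of {(q, c)}: grant c each place in O(d)."""
    M = M.copy()
    for q in O_d:
        M[(q, c)] += 1
    return M
```

Note that `holds` ignores multiplicity beyond one, matching the remark that multiple copies of a token element do not change ownership.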

2.2. Examination of HCPN with examples

We use two examples to describe how the HCPN-based approach works. The first example is the classic local-to-root (L2R) attack scenario from [24,25], and the second is the remote-to-local (R2L) and L2R attack scenario from [26]. These examples involve several steps and are described here just to show how HCPN works; a real-world attack scenario may be far more complicated than the examples shown here.

The L2R attack scenario from [24,25] involves four actions: copy, chmod, touch, and mail. Each action grants the intruder access to one resource. Fig. 3 depicts the HCPN model of this L2R attack. There are five named transitions in the graph: one for each of the actions copy, chmod, touch, and mail, plus a special extra transition named "normal", whose preconditions are always satisfied and which has no postconditions, used to model un-intrusive actions. (Note that our approach identifies attacks by associating each alert with an action.) If an alert is inferred to be associated with the "normal" action, the alert is considered a false positive. Six named places are used in the figure to represent the resources (or attack states) involved. The place q1 models the resource accessible to all agents. Arcs describe the preconditions and postconditions of actions. For example, an intruder needs to hold both q4 and q5 to be able to conduct the mail action. After the mail command is issued, the intruder holds q6. Each action might be identified by the IDS sensors as different alerts with different probabilities. These alerts (Alert 1 to Alert N), which might be reported by sensors at the host level or network level, are the observations in the HCPN model. For example, the normal action might trigger different alerts in an IDS whose false positive rate is high. Note that we only name the places, transitions, and arcs related to this L2R attack scenario in Fig. 3. The complete system would include additional elements (indicated as other resources and other actions in Fig. 3) associated with other attack scenarios. This specific L2R attack would then be only one path in the complete system.

Fig. 3. An example HCPN model for an L2R attack. There are five transitions in the graph: transitions used to model the actions copy, chmod, touch, and mail, and the special "normal" transition used to model un-intrusive actions. Six places are used in the figure to represent the resources involved. The place q1 is a special place to model the resource that would be accessible to all users. Arcs in the figure describe the preconditions and postconditions of actions. Each action may be observed as different alerts with different probabilities. Note that resources and actions related to other attacks are also part of the whole HCPN.

The second example is the attack scenario from [26]. In this example, there are two hosts in a LAN. Host 2 has a database installed. An ftp service is running on both hosts 1 and 2, and an ssh service is running on host 1. The intruder's goal is to compromise the information in the database stored on host 2.

Four actions (sometimes called atomic attacks) are available to the intruder in this scenario. The first action is a remote-to-root attack named Sshd Buffer Overflow (SBO). This attack immediately gives a remote user a root shell on the vulnerable machine (host 1 in our example). The precondition of this action is that the intruder has user-level privilege at the source host, and the target host is running a vulnerable version of sshd. The second action, named the ftp-rhost (ftpR) attack, exploits an ftp vulnerability that allows an intruder to create a .rhosts file in the ftp home directory, which establishes a remote login trust relationship between the source host and the target host. The precondition of this action is that the intruder has user-level privilege on the source host, and the target host is running a vulnerable ftp service. The third action is called the remote login (rlogin) attack. In this attack, the intruder logs into the target host using an existing remote login trust relationship between the two hosts. The fourth action is a local buffer overflow (LBO) attack. In this attack, the intruder already has user-level privilege on the target host and gains root privilege by exploiting a local setuid buffer overflow vulnerability.

An HCPN model for this attack scenario would include a normal action transition and a transition for each action described above. Note that since the precondition functions in the HCPN can take parameters, we do not need a separate transition and/or place for different hosts. The host (identified by its IP address) can be a parameter in the model conditions.

The intruder can have many strategies to compromise the database on host 2. An example strategy is to use the SBO attack to get root privilege on host 1, use the ftpR attack to establish the trust relationship between hosts 1 and 2, use the rlogin attack to gain user-level privilege on host 2, and then launch the LBO attack to gain root privilege.
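The four atomic attacks and the example strategy can be sketched as precondition/postcondition sets over simple facts. The encoding below is our own toy illustration (fact names such as "trust" and "vuln_sshd" are ours), not the paper's model:

```python
# Illustrative sketch: the four atomic attacks of the second example, written
# as precondition/postcondition sets over (privilege-or-trust, host) facts.
ACTIONS = {
    "SBO":    {"pre": {("user", "attacker_host"), ("vuln_sshd", "host1")},
               "post": {("root", "host1")}},
    "ftpR":   {"pre": {("root", "host1"), ("vuln_ftp", "host2")},
               "post": {("trust", "host2")}},
    "rlogin": {"pre": {("trust", "host2")},
               "post": {("user", "host2")}},
    "LBO":    {"pre": {("user", "host2"), ("vuln_setuid", "host2")},
               "post": {("root", "host2")}},
}

def run_strategy(initial_facts, strategy):
    """Fire each action in order if its preconditions hold; return final facts."""
    facts = set(initial_facts)
    for name in strategy:
        action = ACTIONS[name]
        if action["pre"] <= facts:          # all preconditions satisfied
            facts |= action["post"]         # postconditions take effect
    return facts
```

Running the strategy ["SBO", "ftpR", "rlogin", "LBO"] from a state where the attacker has a user account and the listed vulnerabilities exist ends with ("root", "host2") in the fact set; skipping any earlier step breaks the chain, illustrating why the precondition links matter.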

It is obvious that a remote login is not conducted by an attacker most of the time. For this reason, one would see many warnings about non-intrusive remote logins if an alarm were issued for each remote login. Using HCPN, most remote login alerts would be automatically associated with the special normal action transition, so no alarm would be issued for those remote login requests. However, HCPN would likely issue alarms if the previous steps have been conducted by the intruder. It may also issue an alarm if the subsequent steps are perceived, with high confidence, to have been conducted.

HCPN behaves differently from filter-based alert correlation approaches. Here we describe these differences using the L2R example we just described. To make the discussion easier, we use alert_actionName to represent the alert related to the action named actionName.

First, instead of assuming a one-to-one mapping between alerts and actions as filter-based alert correlation approaches do, the HCPN models assume that the sensors may observe each action as different alerts with different probabilities (named observation probabilities). These probabilities can be deduced from the false positive rate and false negative rate of each action, which will be discussed in Section 4. For example, the copy action might be correctly observed by the IDS as alert_copy, missed by the IDS (i.e., the action is observed as normal behavior), or incorrectly identified as alert_touch with some low probability.

By using the observation probability, HCPN can determine with high accuracy whether the touch action really happened when an alert_touch was issued. Obviously, three possible conditions exist when alert_touch is observed. The first possibility is that the touch action really happened. The second possibility is that alert_touch is just a false positive corresponding to a normal action. The third possibility is that a different action (such as the copy action) happened, since the sensors have a small probability of mistakenly observing the copy action as touch and issuing alert_touch. HCPN chooses the best explanation based on all the information available at the current stage. Similarly, HCPN would not always conclude that the agent did not have access to resource q3 just because alert_copy was not observed, since the IDS sensors might miss the copy action (i.e., a false negative). The (implicit) output arcs from the action chmod to the resource q3 would allow HCPN to infer that q3 was accessible by the agent if the chmod action is believed, from other information, to have been conducted.

In brief, the correlation results in HCPN are determined by the alerts, the preconditions and postconditions of actions, the observation probabilities, the transition probabilities, and the number of each kind of alert. We can informally show that the information used in HCPN is a superset of the features used by filter-based approaches, and thus HCPN can potentially provide equivalent or better correlation results.

We now show that filter-based alert correlation approaches can be represented as special constructions of HCPN. For example, we can build an HCPN model for alert correlation approaches based ONLY on the preconditions and postconditions of malicious actions (a real filter-based system might be more complicated). To convert such a system to an HCPN-based system, we first construct a CPN (without the observation layer) that describes the same preconditions and postconditions of malicious actions with input and output arcs, respectively. We then add the observation layer to make it an HCPN and set the observation probabilities to

P(o_i | d_j) = 1 if i = j, and 0 otherwise,    (1)

to indicate that alerts and actions are treated exactly the same (as in those systems). We further set the transition probabilities (i.e., the probability that an action will be taken given that all its preconditions are satisfied) to 1. With these settings, the HCPN behaves the same as the original filter-based model: for each new alert o, only the corresponding action d (for which P(o | d) = 1) is likely to be taken, since for any other transition d′, P(o | d′) = 0. If the preconditions are not satisfied, the probability that the action has been taken is 0 (which is the same as saying that the alert is false if the preconditions are not satisfied). If the preconditions are satisfied, the probability that the action is taken is 1 (which is the same as saying that the alert is true if the preconditions are satisfied). If the alert is considered to be true, the action's postconditions are set to be true.
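This reduction can be sketched in a few lines. The toy encoding below (our own, not the paper's implementation) shows that with an identity observation matrix and unit transition probabilities, alert handling collapses to the filter rule "accept the matching action iff its preconditions hold":

```python
# Toy reduction of a filter-based correlator to HCPN settings:
# identity observation probabilities and transition probabilities of 1.
actions = ["copy", "chmod", "touch"]
alerts = ["alert_copy", "alert_chmod", "alert_touch"]

# Eq. (1): P(o_i | d_j) = 1 if i == j else 0.
obs_prob = {(d, o): 1.0 if i == j else 0.0
            for i, d in enumerate(actions)
            for j, o in enumerate(alerts)}

def action_for_alert(alert, preconditions_satisfied):
    """With identity C and z(d) = 1, only the matching action can explain the
    alert, and it is accepted exactly when its preconditions hold."""
    for d in actions:
        if obs_prob[(d, alert)] == 1.0:
            return d if preconditions_satisfied(d) else None
    return None
```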

Second, HCPN presents the compromised resources, instead of alerts of actions, to administrators and/or active reactors. Because the number of compromised resources is usually smaller than the number of actions, this can effectively reduce the amount of data passed to the administrators and active reactors. Presenting the resources compromised by the intruders can also help administrators and active reactors make better decisions on what to do to minimize further damage.

Third, HCPN not only presents the compromised resources but also indicates the probability that a specified resource has been compromised by a specific intruder.

3. Basic algorithms

The initial state of HCPN is determined by the initial probabilities. After that, HCPN reevaluates its state (i.e., the marking distribution P_t) after receiving each alert from the sensors. To reevaluate the state, HCPN needs to first determine the probability that each action is enabled (i.e., the preconditions of the action are satisfied). Given these probabilities, HCPN can determine which action is most likely to be taken next and thus which resources are in danger. It can also determine which action is most likely to have happened given the new alert, and thus which resources have been taken over by the agent. To make the equations easier to read, agent c is always assumed and is omitted from all the equations from now on.

Fig. 4. An action is enabled if and only if all preconditions are satisfied.

Fig. 5. A resource q will be compromised by agent c if and only if it is already compromised by c or at least one of the actions is to be taken next.

Computation 1. In HCPN, an action is enabled if and only if all preconditions are satisfied (Fig. 4). In other words, given a marking distribution P_t, the probability that an action d ∈ D is enabled with respect to agent c can be estimated using Eq. (2). Note that, by definition, the relationship between all preconditions is conjunctive. If a disjunctive relationship is needed, a duplicate action (possibly with a different ID, treated as a different action) needs to be represented in the HCPN. Computation 1 is used to determine how likely each of the actions is ready (all preconditions satisfied) to be taken by an agent at time t.

P(E(d) | P_t) = P(d is enabled | P_t)
             = P( ∧_{q ∈ I(d)} (P_t(q) ≥ G(a = (q, d))) )
             ≈ ∏_{q ∈ I(d)} P(P_t(q) ≥ G(a = (q, d)))
             = ∏_{q ∈ I(d)} p_t(q).    (2)
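Eq. (2) is simply a product of the per-precondition ownership probabilities. A minimal sketch (function and variable names are ours):

```python
import math

# p_t: place -> probability that the agent holds that place at time t.
# I_d: the precondition places I(d) of action d.
def prob_enabled(p_t, I_d):
    """Eq. (2): probability that action d is enabled, under the approximation
    that the precondition places are held independently."""
    return math.prod(p_t[q] for q in I_d)
```

For the mail action of the L2R example, with p_t(q4) = 0.5 and p_t(q5) = 0.8, this gives 0.4; an action with no preconditions (e.g., "normal") yields the empty product 1.0.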

Computation 2. Given a marking distribution P_t, the probability that an action d ∈ D will be taken next, without knowing the alert, can be determined by Eqs. (3) and (4). At any given time, multiple actions might be enabled (with different probabilities). This computation predicts the likelihood that each action will be taken next by the agent before receiving any new alert. As you can see, this computation depends on Computation 1 to calculate P(E(d) | P_t).

d′_t(d) = P(D = d | P_t) = P(d will fire next | E(d)) · P(E(d) | P_t)    (3)
        = z(d) · ∏_{q ∈ I(d)} p_t(q),

d_t(d) = d′_t(d) / Σ_{d′} d′_t(d′).    (4)

Computation 3. Given the state S_t = (P_t, D_t) and without knowing the alert, a resource q will be compromised by agent c if and only if it is already compromised by c or at least one of the actions whose postcondition compromises resource q is to be taken (Fig. 5). If we assume that the probabilities that each action is taken are independent of each other, the probability that resource q will be compromised by agent c can be calculated with Eq. (5). This computation is used to predict which resources are likely to be compromised next, before receiving the next alert. From (5), we can see that Computation 3 depends on Computation 2 to calculate d_t(d).

\[
P(\{(q, c)\} \le M_t \mid S_{t-1}) \approx 1 - (1 - p_{t-1}(q)) \prod_{d :\, q \in O(d)} (1 - \delta_t(d)). \tag{5}
\]

Computation 4. Given a state S_{t-1} = (P_{t-1}, D_{t-1}) and an alert O_t, the probability that action d is taken is denoted P(D_t = d | S_{t-1}, O_t) and can be determined as

\[
P(D_t = d \mid S_{t-1}, O_t) = \frac{P(D_t = d, O_t \mid S_{t-1})}{P(O_t \mid S_{t-1})} = \frac{P(D_t = d \mid S_{t-1})\, P(O_t \mid D_t = d, S_{t-1})}{P(O_t \mid S_{t-1})}. \tag{6}
\]

Since P(D_t = d | S_{t-1}) = δ_{t-1}(d), and O_t depends only on D_t, this becomes:

\[
= \frac{\delta_{t-1}(d)\, P(O_t \mid D_t = d)}{P(O_t \mid S_{t-1})}
= \frac{\delta_{t-1}(d)\, c_t(O_t \mid d)}{\sum_{d' \in D} \delta_{t-1}(d')\, c_t(O_t \mid d')}
= \frac{\delta_{t-1}(d)\, c(O_t \mid d)}{\sum_{d' \in D} \delta_{t-1}(d')\, c(O_t \mid d')}. \tag{7}
\]


D. Yu, D. Frincke / Computer Networks 51 (2007) 632–654 639

This computation is used to estimate the likelihood that an action has just been taken by an agent given the current state and the newly received alert.
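As an illustration, the posterior update of Eq. (7) reduces to a weighted renormalization. The sketch below assumes the prior next-action distribution and the observation probabilities are available as plain dictionaries; all names and numbers are hypothetical, not part of the paper's implementation.

```python
def action_posterior(delta_prev, obs_prob, alert):
    """Posterior P(D_t = d | S_{t-1}, O_t) for each action d, per Eq. (7).

    delta_prev: dict action -> delta_{t-1}(d), prior next-action probabilities
    obs_prob:   dict (alert, action) -> c(O_t | d), observation probabilities
    alert:      the newly received alert O_t
    """
    # numerator of Eq. (7) for each action
    unnorm = {d: p * obs_prob.get((alert, d), 0.0) for d, p in delta_prev.items()}
    z = sum(unnorm.values())
    if z == 0.0:
        # no action explains the alert at all; fall back to the prior
        return dict(delta_prev)
    # denominator of Eq. (7) normalizes over all candidate actions
    return {d: v / z for d, v in unnorm.items()}
```

For example, a prior of 0.6/0.4 over a hypothetical `attack1` vs. `normal`, with observation probabilities 0.9 and 0.05 for the received alert, yields a posterior of 0.54/0.56 ≈ 0.96 for `attack1`.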

Computation 5. Given a state S_{t-1} = (P_{t-1}, D_{t-1}), an alert O_t, and a guess that action d has been taken, the probability of the next state is denoted P(P_t | S_{t-1}, O_t, D_t = d) and can be determined as in (8) and (9). This computation is used to estimate the likelihood that each resource will be taken over by an agent after receiving a new alert.

\[
P(\{(q \in O(d), c)\} \le M_t \mid S_{t-1}, O_t, D_t = d) \approx 1 - (1 - p_{t-1}(q)) \left(1 - \frac{\delta_{t-1}(d)\, c(O_t \mid d)}{\sum_{d' \in D} \delta_{t-1}(d')\, c(O_t \mid d')} \prod_{q' \in I(d)} p_{t-1}(q')\right), \tag{8}
\]
\[
P(\{(q \notin O(d), c)\} \le M_t \mid S_{t-1}, O_t, D_t = d) = p_{t-1}(q \notin O(d)). \tag{9}
\]

We have just described the five basic computations used in HCPN. The HCPN starts with an initial state determined by the initial probabilities. At each state, the HCPN estimates the likelihood that each action's preconditions have been satisfied (i.e., the action is enabled) using Computation 1. It then evaluates the probability that each action will be taken next by the agent; in other words, it predicts the agent's next possible action using Computation 2. The HCPN can also estimate which resources are likely to be taken over by the agent using Computation 3. After receiving a new alert, the HCPN reevaluates the state by first calculating the probability that each action has just been taken using Computation 4, and then updating the state (i.e., the probability that each resource has been taken over by an agent) using Computation 5.
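The interplay of Computations 1 and 2 can be sketched in a few lines. This is a minimal illustration, assuming the marking distribution, the z(d) values, and the precondition sets are given as plain dictionaries; the resource and action names below are made up for the example.

```python
def enabled_prob(p_t, preconditions):
    """Computation 1 (Eq. (2)): P(E(d) | P_t) as the product of
    precondition probabilities under the marking distribution p_t."""
    prob = 1.0
    for q in preconditions:
        prob *= p_t.get(q, 0.0)
    return prob

def next_action_dist(p_t, actions):
    """Computations 1-2 (Eqs. (2)-(4)): normalized probability that each
    action fires next, before any new alert is received.

    actions: dict action -> (z(d), list of precondition resources I(d))
    """
    raw = {d: z * enabled_prob(p_t, pre) for d, (z, pre) in actions.items()}
    total = sum(raw.values())
    # Eq. (4): normalize delta'_t(d) into a distribution delta_t(d)
    return {d: (v / total if total > 0 else 0.0) for d, v in raw.items()}
```

With a marking such as `{"VulnerableSadmind": 0.66, "SystemCompromised": 1.0}` and two candidate actions, the function returns a proper distribution over which action fires next.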

Note that these computations are specific to our HCPN model and derive the information we need. The basic ideas behind these computations, however, have been applied in many areas.

4. Parameter estimation and action inference

To infer agents’ actions with HCPN, we need to answer two questions: how to determine the model’s parameters based on labeled datasets, and how to infer an agent’s most probable action sequence given the model and the alerts. These questions are usually referred to as the parameter estimation problem and the action inference problem. In this section, we describe algorithms for these problems. These algorithms serve as the basis of our approach to alert correlation.

4.1. Parameter estimation

The purpose of parameter estimation is to find the set of model parameters that best explains the known alerts O_1, O_2, …, O_t and associated actions given the model structure. Specific to our model, we need to estimate the observation probabilities and the transition probabilities. These probabilities are estimated from all available data (not specific to a particular agent) so that probabilities estimated from past incidents can be used to detect attacks and intruders in the future. Note that, during the alert correlation process, the alerts are separated into different streams based on the agent identity.

The observation probabilities can be directly estimated with the Maximum Likelihood (ML) principle [41]. The likelihood of the alerts given the observation probability c(o|d) is defined as

\[
L(c; d) = \sum_i \ln\big(P(O_i \mid c, d)\big) = \sum_{O_i = o} \ln c + \sum_{O_i \ne o} \ln(1 - c) = N \ln c + L \ln(1 - c), \tag{10}
\]

where N is the number of instances in which O is issued by the IDS sensors when action d is taken, and L is the number of instances in which O is NOT issued by the IDS sensors when action d is taken. The best estimate of the observation probability is the one that maximizes the above likelihood, as shown in Eq. (11):

\[
\frac{\partial L(c; d)}{\partial c} = \frac{N}{c} - \frac{L}{1 - c} = 0
\;\Rightarrow\; (N + L)\, c = N
\;\Rightarrow\; c = \frac{N}{N + L}. \tag{11}
\]

This estimation is intuitive: it says that the observation probability equals the relative frequency with which a specific alert is issued by the IDS sensors when the action is taken. Note that the observation probability is a property of the IDS sensors; it indicates how likely an action is to cause the IDS to raise a specific alert. For example, the observation probability p(O_i | normal action) is the probability that a normal action causes the IDS to raise the alert O_i (note that this is different from the probability p(O_i | traffic data, normal action), in which case the traffic data value is also known), and \(\sum_{i=1}^{N_o} p(O_i \mid \text{normal action})\) is the false positive rate. Similarly, p(no alert | D_i) is the false negative rate


given the action D_i. Since the false positive and false negative rates are encoded in the HCPN, they are used in the inference process to correlate alerts. The observation probability is hardly affected by the intruders’ behavior, given that it is an intrinsic property of the IDS sensors.
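The ML estimate of Eq. (11) and the false positive rate described above reduce to simple counting. A sketch under the assumption that per-alert counts N and L are available; the alert names and count values below are made up for illustration.

```python
def ml_observation_prob(n_issued, n_not_issued):
    """Eq. (11): c = N / (N + L), the relative frequency with which alert O
    is issued by the sensors when action d is taken."""
    total = n_issued + n_not_issued
    return n_issued / total if total else 0.0

def false_positive_rate(obs_counts_normal):
    """Sum of observation probabilities p(O_i | normal action) over all
    alert types, as described in the text.

    obs_counts_normal: dict alert -> (N, L) counted over normal actions
    """
    return sum(ml_observation_prob(n, l) for n, l in obs_counts_normal.values())
```

For example, an alert issued 9 times out of 10 occurrences of its action yields c = 0.9.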

The estimation of the transition probabilities is a little trickier. In this paper, we solve this problem using an Expectation Maximization (EM) based algorithm [27]. The EM algorithm consists of two major steps: an expectation step (E-Step), followed by a maximization step (M-Step). In the E-Step, the unobserved data (actions in HCPN) are estimated based on the current model parameters λ_k. In the M-Step, Maximum Likelihood (ML) estimation is used to estimate the model parameters λ_{k+1} using the estimated data. This process is iterated until the parameters converge.

The likelihood of the actions taken given the transition probability δ = δ(d) can be defined as

\[
L(\delta) = \sum_i \ln\big(P(D_i \mid \delta, E_i(d))\big) = \sum_{D_i = d} \ln\big(\delta \cdot E_i(d)\big) + \sum_{D_i \ne d} \ln\big(1 - \delta \cdot E_i(d)\big), \tag{12}
\]

where E_i(d) is the probability that action d is enabled at step i. The transition probability is chosen to maximize the above likelihood:

\[
\frac{\partial L(\delta)}{\partial \delta} = \sum_{D_i = d} \frac{1}{\delta} - \sum_{D_i \ne d} \frac{E_i(d)}{1 - \delta \cdot E_i(d)} = 0
\;\Rightarrow\; \delta = \frac{N}{\sum_{D_i \ne d} \dfrac{E_i(d)}{1 - \delta \cdot E_i(d)}}, \tag{13}
\]

where N is the number of times that action d is taken (or |d|). Since there is no closed-form solution for δ in (13), the transition probability is calculated iteratively, with the initial δ = δ(d) set to the value from the last EM step, as indicated in the following steps:

1. Estimate E_i(d) for each action d at each step i using Computation 1 in Section 3, based on the current model parameters.
2. Fix E_i(d) and re-estimate δ iteratively using (14):
\[
\delta_{k+1} = \frac{N}{\sum_{D_i \ne d} \dfrac{E_i(d)}{1 - \delta_k \cdot E_i(d)}}. \tag{14}
\]
3. Exit if the change in the transition probability is small enough; otherwise go to step 1.
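The fixed-point iteration of Eq. (14) can be sketched as follows. Here `enabled` and `taken` stand for the per-step quantities E_i(d) and the indicator D_i = d; the starting guess of 0.5 is an assumption for illustration (the paper initializes from the previous EM step).

```python
def estimate_transition_prob(enabled, taken, tol=1e-6, max_iter=100):
    """Iterate Eq. (14) to a fixed point for one action's delta.

    enabled: list of E_i(d), probability that d was enabled at step i
    taken:   list of booleans, whether d was actually taken at step i
    """
    n = sum(taken)                      # N: number of times action d was taken
    delta = 0.5                         # illustrative initial guess
    for _ in range(max_iter):
        denom = sum(e / (1.0 - delta * e)
                    for e, t in zip(enabled, taken) if not t)
        if denom == 0.0:
            return 1.0                  # d fired every time it was enabled
        new_delta = min(n / denom, 1.0)
        if abs(new_delta - delta) < tol:
            return new_delta
        delta = new_delta
    return delta
```

As a sanity check: if an action is always fully enabled (E_i(d) = 1) and taken in 2 of 4 steps, the iteration settles at δ = 0.5, matching the intuitive relative frequency.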

Note that the transition probability is the conditional probability that a transition will be taken if the preconditions of the transition are satisfied, and it is relatively consistent. It is different from the probability that the transition will be taken next given the current state, which changes case by case (and can be estimated using Computation 3 in Section 3). The transition probability has also served as the key element of other probabilistic alert correlation frameworks, such as the one proposed by Zhai et al. [40].

Unlike anomaly-based sensors, HCPN does not adapt the transition probabilities in a completely unsupervised manner, so it is not easy for an intruder to mislead the system. An extremely skilled intruder, however, may still behave differently from the transition probabilities learned from normal intruders and cheat the system to some extent.

4.2. Probability smoothing

The well-known law of large numbers assures us that the relative frequency is a good estimate of the probability (i.e., (11)) if the number of counts is very large. However, estimating the probability with (11) has two problems. First, the total number of occurrences of each action is usually not large enough in the training data, so the law of large numbers does not hold. Second, some events may rarely happen, and so are seen only once or zero times in the training data. In other words, not seeing an event in the training data does not mean its probability is 0. For the same reason, the probability of an event that happens only once is likely to be overestimated.

The question is what probability we should assign to events that occur 0 or 1 times, and whether the probability of an event that happens n times should really be n/m times as large as the probability of an event that happens m times.

Our answer to the above question is based on the Good–Turing estimate [28], which can be derived by a variety of methods. The Good–Turing estimate states that the probability assigned to all the unobserved events is equal to the total probability that would have been assigned to singleton events by a relative frequency formula, i.e.,

\[
\tilde{p}_0 n_0 = \frac{n_1}{N}, \tag{15}
\]

where \(\tilde{p}_0\) is the probability assigned to unobserved events, n_0 is the total number of events that do not happen, n_1 is the total number of singleton

Page 10: Improving the quality of alerts and predicting intruder’s next goal with Hidden Colored Petri-Net

D. Yu, D. Frincke / Computer Networks 51 (2007) 632–654 641

events, and N is the total number of event occurrences. The smoothed estimation of the probabilities is thus:

\[
\tilde{p}_j =
\begin{cases}
\dfrac{n_{j+1}}{n_j} \cdot \dfrac{j+1}{N}, & j = 0, 1, \ldots, M, \\[2mm]
\dfrac{a\, j}{N}, & j > M,
\end{cases} \tag{16}
\]

where \(\tilde{p}_j\) and n_j are the estimated probability and the number of events observed j times, respectively. The normalization factor a is equal to

\[
a = \frac{\sum_{j > M+1} j\, n_j}{\sum_{j > M} j\, n_j} = \frac{N - \sum_{i=1}^{M+1} i\, n_i}{N - \sum_{i=1}^{M} i\, n_i}, \tag{17}
\]

where M is usually a small to moderate integer. In our case, we choose M to be 4.

Probability smoothing is very important for reliably estimating the probabilities, especially when the training data size is small. Without proper probability smoothing, some probabilities will be 0, which may lead to bad inference results.
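Eqs. (15)–(17) can be sketched as a single smoothing routine. This is a simplified illustration: the fallback used when n_{j+1} = 0 for a low count is our assumption (the text does not specify it), and the resulting values are not renormalized to sum to one.

```python
from collections import Counter

def good_turing_probs(counts, M=4):
    """Good-Turing smoothed probability estimates (Eqs. (15)-(17)).

    counts: dict event -> observed count (0 for known-but-unseen events).
    Returns dict event -> smoothed probability estimate.
    """
    n_total = sum(counts.values())            # N, total event occurrences
    n = Counter(counts.values())              # n_j, number of events seen j times
    # Eq. (17): normalization factor for the high-count branch
    num = sum(j * n[j] for j in n if j > M + 1)
    den = sum(j * n[j] for j in n if j > M)
    a = num / den if den else 1.0
    probs = {}
    for event, j in counts.items():
        if j <= M and n.get(j + 1, 0) > 0:
            # Eq. (16), low-count branch (j = 0 recovers Eq. (15) per event)
            probs[event] = (n[j + 1] / n[j]) * (j + 1) / n_total
        else:
            # Eq. (16), high-count branch; also our fallback when n_{j+1} = 0
            probs[event] = a * j / n_total
    return probs
```

For instance, with two singleton events, one doubleton, and one unseen event, a singleton gets (n_2/n_1)·2/N = (1/2)·2/4 = 0.25 instead of its raw relative frequency 0.25·... of 1/4; the unseen event receives nonzero mass.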

4.3. Action inference

The purpose of action inference is to find the most probable action sequence taken by the agent given the alerts. In other words, the inference problem, or the correlation process in our model, can be stated as follows: given alerts O = O_1, O_2, …, O_t and the model parameters λ, which action sequence D = D_1, D_2, …, D_t is most likely to have led to the alert sequence O under λ? That is to say, we want to optimize the criterion indicated in (18):

\[
\arg\max_D \big[P(D \mid O, \lambda)\big] = \arg\max_D \left[\frac{P(O \mid D, \lambda)\, P(D \mid \lambda)}{P(O \mid \lambda)}\right]. \tag{18}
\]

Since the term P(O|λ) does not depend on D, we can discard it when selecting the best path, i.e., we need only optimize:

\[
\arg\max_D \big[P(O \mid D, \lambda)\, P(D \mid \lambda)\big] = \arg\max_D P(O, D \mid \lambda). \tag{19}
\]

We solve this problem with dynamic programming by defining x_t(j) as the maximum score of a length-t state sequence ending in action j and producing the first t alerts from O, as shown in (20):

\[
x_t(j) = \max_{D_1 \cdots D_{t-1}} P(O_1, \ldots, O_t, D_1, \ldots, D_{t-1}, D_t = j \mid \lambda). \tag{20}
\]

Thus, the overall maximum probability is

\[
\max_j x_T(j) = \max_j \max_{D_1 \cdots D_{T-1}} P(O_1, \ldots, O_T, D_1, \ldots, D_{T-1}, D_T = j \mid \lambda) = \max_D P(O, D \mid \lambda) \tag{21}
\]

and

\[
\begin{aligned}
x_t(j) &= \max_{D_1 \cdots D_{t-1}} P(O_1, \ldots, O_t, D_1, \ldots, D_{t-1}, D_t = j \mid \lambda) \\
&= \max_{D_1 \cdots D_{t-1}} P(O_1, \ldots, O_{t-1}, D_1, \ldots, D_{t-1} \mid \lambda)\, P(D_t = j \mid S_{t-1}, \lambda)\, P(O_t \mid D_t = j, \lambda) \\
&= \max_{D_{t-1} = i} \Big[\max_{D_1 \cdots D_{t-2}} P(O_1, \ldots, O_{t-1}, D_1, \ldots, D_{t-1} = i \mid \lambda)\, P(D_t = j \mid S_{t-1}, \lambda)\, P(O_t \mid D_t = j, \lambda)\Big] \\
&\approx \max_{D_{t-1} = i} \big[x_{t-1}(i)\, \delta^i_{t-1}(j)\, c(O_t \mid j)\big],
\end{aligned}
\]

where \(\delta^i_{t-1}(j) \equiv P(D_t = j \mid S^i_{t-1}, \lambda)\) and \(S^i_{t-1}\) is the state corresponding to x_{t-1}(i). From this inference process, we can see why HCPN has the potential to reduce false positive and false negative rates: given the alert sequence, HCPN can infer the most probable action sequence using the precondition–postcondition relationship, the observation probability, and the transition probability. For example, if the best action sequence deduced by HCPN for the alert sequence

alert_attack1, alert_attack1, alert_attack2, alert_attack5, alert_attack3, alert_attack5

is

attack1, attack1, attack2, normal, attack3, normal,

HCPN would indicate that attack1, attack2, and attack3 have all been taken by the agent, while the alert_attack5s are considered false positive alerts since they are associated with the normal action.

In the inference algorithm, we calculate x_t(j) for each alert O_t and each action j. Note that we need to select the maximum value over all N_d (the number of actions) values of \([x_{t-1}(i)\, \delta^i_{t-1}(j)\, c(O_t \mid j)]\) when calculating x_t(j). In other words, the time complexity of the action inference is \(O(N_d^2 \cdot N_o)\), where N_o is the total number of alerts issued by the IDS sensors. The efficiency of the algorithm can be further improved by observing that most of the observation probabilities c(O_t|j) are close to 0, in which case we can simply assign a fixed floor probability to the related x_t(j). Since each action is usually misrecognized as fewer than 10 (independent of N_d) other actions (i.e., the number of significant c(O_t|j) values for each j is less than 10), the time complexity of the


algorithm after this simplification can be reduced to approximately \(O(N_d \cdot N_o)\), which is at least as good as any other approach surveyed in Section 7.1. Note that the time complexity is not related to the number of attack steps.

5. Alert confidence fusion

The false positive and false negative rates can be further reduced by alert confidence fusion technologies (the confidence fusion component in Fig. 1). How to combine the confidence scores, however, is tricky for several reasons. First, alerts provided by different sensors should not be trusted equally. For example, information provided by remote sensors and analyzers is considered less trustworthy than that provided by local sensors and analyzers, as noted earlier [29]. Second, sensors (and analyzers) installed at different locations may have different detection capabilities even if the sensors are of the same type, since the raw events captured by these sensors are different. For example, a sensor installed in a LAN usually receives less external traffic but more internal traffic than a sensor installed in the DMZ zone. Third, different kinds of sensors and analyzers may detect the same type of attack with different levels of accuracy. For example, a sensor may detect misuse activities such as Distributed Denial of Service (DDoS) attacks with 90% accuracy but detect escalation of privilege, such as Local-to-Root (L2R) attacks, with only 50% accuracy. Another, perhaps host-based, sensor may not detect DDoS at all, but be quite accurate in detecting escalation of privilege.

To take all these factors into consideration, we have extended the Dempster–Shafer (D–S) Theory of Confidence [30] into the Exponentially Weighted Dempster–Shafer (EWDS) Theory of Confidence [31] to fuse the confidence scores from different alert analyzers. In the basic D–S theory, the confidence combination rule is

\[
m_{12}(H) = \frac{\sum_{B \cap C = H} m_1(B)\, m_2(C)}{\sum_{B \cap C \ne \emptyset} m_1(B)\, m_2(C)}, \tag{22}
\]

where B, C, and H are hypotheses, m_i(H) is the confidence that observer O_i believes that hypothesis H is true, and m_{12}(H) is the combined confidence. The EWDS theory incorporates a weighted view of evidence from different sources with the following modified combining rule:

\[
m_{12}(H) = \frac{\sum_{B \cap C = H} [m_1(B)]^{w_1}\, [m_2(C)]^{w_2}}{\sum_{B \cap C \ne \emptyset} [m_1(B)]^{w_1}\, [m_2(C)]^{w_2}}, \tag{23}
\]

where w_i is the weight for observer O_i. When w_1 = w_2 = 1, (23) reduces to the basic D–S combining rule.

Since we are interested in whether an alert A_k is true (a true positive) or false (a false positive), the frame of discernment [30] is H_k = {A_k, ¬A_k}. The possible hypotheses are

\[
2^{H_k} = \{\emptyset, \{A_k\}, \{\neg A_k\}, \{A_k, \neg A_k\}\}. \tag{24}
\]

It is clear that

\[
m_i(\{A_k, \neg A_k\}) = 0, \tag{25}
\]
\[
m_i(\{\neg A_k\}) = 1 - m_i(\{A_k\}). \tag{26}
\]

The combining rule then becomes:

\[
m_{12}(\{A_k\}) = \frac{P(\{A_k\})}{P(\{A_k\}) + P(\{\neg A_k\})}, \tag{27}
\]
\[
m_{12}(\{\neg A_k\}) = \frac{P(\{\neg A_k\})}{P(\{A_k\}) + P(\{\neg A_k\})}, \tag{28}
\]

where
\[
P(X) = [m_1(X)]^{w_{1,k}}\, [m_2(X)]^{w_{2,k}}. \tag{29}
\]

In our system, we use the transition probability estimated for the action A_k by the i-th HCPN alert correlator as the confidence value m_i({A_k}).

The weights can be estimated by minimizing the Mean Square Error (MSE), defined as

\[
\mathrm{MSE} = \frac{\sum_{j=1}^{N} \big(m_{c,j}(\{A_k\}) - r_j\big)^2}{N}, \tag{30}
\]

where N is the total number of samples in the training set, m_{c,j} is the j-th combined observation score, and r_j is the real (or true) value of the j-th observation:

\[
r_j = \begin{cases} 1 & \text{if } A_k \text{ is true}, \\ 0 & \text{otherwise}. \end{cases} \tag{31}
\]

We used the standard gradient descent algorithm to search for the best weights, i.e.,

\[
w^{t+1}_{i,k} = w^t_{i,k} - \lambda \frac{\partial\, \mathrm{MSE}}{\partial w_{i,k}}, \tag{32}
\]

where λ is the step size. Note that for each alert type A_k and each observer i, there is a weight w_{i,k}. In other words, if observer one is good at detecting alert A_k but not at detecting alert A_j, w_{1,k} is large but w_{1,j} is small.

There are two practical considerations in our implementation.


First, when m_i(H) is close to 0, log(m_i(H)) might be very small, which may result in computational underflow. To prevent underflow, all probabilities in the system have a floor value of 0.01: if the confidence of a hypothesis is less than 0.01, we set it to 0.01.

Second, an HCPN alert correlator reports to the alert confidence fusion component only if the perceived state of a resource has changed, so alert correlators send state change information asynchronously. To deal with this, we store every correlator’s last reported evidence. When a new report regarding alert A_k comes in, we update the corresponding evidence and re-estimate the combined confidence for A_k. We thus gauge confidence based on evidence ‘‘in hand,’’ and increase or reduce confidence when new information arrives. Note that the fusion process produces derived information from multiple sources; it does not damage information stored at each source location.

6. Experiments and evaluation

We have developed an off-line alert correlation system based on our HCPN framework and performed two sets of experiments: one on the DARPA 2000 DDoS evaluation datasets [32], and one on the attack scenarios from the Grand Challenge Problem (GCP) provided by DARPA’s Cyber Panel Program [21]. The HCPN model used in our experiments consists of 23 places (resources), 34 transitions (33 malicious actions + 1 normal action), and 33 alerts (corresponding to the malicious actions) for both experiments.

6.1. Quality of alerts

Quality of alerts is usually measured with the total number of alerts (TNA), the detection rate (DR), the false positive rate (FPR), the false negative rate (FNR), and the repeated true alert rate (RTR):

\[
\mathrm{DR} = \frac{\text{Number of True Attacks Reported}}{\text{Number of Total Observable Attacks}}, \tag{33}
\]
\[
\mathrm{FPR} = 1 - \frac{\text{Number of True Alerts Reported}}{\text{Number of Alerts Reported}}, \tag{34}
\]
\[
\mathrm{FNR} = 1 - \mathrm{DR}, \tag{35}
\]
\[
\mathrm{RTR} = \frac{\text{Number of True Alerts} - \text{Number of True Attacks}}{\text{Number of True Attacks}}. \tag{36}
\]

The reason these measures are reasonable is that, for each true intrusive action taken by an agent,

we want to produce only one alert. However, although one cannot reduce both the false positives and the false negatives at the same time, these quantities are not necessarily functions of one another either. It is thus difficult to compare different approaches based on one measure alone. In this paper, we define the quality of alerts as follows.

Definition 2. The Quality of Alerts (QoA) is defined as

\[
\mathrm{QoA} = 100\% - \frac{fp + fn + rt}{ta}, \tag{37}
\]

where fp is the number of false positives, fn is the number of false negatives, rt is the number of repeated true alerts, and ta is the number of intrusive actions (or expected alerts). Note that the QoA value can be negative (which means the number of incorrect and/or redundant alerts exceeds the number of true attacks), and the best QoA value is 100%. QoA is a combined score to measure alert quality. It is borrowed from the speech recognition community, where a speech recognizer may make insertion errors (similar to false positives and repeated true alerts in IDS) and deletion errors (similar to false negatives in IDS).

Sometimes, the cost incurred by different types of alerts is different. For example, the cost incurred by repeated true alerts is often less than that brought by false positives, which, in turn, is usually less than that from false negatives. To incorporate the cost information, we introduce the concept of Weighted Quality of Alerts (WQoA):

Definition 3. The Weighted Quality of Alerts (WQoA) is defined as

\[
\mathrm{WQoA} = 100\% - \frac{w_{fp} \cdot fp + w_{fn} \cdot fn + w_{rt} \cdot rt}{ta}, \tag{38}
\]

where \(w_{fp} \ge 0\) is the weight for the false positives, \(w_{fn} \ge 0\) is the weight for the false negatives, and \(w_{rt} \ge 0\) is the weight for the repeated true alerts. WQoA reduces to QoA when \(w_{fp} = w_{fn} = w_{rt} = 1\).

Note that the weights in the WQoA are usually different in different systems. For this reason, we use QoA instead of WQoA in our experiments.
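Definitions 2 and 3 amount to one-line formulas; the sketch below expresses them as fractions of 1.0. The example counts are our reading of the DMZ-HCPN row of Table 5 (15 alerts reported, 12 real, so fp = 3, fn = 0, rt = 0, ta = 12), which reproduces the reported 75% QoA.

```python
def qoa(fp, fn, rt, ta):
    """Quality of Alerts, Eq. (37), as a fraction of 1.0 (1.0 = perfect)."""
    return 1.0 - (fp + fn + rt) / ta

def wqoa(fp, fn, rt, ta, w_fp=1.0, w_fn=1.0, w_rt=1.0):
    """Weighted Quality of Alerts, Eq. (38); unit weights recover QoA."""
    return 1.0 - (w_fp * fp + w_fn * fn + w_rt * rt) / ta
```

A negative return value means the incorrect and redundant alerts outnumber the true attacks, matching the negative QoA percentages reported for the raw sensor output.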

6.2. Experiments on the DARPA DDoS evaluation dataset

Our first set of experiments was conducted on the DARPA 2000 intrusion detection evaluation datasets [32]. The dataset includes the network traffic data collected from both the DMZ and the inside part of the evaluation network. In the dataset, an intruder probes, breaks in, installs the DDoS daemon, and launches a DDoS attack against an off-site server (Table 1). In this set of experiments, alerts are generated using RealSecure Network Sensor 6.0 with the maximum coverage policy, which forced the Network Sensor to save all the reported alerts. We chose data produced by RealSecure Network Sensors because the attack signatures used in RealSecure Network Sensor 6.0 are well documented, and we could draw upon Ning et al. [19] for the precondition–postcondition rule set.

Table 2
Correlation result for the inside network traffic

Host             State                    Probability
172.016.112.010  SystemCompromised        1.00
                 VulnerableSadmind        0.66
                 DaemonInstalled          0.55
                 ReadyToLaunchDDOSAttack  0.95
172.016.112.050  SystemCompromised        1.00
                 VulnerableSadmind        0.66
                 DaemonInstalled          0.55
                 ReadyToLaunchDDOSAttack  0.95
172.016.115.020  SystemCompromised        1.00
                 VulnerableSadmind        0.66
                 DaemonInstalled          0.80
131.084.001.031  DDoSHappened             0.90

Table 3
Correlation result for the DMZ traffic

Host             State                    Probability
172.016.112.010  SystemCompromised        1.00
                 VulnerableSadmind        0.66
                 DaemonInstalled          0.55
172.016.112.050  SystemCompromised        1.00
                 VulnerableSadmind        0.66
                 DaemonInstalled          0.55
172.016.114.010  SystemCompromised        1.00
                 VulnerableSadmind        0.66
172.016.114.020  SystemCompromised        1.00
                 VulnerableSadmind        0.66
172.016.114.030  SystemCompromised        1.00
                 VulnerableSadmind        0.66
172.016.115.020  SystemCompromised        1.00
                 VulnerableSadmind        0.66
                 DaemonInstalled          0.80

The alerts are first separated into two parts: those generated at the DMZ and those generated inside the network. Each part is then separated into a training set and a testing set. The training set was used to estimate the HCPN model parameters, and the testing set was used to evaluate the effectiveness of the model. Each set has about 900 alerts. Training takes less than 5 s and inference takes about 1 s (without the speed-up approximation) on a 2.8 GHz PC.

As mentioned in Section 2, our HCPN system outputs the resources compromised. For example, it reports whether a daemon has been installed instead of the intruder’s action that led to the installation of the daemon, although the action itself is available to the administrators and/or reactors if needed. Tables 2 and 3 list the correlation results for the inside network traffic and the DMZ network traffic, respectively.

Table 4 shows the actions detected over time by the HCPN-based alert analyzer. After an action is inferred to have happened, the alert analyzer automatically estimates the new perceived state of the system and infers the next goal of the intruder.

Table 1
Steps of the DDoS attack

Step #  Attack
1       IPsweep for live host IPs from a remote site
2       Probe the identified live IPs with SadmindPing for the sadmind daemon running on Solaris hosts
3       Compromise target hosts via the sadmind vulnerability (SadmindBOF)
4       Install the Trojan mstream DDoS software on compromised hosts
5       Register the DDoS Trojan with the master computer
6       Launch the DDoS from compromised hosts against the target

In this experiment, our HCPN-based alert analyzer predicted the intruder’s next goals with 100% accuracy. This indicates that our HCPN-based alert analyzer can not only understand an intruder’s progress but also predict the intruder’s next possible action and goal. If the whole attack scenario has been completed, HCPN can potentially uncover the intruder’s intrusion strategy. Note that the accuracy of the action and goal prediction is directly related to the perplexity of the HCPN model and to the knowledge about the actions and their preconditions and postconditions. When many actions might be enabled at the same time for a given agent (as in a very big IDS system), the action prediction accuracy will likely decline.

Table 4
Actions detected over time from the inside network traffic

Step  Host             Action          State              Next goal
1     172.016.115.020  Sadmind_Ping    VulnerableSadmind  SystemCompromised
2     172.016.112.010  Sadmind_Ping    VulnerableSadmind  SystemCompromised
3     172.016.112.050  Sadmind_Ping    VulnerableSadmind  SystemCompromised
4     172.016.115.020  Sadmind_BOF     SystemCompromised  DaemonInstalled
5     172.016.112.010  Sadmind_BOF     SystemCompromised  DaemonInstalled
6     172.016.112.050  Sadmind_BOF     SystemCompromised  DaemonInstalled
7     172.016.115.020  Rsh             DaemonInstalled    ReadyForDDoS
8     172.016.112.050  Rsh             DaemonInstalled    ReadyForDDoS
9     172.016.112.010  Rsh             DaemonInstalled    ReadyForDDoS
10    172.016.112.050  Mstream_Zombie  ReadyForDDoS       LaunchDDoS
11    172.016.112.010  Mstream_Zombie  ReadyForDDoS       LaunchDDoS

The step number indicates the order in which the malicious action is detected by our HCPN alert analyzer. The state column indicates the state change after the action is detected. The next goal column gives the predicted intruder’s next goal after the detected action.

Table 5 illustrates the detection rate, false positive rate, and QoA for RealSecure Network Sensor 6.0 and our HCPN-based system. In the table, Raw and Raw2 are results from RealSecure sensors. The difference between Raw and Raw2 is the way the number of attacks is calculated: each attempt of the same attack from the same agent is considered a different attack in Raw (the convention used by Ning et al. [19]), while all attempts of the same attack are considered one attack in Raw2. In Raw2, an attack is considered detected as long as one of the alerts corresponding to the attack has been reported. Note that Raw2 provides a higher detection rate but lower QoA compared to Raw, due to the decrease in the number of true attacks. DMZ-uniform and Inside-uniform are results of the HCPN model without training (i.e., the transition probabilities are set to 1, and the observation probabilities are set according to Eq. (1)); this corresponds to the simple precondition–postcondition based correlation approaches. DMZ-HCPN and Inside-HCPN are results of HCPN after parameter estimation.

Table 5
Detection and false alert rates for RealSecure Network Sensor 6.0 and the HCPN-based system

Setting         NOA  NA   NTAD  DR (%)  NRA  FPR (%)  RT  RTR (%)  QoA (%)
DMZ-Raw         89   891  51    57.30   57   93.60    6   11.8     -887
DMZ-Raw2        12   891  12    100     57   93.60    45  375      -7225
DMZ-uniform     12   42   11    91.7    11   73.8     0   0        -167
DMZ-HCPN        12   15   12    100.00  12   20       0   0        75.0
Inside-Raw      60   922  37    61.67   44   95.23    7   18.9     -1413
Inside-Raw2     13   922  12    92.31   44   95.23    32  267      -7483
Inside-uniform  13   30   11    84.6    11   63.3     0   0        -61.5
Inside-HCPN     13   12   12    92.31   12   0        0   0        92.3

NOA = number of observable attacks, NA = number of alerts, NTAD = number of true attacks detected, DR = detection rate, NRA = number of real alerts, FPR = false positive rate, RT = repeated true alerts, RTR = repeated true alert rate.

From the table, we can see that the HCPN-based alert analyzer can greatly reduce the number of alerts presented to the administrators and active reactors. Without our alert analyzer, about 900 alerts are reported at both the DMZ and the inside network sites. With our HCPN-based alert analyzer, the total number of alerts reported is reduced to fewer than 20 at each site. Note that our system can not only reduce the total number of alerts but also improve their quality: with our HCPN-based approach, the false positive rate is decreased from 94% to 20% at the DMZ site and from 95% to 0% at the inside network site. From the table, we can also see that, compared with the default parameters, both the detection rate and the false positive rate are improved after training. In other words, HCPN can improve the QoA even without training data. When training data are available, however, HCPN can utilize the


information to further improve the correlation results. Note that while HCPN with the default parameters (as when no training data is available) can reduce the false positives, it may decrease the detection rate at the same time, as shown in Table 5.

When we use QoA as the performance measurement, the improvement is also very significant. In the DMZ zone, the QoA is -887% for the Raw setting, -7225% for the Raw2 setting, -167% for the uniform setting, and 75% for the HCPN setting. In the inside network, the QoA is -1413% for the Raw setting, -7483% for the Raw2 setting, -62% for the uniform setting, and 92% for the HCPN setting. Under both conditions, HCPN performs much better than the system without alert correlation and the system without separating alerts and actions. Please note, however, that the DARPA evaluation dataset has some limitations, as indicated by McHugh [42]. The improvements in QoA for other datasets may not be as significant as what we obtained for the DARPA dataset.

Our HCPN model after training is not perfect. As indicated in Table 5, there are still false negatives and false positives after using HCPN.

An example of a false negative can be observed in Table 2, where we can see that the intruder has installed daemons on hosts 172.016.112.010, 172.016.112.050, and 172.016.115.020, and is ready to launch the DDoS attack. Note, however, that our system does not detect that the intruder is ready to launch the DDoS attack on host 172.016.115.020.

An example of a false positive is shown in Table 3. According to the description of the dataset, the intruder tried Sadmind_BOF against the targets 172.016.114.010, 172.016.114.020, and 172.016.114.030. No additional attacks were carried out against these hosts, which suggests that the Sadmind_BOF attacks had failed. However, since our rules indicate that the postcondition of the Sadmind_BOF attack is SystemCompromised, our HCPN reports that these hosts are compromised, as indicated in Table 3.

In these experiments, we used different default initial probabilities from 0.01 to 0.05 and got the same results.

6.3. Test on attack scenarios from the GCP

Our second set of experiments was conducted on attack scenarios from the GCP. Note that access to the original GCP dataset is very restrictive; however, we have managed to generate the scenarios using our own simulation tool. To generate the alerts, we used a program to simulate the behavior of an intruder and another program to simulate the noise. The simulated intruder picks the next step based on the precondition–postcondition relationship and predefined probabilities. The noise generator randomly generates noise traffic at random time intervals. The traffic generated from both programs is merged together based on the time stamp. Using the simulation tool, we have generated both the training set and the testing set. The attack steps of the scenario used in this experiment are summarized in Table 6.

Table 7 shows the quality of alerts with and without HCPN. In the DMZ zone, the QoA is −954% for the Raw setting, −6829% for the Raw2 setting, −71% for the uniform setting, and 93% for the HCPN setting. In the inside network, the QoA is −1224% for the Raw setting, −9686% for the Raw2 setting, −29% for the uniform setting, and 93% for the HCPN setting. Our HCPN model clearly improves the QoA for this task by reducing the false positive rate and increasing the detection rate.

A phenomenon not observed in the DARPA DDoS experiment but present in this experiment is the potential of HCPN to recover missing alerts. The IllegalFileAccess attacks are missing from the raw alerts at the DMZ. However, through the implicit output arcs in the HCPN model, HCPN correctly deduced that the files had been illegally accessed.

The inference time for the DMZ traffic is about 2 s and for the inside-network traffic about 3 s on a 2.8 GHz PC if the speed-up approximation is not used. Note that the inference time is O(N_d^2 · N_o) without the speed-up approximation, as indicated in Section 4.3. In other words, the inference time is proportional to the total number of alerts reported by the sensors and to the square of the total number of possible actions encoded into the HCPN; it is not related to the number of atomic attacks taken by an agent. We generated alert sequences of variable length with variable numbers of (artificial) attack steps as test data, and our experiments, shown in Table 8, confirmed this estimate.
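The O(N_d^2 · N_o) cost can be seen in an HMM-style forward pass, shown below as a simplified stand-in for the HCPN inference (the real algorithm operates on markings over places, not a flat belief over actions): every alert triggers an update over all pairs of actions, so the work grows with the square of the action count and linearly with the alert count.

```python
def forward_pass(n_actions, alerts, obs_prob, trans_prob):
    """One forward pass over the alert sequence.

    Each alert updates a belief over all n_actions actions; each update
    touches every (previous action, next action) pair, so the total work
    is O(N_d^2 * N_o), independent of the true attack length.
    alerts: list of alert-type indices; obs_prob[j][a]: P(alert a | action j);
    trans_prob[i][j]: P(action j | action i). All values illustrative.
    """
    belief = [1.0 / n_actions] * n_actions
    for alert in alerts:
        new = [0.0] * n_actions
        for j in range(n_actions):                 # next action
            s = 0.0
            for i in range(n_actions):             # previous action
                s += belief[i] * trans_prob[i][j]
            new[j] = s * obs_prob[j][alert]
        z = sum(new) or 1.0                        # renormalize
        belief = [p / z for p in new]
    return belief

# Toy demo with uniform probabilities: 4 actions, 3 alert types.
N_d, N_o = 4, 3
trans = [[1.0 / N_d] * N_d for _ in range(N_d)]
obs = [[1.0 / N_o] * N_o for _ in range(N_d)]
b = forward_pass(N_d, [0, 1, 2, 1], obs, trans)
assert abs(sum(b) - 1.0) < 1e-9
```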

6.4. Alert confidence fusion

Ideally, the confidence score should equal the probability, i.e., reflect the likelihood that the alert is true. For this reason, the effectiveness of the confidence fusion can be measured with the cross entropy between the confidence score and

Page 16: Improving the quality of alerts and predicting intruder’s next goal with Hidden Colored Petri-Net

Table 6
Attack steps in the attack scenario from GCP

Step  Host        Action               Description
1     129.0.1.8   Port_Scan            Scan the ports to detect FTP service
2     129.0.1.7   Port_Scan            Scan on a different host
3     129.0.1.13  Port_Scan            Scan on a different host
4     129.0.1.11  Port_Scan            Scan on a different host
5     129.0.1.12  Port_Scan            Scan on a different host
6     129.0.1.9   Port_Scan            Scan on a different host
7     129.0.1.13  FTP_Globbing_Attack  Get root access to the host using the FTP Globbing buffer overflow attack
8     129.0.1.13  AgentInstall         Download malicious code to the host
9     129.0.1.13  IllegalFileAccess    Illegally collect information from the host
10    129.0.1.13  Loki                 Upload sensitive information to the external server
11    129.0.1.11  FTP_Globbing_Attack  Attack a different host
12    129.0.1.13  AgentInstall         Attack a different host
13    129.0.1.11  IllegalFileAccess    Attack a different host
14    129.0.1.11  Loki                 Attack a different host

Table 7
Quality of alerts with and without HCPN

Setting         NOA  NA    NTAD  DR (%)  NRA  FPR (%)  RT  RTR (%)  QoA (%)
DMZ-Raw         87   980   58    66.7    63   93.6     5   8.6      −954
DMZ-Raw2        14   980   12    85.7    63   93.6     51  425      −6829
DMZ-uniform     14   34    12    85.7    12   64.7     0   0        −71.4
DMZ-HCPN        14   15    14    100     14   6.7      0   0        93.3
Inside-Raw      98   1384  77    78.6    86   93.8     9   11.7     −1224
Inside-Raw2     14   32    14    100     86   56.3     72  514      −9686
Inside-uniform  14   32    14    100     14   56.3     0   0        −28.6
Inside-HCPN     14   15    14    100     14   6.7      0   0        93.3

NOA = number of observable attacks, NA = number of alerts, NTAD = number of true attacks detected, DR = detect rate, NRA = number of real alerts, FPR = false positive rate, RT = repeated true alerts, RTR = repeated true alert rate.

Table 8
Inference time in HCPN

# of Alerts        1054  1384  1868  2568  2568  2568  2568  2568
# of Attack steps  14    14    14    14    20    26    30    35
Time (s)           1.8   3.1   4.0   7.4   8.3   7.1   7.3   8.1

D. Yu, D. Frincke / Computer Networks 51 (2007) 632–654 647

the true distribution of the attacks. However, since our ultimate goal is to reduce the false positive and false negative rates, we evaluate the effectiveness of the confidence fusion with QoA in this paper.

Table 9
Detection and false positive rates of our HCPN-based system with and without alert confidence fusion

Setting            NOA  NA  NTAD  DR (%)  NRA  FPR (%)  QoA (%)
DMZ                16   15  12    75.0    12   20       56.3
Inside             16   12  12    75.0    12   0        75.0
Confidence fusion  16   15  15    93.8    15   0        93.8

NOA = number of observable attacks, NA = number of alerts, NTAD = number of true attacks detected, DR = detect rate, NRA = number of real alerts, FPR = false positive rate.

Table 9 compares the results with and without the alert confidence fusion for the DARPA DDoS evaluation dataset. Note that the detection and false alert rates reported in Table 9 differ from those in Table 5: Table 5 focuses on the performance of the individual analyzers, so the detectable alerts there are specific to the information available to each analyzer. In Table 9, however, we focus on the performance of the whole fused system,



Fig. 6. Comparison of classification accuracy based on confidence scores from observer 1, observer 2, the combined confidence with the basic D–S theory, and the combined confidence with our extended D–S theory.


so the number of detectable alerts depends on the information available to the whole system. Table 9 shows that with the alert confidence fusion technologies, the overall detection rate increases from 75% at each site to 94%, while the false positive rate stays low on the DARPA DDoS evaluation dataset. The QoA increases from 56% using the DMZ zone alerts alone and 75% using the inside network alerts alone, to 94% using the fused confidence scores.

We have also run the confidence fusion experiments on other simulation data. In these experiments, confidence scores from alert correlation components are simulated and separated into training and test sets. Fig. 6 compares the classification accuracy (whether an alert should be reported) based on confidence scores from observer 1, observer 2, the combined confidence with the basic D–S theory, and the combined confidence with our extended D–S theory. In most cases, confidence fusion with our extended D–S theory outperforms both the basic D–S theory and the individual observers.
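For reference, the basic Dempster rule of combination that Fig. 6 compares against can be sketched as below. The observers' mass assignments are invented numbers, and our extended D–S rule [31] is not reproduced here.

```python
def dempster_combine(m1, m2):
    """Basic Dempster's rule of combination.

    m1, m2: mass functions mapping frozenset hypotheses to belief mass.
    Intersecting hypotheses reinforce each other; mass on disjoint pairs
    is conflict, renormalized away.
    """
    combined, conflict = {}, 0.0
    for a, ma in m1.items():
        for b, mb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + ma * mb
            else:
                conflict += ma * mb
    if conflict >= 1.0:
        raise ValueError("total conflict; Dempster's rule undefined")
    return {s: m / (1.0 - conflict) for s, m in combined.items()}

ATTACK, NORMAL = frozenset({"attack"}), frozenset({"normal"})
THETA = ATTACK | NORMAL          # mass on THETA expresses ignorance
# Two observers' (illustrative) mass assignments for the same alert:
obs1 = {ATTACK: 0.6, NORMAL: 0.1, THETA: 0.3}
obs2 = {ATTACK: 0.5, NORMAL: 0.2, THETA: 0.3}
fused = dempster_combine(obs1, obs2)
# Agreement between the observers pushes the fused attack mass above
# either individual assignment.
assert fused[ATTACK] > max(obs1[ATTACK], obs2[ATTACK])
```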

Fig. 7. A typical alert correlation process. Alerts are first grouped into hyper-alerts through the alert aggregation component. Hyper-alerts are then processed by the plan recognition component, which utilizes the precondition–postcondition information of the attack actions. The number of alerts decreases after each step.

7. Related work

7.1. Alert correlation as a filter and an inference problem

Alert correlation in IDS has been an active research area. Most existing alert correlation approaches treat the alert correlation problem as one of extracting the true alerts (i.e., filtering out the false positive alerts) directly from the alerts generated by the IDS sensors, based on the relationships (e.g., similarities, sequential relationships, etc.) among alerts. These approaches are often implemented in two steps: alert aggregation (also known as fusion or clustering) and intention recognition (also known as plan recognition). Fig. 7 shows the full correlation process based on alert aggregation and plan recognition.

Conceptually, alert aggregation is simple: by aggregating alerts into meta-alerts (or hyper-alerts) based on feature similarities, researchers have found that they can reduce the total number of alerts presented to the administrators and/or AR [1,4–



7,9,11,12]. The aggregation process usually involves merging the features of two alerts. For example, alerts from the same sensor that belong to the same attack (identified by the same source and target IP address) are considered similar [11]. In alert aggregation, alerts are first classified, based on similarity, into clusters that correspond to the same occurrence of an attack. Each cluster is then merged, and a new, global alert is generated to represent the whole cluster [6,9]. The main purpose of alert aggregation is to reduce the number of alerts presented to the administrators. In other words, alert aggregation can improve the quality of the alerts (defined in Section 5) by reducing the number of repeated alerts. Alert aggregation itself does not directly address false negatives or false positives. False negatives occur when an attack is ''missed'', so aggregating alerts alone will not help. False positives may be reduced somewhat: for instance, if a set of alerts tied to acceptable behavior is flagged as misuse, aggregation will produce only one meta-alert (which will be a false positive) rather than the full collection of raw false positive alerts. However, this does not eliminate the fact that something is raising a false positive; it only reduces the number of times it is reported.
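The aggregation step described above can be sketched as grouping by a feature key. The field names and the similarity key are illustrative, since each system chooses its own criteria.

```python
from collections import defaultdict

def aggregate(alerts):
    """Cluster alerts sharing (sensor, attack type, src IP, dst IP) and emit
    one meta-alert per cluster with a repetition count and time span."""
    clusters = defaultdict(list)
    for a in alerts:
        key = (a["sensor"], a["type"], a["src"], a["dst"])
        clusters[key].append(a)
    return [
        {"sensor": s, "type": t, "src": src, "dst": dst, "count": len(g),
         "first": min(x["time"] for x in g), "last": max(x["time"] for x in g)}
        for (s, t, src, dst), g in clusters.items()
    ]

# Five repeated Port_Scan alerts plus one Rsh alert collapse to two meta-alerts.
raw = [
    {"sensor": "dmz", "type": "Port_Scan",
     "src": "10.0.0.5", "dst": "10.0.0.9", "time": i}
    for i in range(5)
] + [{"sensor": "dmz", "type": "Rsh",
      "src": "10.0.0.5", "dst": "10.0.0.9", "time": 9}]
meta = aggregate(raw)
assert len(meta) == 2
```

Note that if the five Port_Scan alerts were all false positives, the meta-alert would still be a false positive; aggregation only reduces how often it is reported, exactly as discussed above.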

To filter out false positives, researchers have proposed using plan recognition [8,13–15,17,18,20,24] after the alert aggregation step. Plan recognition utilizes the precondition–postcondition relationships between activities. The concept is that in a given attack scenario, system misusers perform a sequence of steps to violate system security policies, with earlier steps preparing for later ones. An alert is likely to be true if its precondition steps and/or postcondition steps are also reported in the alerts. By keeping only those alerts, we can potentially remove some of the false positives and hence improve the quality of the alerts. Plan recognition seeks to recognize an intruder's intention from the alerts. The emphasis here is to give administrators and active reactors a better understanding of ongoing activities so that they can respond appropriately. The importance of intention recognition lies not so much in the ''average'' generic attack on a system as in instances where it is important to more fully identify complex, multistage scenarios. Detecting an intruder's plan at an early stage would make it easier to prevent the intruder from achieving his or her goal. Intention recognition also aims to reduce some false positives during correlation; further, it should be possible to increase true positives (and therefore reduce false negatives) by inferring the existence of attacks during correlation.
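A minimal filter-style plan recognizer along these lines might look as follows. The rule base is a tiny illustrative subset: the action and resource names are taken from the paper's appendix, but the pre/postconditions are simplified.

```python
# Toy precondition-postcondition rules: an alert is kept only if all of its
# preconditions were established by earlier kept alerts.
RULES = {
    "Sadmind_Ping": {"pre": set(),                  "post": {"VulnerableSadmind"}},
    "Sadmind_BOF":  {"pre": {"VulnerableSadmind"},  "post": {"SystemCompromised"}},
    "Rsh":          {"pre": {"SystemCompromised"},  "post": {"DaemonInstalled"}},
}

def filter_by_plan(alert_seq, initial_state=frozenset()):
    """Keep an alert when its preconditions hold; apply its postconditions."""
    state, kept = set(initial_state), []
    for name in alert_seq:
        rule = RULES[name]
        if rule["pre"] <= state:       # all preconditions already established
            kept.append(name)
            state |= rule["post"]
    return kept

# The first Rsh is dropped as a likely false positive: its precondition
# (SystemCompromised) was never established at that point.
assert filter_by_plan(["Rsh", "Sadmind_Ping", "Sadmind_BOF", "Rsh"]) == \
    ["Sadmind_Ping", "Sadmind_BOF", "Rsh"]
```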

In this paper, we have taken a different view and considered the problem of alert correlation as the problem of INFERRING an agent's actions (which are not directly observable) from the alerts, instead of filtering out false alerts from all alerts reported by sensors. Note that this is a completely different view from the filter view of the problem. Here, we treat alerts and actions as two different entities. The agent's actions are unknown; however, they may be partially observed by the IDS and reported as alerts that contain incomplete and false information. The task of alert analysis is not to find the ''good'' alerts but to infer the most probable actions taken by the agent given the observations. During the inference process, information such as the false positive rate, the false negative rate, and the number of identical alerts issued can all be integrated into the process, as discussed in the previous sections. When no training data are available, HCPN behaves the same as the filter-based approaches that rely only on the precondition–postcondition relationship. When training data are available, the model parameters can be tuned and better correlation results can be achieved.
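The contrast with the filter view can be made concrete with a single-alert Bayes update: rather than labeling the alert good or bad, we compute a posterior over the candidate actions, with the sensor's false-positive behavior entering as the likelihood of the alert under Normal_Action. All numbers below are illustrative.

```python
# Prior belief over which action the agent took (illustrative values).
PRIOR = {"Sadmind_BOF": 0.05, "Normal_Action": 0.95}
# P(this alert observed | action): the sensor's false-positive behavior
# appears as the likelihood of the alert under Normal_Action.
LIKELIHOOD = {"Sadmind_BOF": 0.9, "Normal_Action": 0.02}

def posterior(prior, likelihood):
    """Bayes rule over candidate actions for one observed alert."""
    joint = {a: prior[a] * likelihood[a] for a in prior}
    z = sum(joint.values())
    return {a: p / z for a, p in joint.items()}

post = posterior(PRIOR, LIKELIHOOD)
# The alert raises, but does not settle, the probability of the attack
# action: the prior and the sensor's reliability both matter.
assert post["Sadmind_BOF"] > PRIOR["Sadmind_BOF"]
```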

7.2. Probabilistic framework for IDS

Zhai et al. [40] proposed a Bayesian-network-based probabilistic framework to integrate and reason about complementary intrusion evidence, such as alerts generated by IDS and reports from system monitoring or vulnerability scanning tools. In their framework, the basic conditional probability table plays the same role as the transition probability in HCPN. The difference between their approach and HCPN is that HCPN also uses the observation probabilities in the inference process, while their approach integrates additional information from system monitoring or vulnerability scanning tools to improve alert quality.

7.3. Colored Petri-Net

A CPN [22,33,34] model of a system provides an explicit description of both the states and the actions of the system. CPN is well suited to modeling systems with concurrent and distributed events and has been widely used in modeling discrete event dynamic systems.



Petri nets have also been used to model intruders' misuse behaviors [25,35,36]. Kumar and Spafford [35,36] proposed using CPN as the model to match misuse patterns. In their model, ''Guards define the context in which signatures are matched. The notion of start and final states, and paths between them define the set of event sequences matched by the net. Partial order matching can also be specified in this model. The main benefits of the model are its generality, portability and flexibility.'' Ho et al. [25] suggested combining partial-order planning and executable Petri nets to detect misuse intrusions. The main benefit of their approach is that it allows unordered events to be encoded in the intrusion signature. They argue that traditional state transition analysis, which is based on a strictly ordered and non-overlapping sequence of events, is not flexible enough to model real misuse scenarios. For this reason, they ''use a partial ordering of events. A partial order of events specifies that some events are ordered with respect to each other while others are unordered. A partial order state transition analysis allows more than one sequence of events in the state transition diagram.'' Thus, using a Petri net to model the partial-order relationships between steps, they can represent the set of all intrusion attempts with one diagram.

The advantage of HCPN over CPN is its ability to provide probabilities and to model transitions and observations separately. More specifically, HCPN differs from CPN in the following two ways.

First, in CPN, each place either contains exactly N tokens or it does not. For example, if the statement ''place A contains one token'' is true, the statement ''place A contains two tokens'' is false, and vice versa. In HCPN, however, each token element is associated with a probability (or confidence). For example, a place may have a probability of 0.4 of holding one token, a probability of 0.2 of holding two tokens, and a probability of 0.4 of containing no token. Thus, HCPN is a statistical model that represents the probability of the system being in different states, as opposed to a representational model that maps precisely to a particular state of affairs.
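This difference can be stated in a couple of lines, using the token-count distribution from the example above (the data layout is ours, not the paper's formal definition):

```python
# A CPN place holds a definite number of tokens; an HCPN place carries a
# probability distribution over token counts.
cpn_marking = {"A": 1}                           # place A holds exactly one token
hcpn_marking = {"A": {0: 0.4, 1: 0.4, 2: 0.2}}   # P(none)=0.4, P(one)=0.4, P(two)=0.2

# The distribution must be a proper probability distribution.
assert abs(sum(hcpn_marking["A"].values()) - 1.0) < 1e-9

def expected_tokens(dist):
    """Expected token count under an HCPN-style marking distribution."""
    return sum(n * p for n, p in dist.items())

assert abs(expected_tokens(hcpn_marking["A"]) - 0.8) < 1e-9
```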

Second, no concept of observation exists in CPN. In other words, observations and transitions are considered the same in CPN. In HCPN, however, observations are explicitly separated from transitions. Each transition can be observed only as a specific observation with some probability. The transitions (actions) are not directly observable and can be inferred only through the observations (alerts). This property makes HCPN especially suitable for the alert correlation task, since the true action might be observed as different alerts by the IDS, as we indicated in Section 2.2.

8. Discussions and conclusion

In this paper, we compared two different ways of looking at the alert correlation problem. We argued that approaches based on the filter view of the problem carry intrinsic limitations. The main limitation comes from ignoring the difference between alerts and actions during the correlation process. A better view is to consider it an inference problem, in which we seek to infer an intruder's actions from the alerts.

We described our novel framework, named HCPN, for predicting attacks and understanding alerts based on this view of the problem. We showed that HCPN has promise as a way of reducing false positives and false negatives in IDS.

8.1. Effectiveness of HCPN

The HCPN-based approach has several advantages that make it a promising approach to improving the quality of alerts in IDS. First, it uses knowledge of actions' preconditions and postconditions in the correlation process. This knowledge provides additional information for the system to more reliably determine whether an alert issued by the IDS sensors is true. False alerts are removed when they are associated with the special transition named ''normal action'' in the inference process. Second, it naturally uses information such as the false positive rate and the false negative rate in the correlation process (since this information is encoded in the observation probabilities). The false positive and false negative rates of a specific alert type indicate the reliability of alerts of that type, so this information helps to further reduce false alerts. Third, it presents the resources compromised (instead of actions) to show the progress of an attack. Since the number of resources compromised is much smaller than the number of raw alerts, HCPN can further reduce the total number of alerts shown to the administrators and active reactors. Fourth, the HCPN-based approach has the potential to reduce false negatives. This is because there is an arc from a resource back to an action (whose postconditions



include compromising that resource) in the HCPN model. If later alerts (whose preconditions include the compromise of that resource) are confirmed to be true with high probability, the probability that a missing action has happened is increased. Fifth, it provides confidence scores for the detection result by assigning probabilities to each mark, indicating how likely it is that an intruder has compromised a resource. This confidence score is very useful for the confidence fusion components to further improve the quality of the alerts. Sixth, HCPN works with and without training data. If training data are not available, HCPN behaves the same as the filter-based approaches that rely only on the precondition–postcondition relationship. If training data are available, the model parameters can be tuned and better correlation results can be achieved.

The HCPN model is a generic model not tied to a specific attack. It uses an attack action's precondition and postcondition information to remove false alerts and recover missing attacks. In other words, HCPN targets scenario attacks in which previous steps prepare for later ones (a non-scenario attack can be considered a one-step scenario attack). Note that HCPN does not need to see all possible attack scenarios in the training data. Instead, possible attack scenarios are automatically derived from the preconditions and postconditions. HCPN is especially effective at removing false alerts whose preconditions are not satisfied. As long as knowledge about the attacks is available, the gains we have seen in the experiments are likely to extend to other scenario attacks and other datasets.
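Deriving scenario structure from pre/postconditions, as described above, can be sketched as building a preparation graph: draw an edge from action A to action B whenever a postcondition of A satisfies a precondition of B. The rule base below is an illustrative subset using resource names from the appendix.

```python
# Each action maps to (preconditions, postconditions); illustrative subset.
PP_RULES = {
    "Sadmind_Ping":   ({"SystemExists"},            {"VulnerableSadmind"}),
    "Sadmind_BOF":    ({"VulnerableSadmind"},       {"SystemCompromised"}),
    "Rsh":            ({"SystemCompromised"},       {"DaemonInstalled"}),
    "Mstream_Zombie": ({"DaemonInstalled"},         {"ReadyToLaunchDDOSAttack"}),
}

# A prepares for B when some postcondition of A is a precondition of B.
edges = {
    (a, b)
    for a, (_, post_a) in PP_RULES.items()
    for b, (pre_b, _) in PP_RULES.items()
    if a != b and post_a & pre_b
}

# The multi-step scenario falls out of the rules; it was never enumerated.
assert ("Sadmind_Ping", "Sadmind_BOF") in edges
assert ("Sadmind_BOF", "Rsh") in edges
assert ("Rsh", "Sadmind_Ping") not in edges
```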

HCPN can be deployed at different locations and at different levels. For example, there can be one HCPN for the DMZ zone, one for the inside network, and one for each host. Alternatively, there can be a single HCPN that analyzes alerts from all sources. Recovering missing alerts is much harder than removing false alarms, as our experiments confirm. For this reason, it is a good idea to optimize the detection rate in the sensors and rely on HCPN to remove false alerts. Under this setting, we expect a 90% reduction in the false positive rate with HCPN.

8.2. Assumptions and limitations in HCPN

In HCPN, we have made several assumptions. The applicability of the model is limited when these assumptions do not hold.

First, we assume that malicious actions' preconditions and postconditions are known as domain knowledge. This assumption is not new to HCPN: almost all filter-based alert correlation approaches rely on it and include this knowledge as rules in a database (e.g. [13]). This knowledge is usually available when attacks are analyzed and alerts are designed into the IDS sensors. Alternatively, statistical approaches can be used to obtain it [37]. HCPN depends heavily on this knowledge to build the links between resources and actions. If this knowledge is unavailable or incorrect, the effectiveness of the model will be greatly affected.

Second, we assume that the initial probabilities of resources owned by the users and/or intruders can be estimated by the system. For example, the privilege of a user can be determined by the logon credential (e.g., anonymous user) he or she originally uses when entering the system. However, the information available to the system is incomplete, so probabilities determined this way may be incorrect. For example, an intruder may have gained access to some resources through other means (e.g., social engineering) before entering the system, and the IDS would not know this. To deal with this situation, we can assign a small floor probability to all agent/resource pairs to indicate that each resource may be accessible to an intruder through unknown channels. In our experiments, HCPN is not sensitive to the floor probability chosen as long as it is within a reasonable range, such as 0.01–0.05. Note that other precondition–postcondition-based alert correlation approaches also require information about the initial state.
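A floor-probability initialization along these lines might be sketched as follows; the function name and interface are ours, not the paper's, and the floor value is from the 0.01–0.05 range the experiments found insensitive.

```python
def initial_marking(known, resources, floor=0.01):
    """Initial per-resource compromise probabilities for one agent.

    Resources the system knows about keep their estimated probability;
    every other resource gets a small floor so that access gained through
    unknown channels (e.g., social engineering) is never ruled out.
    """
    return {r: max(known.get(r, 0.0), floor) for r in resources}

resources = ["SystemExists", "SystemCompromised", "DaemonInstalled"]
m = initial_marking({"SystemExists": 1.0}, resources, floor=0.01)
assert m["SystemExists"] == 1.0          # known fact preserved
assert m["SystemCompromised"] == 0.01    # unknowns floored, not zeroed
```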

Third, the inference process deduces the agent's actions using observation probabilities and transition probabilities. Although these probabilities are hard for an agent to influence, as discussed in Section 4, it is still possible that an extremely skillful intruder could mislead the system and cause incorrect results.

Fourth, we assume that malicious users do not cooperate with each other. Under this assumption, we can treat each agent (identified by a different source IP and/or user ID) separately. This assumption is valid for many intrusion cases, since most attacks are isolated and script-based (e.g., most worms). However, it is not valid for sophisticated intrusions in which a skilled intruder controls several agents and attacks the same system at the same time, as in the case of a bot



network [38]. To handle attacks launched by cooperating agents, we need to enhance the model to automatically correlate cooperating agents as ''one'' agent. In other words, the augmented model needs to test whether two agents are cooperating with each other and, if so, combine them. A possible approach to testing whether two agents are actually one is to compare the likelihood quotient q with a threshold:

q = p(one agent, raw alerts) / p(two agents, raw alerts).   (39)
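A test based on Eq. (39) might be sketched as follows, working in the log domain to avoid underflow; how the two likelihoods themselves would be computed is left open, as in the text.

```python
import math

def should_merge(logp_one_agent, logp_two_agents, log_threshold=0.0):
    """Decide whether to combine two suspected agents into one.

    Implements the comparison in Eq. (39): merge when
    q = p(one agent, raw alerts) / p(two agents, raw alerts)
    exceeds a threshold, i.e., when log q > log_threshold.
    """
    log_q = logp_one_agent - logp_two_agents
    return log_q > log_threshold

# Toy numbers: the one-agent explanation of the raw alerts is 10x as likely,
# so the two identities are merged.
assert should_merge(math.log(0.02), math.log(0.002))
```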

We consider this as one of the future work items.

Acknowledgements

The authors would like to thank the members of the Center for Secure and Dependable Software at the University of Idaho for valuable discussions. Thanks also go to the anonymous reviewers for their comments and suggestions, which improved this paper.

Appendix

Actions, alerts, and resources used in the experiments

Action  Alert
Admind  Alert_Admind
DNS_HInfo  Alert_DNS_HInfo
Email_Almail_Overflow  Alert_Email_Almail_Overflow
Email_Debug  Alert_Email_Debug
Email_Ehlo  Alert_Email_Ehlo
Email_Turn  Alert_Email_Turn
FTP_Pass  Alert_FTP_Pass
FTP_Put  Alert_FTP_Put
FTP_Syst  Alert_FTP_Syst
FTP_User  Alert_FTP_User
HTTP_ActiveX  Alert_HTTP_ActiveX
HTTP_Cisco_Catalyst_Exec  Alert_HTTP_Cisco_Catalyst_Exec
HTTP_Java  Alert_HTTP_Java
HTTP_Shells  Alert_HTTP_Shells
Mstream_Zombie  Alert_Mstream_Zombie
Port_Scan  Alert_Port_Scan
RIPAdd  Alert_RIPAdd
RIPExpire  Alert_RIPExpire
Rsh  Alert_Rsh
Sadmind_Amslverify_Overflow  Alert_Sadmind_Amslverify_Overflow
Sadmind_Ping  Alert_Sadmind_Ping
SSH_Detected  Alert_SSH_Detected
Stream_DoS  Alert_Stream_DoS
TCP_Urgent_Data  Alert_TCP_Urgent_Data
TelnetEnvAll  Alert_TelnetEnvAll
TelnetTerminaltype  Alert_TelnetTerminaltype
TelnetXdisplay  Alert_TelnetXdisplay
UDP_Port_Scan  Alert_UDP_Port_Scan
EventCollector_Info  Alert_EventCollector_Info
IllegalFileAccess  Alert_IllegalFileAccess
Loki  Alert_Loki
AgentInstall  Alert_AgentInstall
FTP_Globbing_Attack  Alert_FTP_Globbing_Attack
Normal_Action  (no alert)

Resources: SystemExists, CiscoCatalyst3500XL, ActiveXEnabledBrowser, SystemCompromised, GainInformation, JavaEnabledBrowser, MailLeakage, SMTPSupportEhlo, SMTPSupportTurn, VulnerableCGIBin, VulnerableAlMailPOP3Server, VulnerableSadmind, DNS_HInfo, DDOSHappened, ExistService, SendMailInDebugMode, ExistFTPService, ReadyToLaunchDDOSAttack, GainTerminalType, DaemonInstalled, SensitiveInfoAccessed, SensitiveInfoLeaked, AgentInstalled.



References

[1] K. Julisch, M. Dacier, Mining intrusion detection alarms foractionable knowledge, in: Proceedings of the 8th ACMInternational Conference on Knowledge Discovery and DataMining, July 2002, pp. 366–375.

[2] J. Allen, A. Christie, W. Fithen, J. McHugh, J. Pickel, E.Stoner, State of the practice of intrusion detection technol-ogies, Technical Report CMU/SEI-99-TR-028, 1999.

[3] S. Axelsson, The base-rate fallacy and its implications for thedifficulty of intrusion detection, in: 6th ACM Conference onComputer and Communications Security, November 1999,pp. 1–7.

[4] K. Julisch, Mining alarm clusters to improve alarm handlingefficiency, in: Proceedings of the 17th ACSAC, New Orleans,December 2001, pp. 12–21.

[5] R.C. de Boer, A Generic architecture for fusion-basedintrusion detection systems, Master Thesis, Erasmus Uni-versity Rotterdam, October 2002.

[6] F. Cuppens, Managing alerts in a multi-intrusion detectionenvironment, in: 17th Annual Computer Security Applica-tions Conference, New Orleans, USA, December 2001, pp.22–31.

[7] H. Debar, A. Wespi, Aggregation and correlation ofintrusion–detection alerts, in: Proceedings of the 4th Inter-national Symposium on Recent Advances in Intrusiondetection (RAID), 2001, pp. 87–105.

[8] R.P. Goldman, W. Heimerdinger, S. Harp, C.W. Geib, V.Thomas, R. Carter, Information modeling for intrusionreport aggregation, in: Proceedings of the DARPA Infor-mation Survivability Conference and Exposition II (DIS-CEX-II), June 2001, pp. 329–342.

[9] P.A. Porras, M.W. Fong, A. Valdes, A mission-impact-based approach to INFOSEC Alarm correlation, in: Pro-ceedings Recent Advances in Intrusion Detection, October2000, pp. 95–114.

[10] P.A. Porras, P.G. Neumann, EMERALD: event monitoringenabling responses to anomalous live disturbances, in: 1997National Information Systems Security Conference, October1997, pp. 353–365.

[11] A. Valdes, K. Skinner, Probabilistic alert correlation, in:Proceedings of the 4th International Symposium on RecentAdvances in Intrusion Detection (RAID), 2001, pp. 54–68.

[12] N. Ye, J. Giordano, J. Feldman, Q. Zhong, Informationfusion techniques for network intrusion detection, in: 1998IEEE Information Technology Conference, InformationEnvironment for the Future, 1998, pp. 117–120.

[13] F. Cuppens, F. Autrel, A. Miege, S. Benferhat, Correlationin an intrusion detection process, Internet Security Commu-nication Workshop (SECI’02), September 2002, pp. 153–172.

[14] F. Cuppens, A. Miege, Alert correlation in a cooperativeintrusion detection framework, in: 2002 IEEE Symposiumon Security and Privacy, May 2002, pp. 202–215.

[15] C. Geib, R. Goldman, Plan recognition in intrusion detec-tion systems, in: DARPA Information Survivability Confer-ence and Exposition (DISCEX), June 2001, pp. 46–55.

[16] M.-Y. Huang, T.M. Wicks, A Large-scale distributedintrusion detection framework based on attack strategyanalysis, in: Web Proceedings of the First InternationalWorkshop on Recent Advances in Intrusion Detection(RAID’98), 1998, pp. 2465–2475.

[17] P. Ning, Y. Cui, D.S. Reeves, Analyzing intensive intrusionalerts via correlation, in: Proceedings of the 5th InternationalSymposium on Recent Advances in Intrusion Detection(RAID 2002), LNCS, vol. 2516, October 2002, pp. 74–94.

[18] P. Ning, Y. Cui, D.S. Reeves, Constructing attack scenariosthrough correlation of intrusion alerts, in: Proceedings of the9th ACM Conference on Computer & CommunicationsSecurity, November 2002, pp. 245–254.

[19] P. Ning, D.S. Reeves, Y. Cui, Correlating alerts usingpreconditions of intrusions, Technical Report, TR-2001-13,North Carolina State University, Department of ComputerScience, December 2001.

[20] S.J. Templeton, K. Levitt, A requires/provides model forcomputer attacks, in: Proceedings of the 2000 workshop onNew security paradigms, 2001, pp. 31–38.

[21] J. Haines, D.K. Ryder, L. Tinnel, S. Taylor, Validation ofsensor alert correlators, IEEE Security and Privacy 1 (1)(2003) 46–56.

[22] K. Jensen, An introduction to the theoretical aspects ofcoloured petri nets, in: J.W. de Bakker, W.-P. de Roever, G.Rozenberg (Eds.), A Decade of Concurrency, LNCS, vol.803, Springer-Verlag, 1994, pp. 230–272.

[23] K. Jensen, Coloured Petri Nets and the invariantmethodTheoretical Computer Science, vol. 14, North-Hol-land, 1981, pp. 317–336.

[24] K. Ilgun, R. Kemmerer, P. Porras, State transition analysis:a rule-based intrusion detection system, IEEE Transactionson Software Engineering 21 (3) (1995) 181–199.

[25] D. Frincke, D. Tobin, Y. Ho, Planning, Petri Nets, andintrusion detection, in: Proceedings of the 21st NationalInformation Systems Security Conference (NISSC’98), 1998,pp. 346–361.

[26] Somesh Jha, Oleg Sheyner, Jeannette M. Wing, Minimiza-tion and reliability analyses of attack graphs, TechnicalReport, CMU-CS-02-109, February 2002.

[27] T. Moon, The expectation–maximization algorithm, IEEESignal Processing Magazine (1996) 47–60.

[28] I.J. Good, The population frequencies of species and theestimation of population parameters, Biometrika 40 (Parts 3and 4) (1953) 237–264.

[29] D. Frincke, Balancing cooperation and risk in intrusiondetection, ACM Transactions on Information and SystemSecurity (TISSEC) 3 (1) (2000) 1–29.

[30] G. Shafer, A Mathematical Theory of Evidence, PrincetonUniversity Press, Princeton, 1976.

[31] D. Yu, D. Frincke, Alert Confidence fusion in intrusiondetection systems with extended Dempster–Shafer theory,in: Proceedings of ACMSE 2005.

[32] Lincoln Lab, MIT. DARPA 2000 intrusion detection eval-uation datasets, 2000. Available from: <http://ideval.ll.mit.edu/2000index.html>.

[33] K. Jensen, Colored Petri-Nets—Basic Concepts, AnalysisMethods, and Practical Use, second ed., vol. 1, Springer-Verlag, New York, 1996.

[34] L.M. Kristensen, S. Christensen, K. Jensen, The practi-tioner’s guide to coloured Petri nets, International Journalon Software Tools for Technology Transfer 2 (1998) 98–132.

[35] S. Kumar, E.H. Spafford, A pattern-matching model forintrusion detection, in: Proceedings of the National Com-puter Security Conference, 1994, pp. 11–21.

Page 23: Improving the quality of alerts and predicting intruder’s next goal with Hidden Colored Petri-Net

654 D. Yu, D. Frincke / Computer Networks 51 (2007) 632–654

[36] S. Kumar, E. Spafford, A Pattern matching model for misuseintrusion detection, in: 17th National Computer SecurityConference, 1994, pp. 11–21.

[37] X. Qin, W. Lee, Statistical causality analysis of INFOSECalert data, in: Proceedings of the 6th International Sympo-sium on Recent Advances in Intrusion Detection (RAID),2003, pp. 73–93.

[38] D. Dittrich, Dissecting distributed malware networks. Available from: <http://security.isu.edu/ppt/pdfppt/Core02.pdf>.

[39] D. Yu, D. Frincke, A novel framework for alert correlation and understanding, in: International Conference on Applied Cryptography and Network Security (ACNS), Springer LNCS series, vol. 3089, 2004, pp. 452–466.

[40] Y. Zhai, P. Ning, P. Iyer, D.S. Reeves, Reasoning about complementary intrusion evidence, in: Proceedings of the 20th Annual Computer Security Applications Conference, December 2004, pp. 39–48.

[41] J.W. Harris, H. Stocker, Maximum likelihood method, §21.10.4, in: Handbook of Mathematics and Computational Science, Springer-Verlag, New York, 1998, p. 824.

[42] J. McHugh, Testing intrusion detection systems: a critique of the 1998 and 1999 DARPA off-line intrusion detection system evaluation as performed by Lincoln Laboratory, ACM Transactions on Information and System Security 3 (4) (2000).

Dong Yu holds a Ph.D. degree in computer science from the University of Idaho, an M.S. degree in computer science from Indiana University, Bloomington, an M.S. degree in electrical engineering from the Chinese Academy of Sciences, and a B.S. degree (with honor) in electrical engineering from Zhejiang University (China). He has been working at Microsoft since 1998. His research interests are in speech processing, computer and network security, and machine learning. He has published dozens of refereed journal and conference papers in the above areas. He is a senior member of IEEE and a member of ACM.

Deborah Frincke is Chief Scientist of the CyberSecurity groups at Pacific Northwest National Laboratory. Prior to joining PNNL, she was a member of the University of Idaho faculty, departing as a Full Professor. She co-founded one of the first NSA Centers of Academic Excellence in Information Assurance, through the University of Idaho's Center for Secure and Dependable Systems. Her mid-90s research served as the basis for TriGeo Network Systems' commercial product. Her research spans a broad cross section of computer security, including very large system defense, security risk/benefit analysis, alert consolidation, and computer security education. She has published over eighty articles and technical reports. She received her Ph.D. in computer science with a focus on computer security from the University of California, Davis in '92.