statistical application fingerprinting for ddos attack mitigation · 2019-12-26 · ahmed et al.:...

IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 14, NO. 6, JUNE 2019 1471

Statistical Application Fingerprinting forDDoS Attack Mitigation

Muhammad Ejaz Ahmed, Saeed Ullah, and Hyoungshick Kim

Abstract— Because of the dynamic nature of network trafficpatterns, such as new traffic application arrivals or flash events,it is becoming increasingly difficult for conventional anomalydetection systems to separate various applications based ontheir traffic patterns. In this paper, by leveraging transportlayer packet-level and flow-level features, new structures calledapplication fingerprints are generated, which express such featuresin a compact and efficient manner. Based on the generatedfingerprints, we propose a novel traffic classification framework.The proposed system generates profiles of normal applicationsusing a multi-modal probability distribution. The proposedclassification framework is then extended to detect distributeddenial of service attacks from the collected statistical informationat flow level. To demonstrate the feasibility of the proposedsystem, we evaluate its performance using five real-world trafficdatasets. The experiment results show that the proposed methodis capable of achieving an accuracy of over 97%, whereas themisclassification rate is only 2.5%.

Index Terms— Traffic classification, anomaly detection, fin-gerprinting traffic applications, nonparametric clustering, DDoSattack detection.

I. INTRODUCTION

THE unmatched potential of cyber-attacks, including espi-onage, military and strategic data stealing, distributed

denial of service (DDoS), or even control attacks on commandand control systems, is a major concern in the modern era.Making this worse is the dramatic increase in the Internetof Things (IoT) systems with their associated vulnerabilities,which could be exploited to launch massive DDoS attacksagainst any organization. Gartner reports that the number ofinternet-enabled IoT devices will increase to 26 billion by2020 [1]. For example, in 2016, a massive DDoS attack inthe range of 1.2 Tbps [2] launched by the Mirai botnet wascarried out using compromised Internet-enabled IoT devicesto target the Dyn server, which knocked major websites

Manuscript received May 1, 2018; revised September 18, 2018 andOctober 26, 2018; accepted October 27, 2018. Date of publicationNovember 5, 2018; date of current version February 13, 2019. This workwas supported in part by the ITRC under Grant IITP-2018-2015-0-00403 andin part by the MSIT under Grant 2016-0-00078. The associate editor coor-dinating the review of this manuscript and approving it for publication wasProf. David Starobinski. (Corresponding author: Hyoungshick Kim.)

M. E. Ahmed is with the Department of Software, Sungkyunkwan Univer-sity, Suwon 16419, South Korea, and also with the Data61, CommonwealthScientific and Industrial Research Organization, Sydney, NSW 2122, Australia(e-mail: [email protected]).

S. Ullah is with the Department of Computer Science and Engi-neering, Kyung Hee University, Yongin 17104, South Korea (e-mail:[email protected]).

H. Kim is with the Department of Software, Sungkyunkwan University,Suwon 16419, South Korea (e-mail: [email protected]).

Digital Object Identifier 10.1109/TIFS.2018.2879616

offline including Twitter, Spotify, Amazon, Reddit, Netflix,and The New York Times [3]. In particular, the proliferationof vulnerable IoT devices presents yet another significantchallenge that organizations must consider in order to defendagainst DDoS attacks. As a result, organizations must have toprotect themselves against such attacks that congest networktraffic.

Classifying network traffic applications that generate themcan yield suitable measures for countering network attacks,deploying QoS-aware enforcements successfully, and plan-ning for and preventing network congestion. Classifying suchapplications based on their behavior is a viable approachto combat network attacks, and at the same time predictinglegitimate/normal traffic application behavior. Moreover, traf-fic classification enables network administrators to be aware ofthe normal behavior of traffic applications and can effectivelydetect anomalies arising due to various network attacks [4].Traditional approaches for traffic classification fall into thefollowing categories: 1) Port based, this approach consistsof the analysis of the used communication ports. The well-known ports for the protocols are assigned by the IANA.Port based approach is useful only for the applications andservices, which use fixed port numbers hence easy to deceiveby changing port numbers. As a result, this approach is notvalid anymore due to the fact that many applications usedynamic ports (e.g., P2P) and try to disguise themselves byusing known port numbers. 2) Payload based techniques thatrely on deep packet inspection matches predefined signaturesof applications. This technique performs well if the payload isnot encrypted, but due to packet’s payload encryption and dataobfuscation (Skype traffic), these approaches are not effective.Therefore, in order to be aware of the behavior of legitimatetraffic applications and at the same time prevent networkattacks such as DDoS, we need to effectively classify trafficpatterns generated from applications (legitimate or attack).

In our preliminary work, we demonstrated that Software-Defined Network (SDN) can be used to mitigate DNSquery-based DDoS attack [20]. We leveraged three flow-levelfeatures, such as total number of packets transmitted, ratio ofsource and destination bytes, and connection duration time,to classify normal and DDoS attack traffic using Bayesiannonparametric clustering. However, in this study, we con-sider not only flow-level features but also packet-level trafficfeatures to derive a multi-modal probability distribution forgenerating profiles for normal applications. We also show thatthose profiles can be used to detect unknown network trafficincluding attack traffic based on the unsupervised clusteringapproach.

1556-6013 © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

https://orcid.org/0000-0002-1605-3866

1472 IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 14, NO. 6, JUNE 2019

The main motivation behind our work is that the statisticalproperties of basic elements of a network flow, such aspacket and flow features, can be effectively used to deter-mine which applications are generating legitimate traffic andthose that are generating attack traffic. We propose networktraffic classification approach which aim to classify traf-fic patterns based exclusively on their statistical properties.We define the notion of application fingerprints, which takeinto account packet-level1 features and generate fingerprintsin a compressed but efficient manner. The “known-to-the-system” fingerprinted applications’ statistical characteristicsare used to generate normal (legitimate) profiles. The “known”application fingerprints are obtained using limited trainingdata, deciding whether a flow belongs to a given applicationlayer protocol or was generated by an “unknown” (i.e., non-fingerprinted) application. The application fingerprints enableus to measure “how far” an unknown traffic flow is from thebasic characteristics of each protocol. We utilize flow-levelfeatures2 as well as packet-level features to identify DDoSattacks, and normal network traffic. Here, the “unknown”applications’ fingerprints, obtained using the proposed trafficclustering approach, are further analyzed to detect whetherthose clusters are generated from network attacks such asDDoS. The proposed approach is resilient against adversariesemulating legitimate users by adjusting traffic features inorder to evade anomaly detection systems. This is owing tothe inherent nature of the proposed fingerprinting approach,which takes into account a large number of traffic instancesfrom traffic applications and generates respective applicationfingerprints, where in practice an adversary can craft fewertraffic instances, and not sufficiently many to generate its ownfingerprint to evade the system.

The contributions of this work are summarized as follows:1) We propose a simple application fingerprinting

approach, based on the analysis of simple properties ofnetwork traffic such as packet-level features of trafficflows. The fingerprints are obtained in a highly compactand efficient manner. The algorithm for fingerprinting iscomputationally simple: it involves a histogram-basedweighted-mean approach in a window of the observedtraffic features. The obtained fingerprints are then usedto generate profiles of normal applications based onlimited training data.

2) We propose Dirichlet processing mixture model3

DPMM-based clustering approach to automatically clus-ter traffic based on applications’ flow-level features.DPMM fit well here, because the number of trafficapplications is unknown in advance and may increaseas more data are observed. DPMM is unsupervised innature, considering only the observations from traffic,and it is nonparametric, because the number of trafficapplications is not known in advance. Unlike previouswork in [24] and [25], the proposed DPMM generates

1Packet-level features refer to per packet details in headers, such as packetlength, inter-arrival time, TTL, and IP protocol.

2By flow-level features, we mean flow statistics such as flow duration, totalpackets/bytes, and source/destination bytes.

3A Bayesian nonparametric model for clustering.

a multi-modal probability distribution4 to model thenormal traffic profiles of applications. Because trafficapplications exhibit unique behaviors [28], it is impor-tant to model them using a multi-modal probabilitydistribution rather than a uni-modal one. For example,traffic behaviors under normal and flash events aresignificantly different [26], and so modeling such trafficusing uni-modal statistics is not appropriate.

3) We further propose an approach based on the extendedfeatures (packet-level and flow-level) set to detect DDoSattacks (e.g., Slowloris and packet flooding). By flow-level features, we mean the traffic flow’s statistics, suchas flow duration, total packets/bytes in a flow, and uni-or bi-directional flow characteristics. A DDoS attackshares several packet-level characteristics with normaltraffic. Therefore, it is challenging to distinguish attacktraffic from normal traffic, and therefore we utilizeboth flow-level and packet-level features to distinguishnetwork attacks from normal traffic. The proposed detec-tion algorithm decides the legitimacy of an “unknown”application fingerprint based on the traffic flow behavior.

We evaluate the proposed framework through extensiveexperiments on real network traces from ISCX’s intrusiondetection dataset [34], CAIDA2007 [35], CAIDA2016 [36],and KAIST [37]–[39].

The remainder of this paper is organized as follows.In Section II, we discuss related work concerning attackdetection. In Section III, we provide a description of thesystem model. The definitions of protocol fingerprints andnormal traffic profile generation are explained in Section IV.Section V explains the attack traffic detection approach. Exper-imental results are discussed in Section VI, while Section VIIconcludes the paper.

II. RELATED WORK

In this section, we discuss recent literature on anomaly andDDoS attack detections. We briefly explain their methodolo-gies in order to detect/mitigate network attacks.

A. Anomaly Detection

Recent literature on anomaly detection [4], [5], [7], [8]focuses on network traffic features for attack detection.Crotti et al. [4] exploited the packet length and packet inter-arrival time for traffic classification using simple statisticalfingerprinting. However, although using only packet-level fea-tures is effective in traffic classification, this may fail to detectnetwork attacks, because some attacks such as DDoS exhibitsimilar statistical properties in packet-level features to nor-mal traffic [10], [15]–[18]. Therefore, to further improve theclassification accuracy, and most importantly to detect attacktraffic, relying only on packet-level features may not suffice.Bhuyan et al. [5] proposed an anomaly detection methodwhich is based on a multi-step outliers. To select the rel-evant and non-redundant subsets of features for the detec-tion of outliers/anomalies, they leveraged mutual information

4There is more than one “peak” in the distribution of the data.

AHMED et al.: STATISTICAL APPLICATION FINGERPRINTING FOR DDoS ATTACK MITIGATION 1473

and generalized entropy based on the selection of features.Hammamoto et al. [7] proposed a network anomaly detectionsystem using a genetic algorithm and fuzzy logic scheme.They incorporated six traffic flow features, namely bits per sec-ond, packets per second, source and destination IP entropies,and source and destination port entropies. Their proposedmechanism is based on an unsupervised learning method.Their experimental results showed that their proposed systemachieves a 96.53% accuracy and 0.56% false positive alarms.Fernandes, Jr., et al. [8] proposed a profile-based anomalydetection system. For pattern recognition and anomaly detec-tion, they employed principal component analysis (PCA), antcolony optimization, and dynamic time warping techniques.Their approach generates a traffic profile for normal networkbehavior, which is compared with the real network traffic torecognize anomalous events. However, we note that under asingle normal profile for traffic, the classification performancemay be weakened by the dynamic nature of network condi-tions. For instance, under flash events the profile of normaltraffic changes drastically, thereby traditional anomaly detec-tion systems may fail to perform well under such conditions.

B. DDoS Attack Detection

Typically, attack tools are prebuilt programs, which areusually the same for one botnet [27]. A bot-master initiates anattack command on the compromised hosts (bots) in its botnetto launch an attack session. Evidence for this claim is givenin the literature [15], [17], [18]. Consequently, DDoS attackflows received at the targeted web server are an aggregationof many original attack flows, and the aggregated attack flowsshare a similar standard deviation with the original attackflow.

Detecting DDoS attacks relies on learning normal traf-fic behavior, so as to classify abnormal behavior of trafficbased on the model learned from normal traffic [9]–[13].Matta et al. [9] proposed a model for detecting DDoS attacks.The introduced a botnet that emulates normal traffic by con-tinuously learning normal traffic patterns from network traffic.Moreover, they presented an inference algorithm to provide aconsistent estimate of possible botnets hidden in the network.Tan et al. [11] proposed an approach to learning normaltraffic behavior, and identifying “known” and “unknown” DoSattacks based on this normal traffic behavior. Their systemextracts the geometric correlations between network trafficfeatures using a multivariate correlation analysis for accuratenetwork traffic characterization. Furthermore, they proposeda triangle-area-based Martinique technique to enhance andspeed-up the multivariate correlation analysis. Their approachachieved up to a 99.95% accuracy for the KDD Cup 99 dataset.Ben-Porat et al. [12] presented an approach to detecting DDoSattacks via identifying malicious users. They employed a vul-nerability metric to evaluate the hash table data structure. Theyfurther demonstrated that a closed hash is more vulnerable toDDoS attacks than an open hash. One of their key findings isthat regular users suffer from a performance degradation evenafter an attack has ended. François et al. [13] presented anearly prediction and prevention method for flooding DDoS

attacks. Their proposed mechanism forms a virtual protec-tion ring around hosts to collaborate via exchanging trafficstatistics.

Fries [14] employed fuzzy clustering of TCP packets todetect network intrusions with the reduced set of features.The main motivation behind using fuzzy-based systems isthe fact that classification-based systems are often unable toadapt to the dynamic conditions of a network, and moreoverthey are useful in identifying previously unknown attacks.Anderson and McGrew [6] experimentally identified that cer-tain common machine learning algorithms under-perform onencrypted malware traffic. One of their key findings is thatfeature engineering is a decisive factor, and incrementallyadding features suggested by domain experts to the initiallyselected feature set has a positive impact on the performanceof a classifier.

III. SYSTEM MODEL

We briefly present our system for monitoring and detectingnormal traffic behavior. In our approach, we progressivelyutilize traffic features ranging from packet-level to flow-levelin order to detect abnormalities in network traffic.

In stealthy DDoS attacks such as Slowloris, packet-levelfeature-based approaches may allow attack traffic to bypass ananomaly detection system because they are crafted in a waythat resembles normal traffic. In order to clearly distinguishthe traffic patterns generated by normal and attack traffic,we incorporate flow-level statistics in addition to packet-levelfeatures in our method.

A. Packet-Level Features

The rationale behind packet-level feature inspection is tobuild profiles of normal traffic applications, and this methodis quite effective in classifying traffic applications [4], [20],[21], [34]. All traffic features are listed in Table I. We considerthe following most important packet-level traffic features forapplication fingerprinting:

• Packet length: Packet lengths for different traffic payloadsare likely to be different. For example, the UDP packetsize is larger, the HTTP packet size may vary dependingon the dynamic behavior of game applications, and VoIPpackets have smaller packet lengths in order to minimizejitter. A set of packets with their corresponding lengthsis represented as LP = {pl1, pl2 , . . . , plN }, where pln isthe length of the nth packet.

• Packet inter-arrival time: Packet inter-arrival times fordifferent applications also vary depending upon therequirements of applications. For example, in VoIP theinter-arrival time is small, in order to avoid undesirableeffects caused by jitters. We define the vector of packetinter-arrival times as IP = {pi1, pi2 , . . . , piN }, wherepin is the nth packet inter-arrival time for the trafficapplication.

• Time-to-live: We consider time-to-live (TTL) as a feature,because each traffic application adjusts this value inde-pendently. After analyzing various real network traces,we observe that each application sets the TTL value for


TABLE I

REPRESENTATIVE NETWORK FEATURES

their packets independently. The TTL value for packetsin a flow usually ranges between 40 and 255. We denotethis feature as T tlP = {ttl1, ttl2, . . . , ttlN }, where t tln isthe nth packet’s TTL value.

Consequently, the dataset of packet-level features can berepresented as

Xp = [LP, IP, T tlP],where Xp is matrix containing all features.

Flow-level features are also incorporated in our study due tothe fact that certain attacks’ flow characteristics show differentbehavior as compared to the normal traffic application, e.g.,flow connection duration, proportion of traffic flow fromsource to destination and vice versa, etc. We analyzed thetotal 18 packet- and flow-level traffic features listed in Table I.However, we selected only those features exhibiting distin-guishing behavior among various traffic applications. Theselected flow-level features are discussed in Section V.

B. Proposed Framework

Fig. 1 illustrates the proposed framework for generating pro-files of normal traffic applications and detecting attack trafficbased on application fingerprinting. After collecting the data,the proposed framework extracts packet-level features. Packet-level features are used to generate fingerprints for trafficapplications. Here, we utilize a histogram-based fingerprintingapproach, which is discussed in the following section. Then,the fingerprints (transformed dataset) are clustered using anunsupervised approach, and the output clusters are mappedto their respective application traffic with assistance from thelabeled training data. The clusters mapped to applications arecalled “known” application fingerprints, and those not mappedare termed “unknown” application fingerprints. The known andunknown application fingerprint sets are denoted by K f p andU f p, respectively. The known application clusters are used togenerate normal traffic profiles. The generated normal trafficprofiles form a multi-modal probability distribution, whereeach mode in the density function corresponds to a uniquetraffic application.

Fig. 1. Framework of the proposed classification scheme for normal profileestimation and attack detection.

As shown in Fig. 1, our approach take into account packet-and flow-level features separately in classifying normal andattack traffic. It is due the following reasons: first, our aimis to quickly generate normal profiles by leveraging theavailable training data. For that, we utilize only packet-levelfeatures and a lightweight clustering algorithm (Mean-shift)to generate normal profiles. Second, by merging packet- andflow-level features and applying classification algorithm onthe merged feature may result in clusters which are notspecific to characteristics of applications, i.e., traffic fromdifferent applications may result in a same cluster. Finally,by analyzing unknown clusters flow-level features separatelyfocuses on detecting attack traffic using Bayesian nonpara-metric approach (DPMM). Moreover, DPMM may result innew clusters that correspond to various types of DDoS attackssuch as Slowloris and packet flooding. In short, packet-levelfeatures focuses more on quick generation of normal profilesand flow-level features plays an important role in detectingattacks traffic. As shown in Fig. 1, flow-level features areextracted from the unknown application fingerprinted clusters,and the DPMM-based classification is applied to detect attackand normal traffic.

IV. NORMAL TRAFFIC PROFILES GENERATION

USING PACKET-LEVEL FEATURES

In this section, we present our method for generating normaltraffic profiles using packet-level features. In practice, the nor-mal behavior of network traffic follow a multi-modal proba-bility distribution [28]. Gu et al. [24] used the KL divergencebetween the current and reference traffic to detect anomalies.Scherrer et al. [25] proposed using a unimodal probabilitydistribution, i.e., the gamma distribution, to separate the attacktraffic from legitimate traffic. In this study, we rely on a multi-modal probability distribution to model profiles of normaltraffic, owing to the fact that real traffic varies significantlyaccording to applications’ requirements. Therefore, instead ofrelying on a unimodal probability distribution, we incorporatea multi-modal distribution in which each mode in the estimateddistribution corresponds to a unique traffic application.


Fig. 2. Packet-level features before application fingerprinting. It is obviousthat flooding and Slowloris are located at the same location in the three-dimensional feature space, which makes it difficult for a classifier to classifysuch attack.

A. Application Fingerprinting

In this section, we focus on traffic classification basedon packet-level statistics produced by network applicationsexchanging data through TCP connections, such as HTTP,SMTP, DNS, POP, and SSH. Fingerprinting plays an importantrole and provides summarized information about applications’behaviors in terms of variations in packet-level characteris-tics. For instance, Fig. 2 illustrates normal and attack trafficbehavior in terms of three packet-level features. We clearlysee an overlap of among various types of normal traffic in themiddle-lower part of Fig. 2. Moreover, the two types of DDoSattacks (Slowloris and packet flooding), whose patterns aresignificantly different from each other, overlap in the middlepart of Fig. 2.

The generation of a given application’s fingerprint startsfrom the evaluation of packet-level features. We leveragehistogram5 method on a window of incoming packets at therouter’s interface to compute weighted-mean (Wml ) for aset of packets in a window of size W , where the windowsize W is set according to the proportion of an applica-tion’s traffic. The weighted mean is similar to an ordinaryarithmetic mean6, except that instead of each of the datapoints contributing equally to the final average, some datapoints contribute more than others. The rationale behindusing the weighted-mean approach is that packet lengths andtheir inter-arrival times vary significantly for various applica-tions and are not uniformly distributed across the range ofpacket lengths and packet inter-arrival times. For instance,in HTTP traffic packet lengths are highly concentrated, eitherbetween 50 and 500 bytes or above 1400 (mostly 1500)bytes. Thus, giving equal weights to packet lengths between50 and 1500 bytes is not suitable. Moreover, for VoIP trafficthe packet length is usually 210 bytes, in order to avoid jittersin voice communication [28].

1) Weighted-Mean Computation of Packet Inter-ArrivalTime and Packet Length: First, histogram for a set of incoming

5A histogram is an accurate graphical representation of the distribution ofnumerical data. It is an estimate of the probability distribution of a continuousvariable.

6The most common type of arithmetic average.

packets in a window of size W is computed. The output of ahistogram function is a vector of the number of occurrencesof random variables �o and a corresponding vector of thecenters �μ of the histogram bins, as given in (1). Then,the lth probability density function PDFl of a histogram iscomputed using (3), where L = N/W and N is the numberof datapoints/observations. Consequently, we obtain total Lhistogram-based probability density functions PDFl over Ndatapoints, as given in (3). We use the number of histogrambins and their corresponding weights to compute L PDFs, andcorresponding weighted-means {Wml}L

l=1 using the followingequations:

[�ob f , �μb f ] = Histogram({xi}Wi=1, F), (1)

Wm =F∑

f =1

�μb f Pr(�μb f ), where (2)

Pr(�μb f ) = �ob f∑Ff =1 �ob f

, (3)

where F is the total number of bins. Furthermore, �μb f isa vector containing the centers of F bins, and �ob f is thecorresponding vector containing the number of occurrencesof random variables in each bin. The window sizes used forvarious traffic applications in our experiments are providedin Table V, where the number of bins F is set to 50 for allapplications.

Algorithm 1 getWeightedMeansVector: Histogram-BasedPDF and Weighted-Mean Generation

Input: {xn}Nn=1, W , and F .

where xn is a packet-level feature, W is the window size,and F is number of bins.Result: Weighted-mean vector {Wml }L

l=11: Divide feature data in to L disjoint subsets, where L = N

W2: The lth subset, Sl = {x(n=l)}3: for each l →L do4: Compute �ol,b f and �μl,b f using Eq. (1)5: Compute �wl using Eq. (3)6: Compute Wml using Eq. (2)7: l = l + 18: Sl = Next window of size W from {xn}N

n=1

The steps for computing weighted-means are enumeratedin Algorithm 1. The input of the algorithm consists of afeature’s dataset {xn}N

n=1, i.e., a packet-level feature’s data(packet length L P , packet inter-arrival time IP , or time-to-live T tlP ) and W is the window size. The input feature’sdata are divided into L windows, each containing W packets(steps 1 and 2 of Algorithm 1). For each l ∈ L, the histogrammethod is applied to compute the vectors of occurrences �ol,b f

and bin centers �μl,b f using Eq. 1 (step 4). Based on �ob f ,the probabilistic weights (PDF) of bins are computed usingEq. 3. Following this, the weighted-mean for the lth PDF iscalculated using (2) (steps 5 and 6). This procedure repeatsfor the l + 1th subset of feature’s data. Algorithm 1 is used tocompute the fingerprints for packet lengths and packet inter-arrival times.


2) Weighted-Mean Computation of TTL: For the packets’TTL features, instead of using the weighted-mean approachfor fingerprinting, we employ a window-based entropy methodto capture the amount of randomness in TTL values forincoming packets. The rationale behind using entropy lies inthe fact that the amount of randomness observed in legitimateapplications is diverse [27], whereas in the case of attack trafficthe randomness (entropy) in TTL is usually small, or almostzero in some cases. The reason for this is the fact that DDoSprograms such as Mirai are usually pre-built programs, andalmost the same attack pattern is followed by each infectedhost. As a result, the traffic statistics generated by all theinfected hosts (bots) are deterministic. By exploiting this fact,we aim to incorporate entropy to quantify randomness inTTL for packets. Algorithm 2 presents the steps involved incomputing the window-based entropy for T tlP . Algorithm 2is used to compute the fingerprints for packets’ TTLs.

Algorithm 2 getWindowBasedEntropyVector: Window-BasedEntropy Calculation for the Packet-Level Feature TTL

Input: {xn}Nn=1 and W

Result: Window-based entropy vector {El}Ll=1.

1: Divide feature data in to L disjoint subsets, where L = NW

2: The lth subset, Sl = {xn}Wn=1

3: for all l →L do4: El = Entropy(Sl)5: l = l+16: Sl = Next window of size W from {xn}N

n=1

3) Applications’ Fingerprints Generation: The applicationfingerprinting phase of Algorithm 3 enumerate the steps toconvert raw dataset to its fingerprinted version, termed asthe transformed dataset denoted by W. In the first twosteps, weighted-mean vectors of packet inter-arrival times andpacket lengths are computed, denoted by �p and �t , respectively(Step 1 and 2). In third step, entropy for each window iscomputed, denoted by �e (Step 3). The transformed dataset W

is obtained by merging the transformed features from the firstthree step of algorithm into a single matrix form. Note thatthe dimension of features ( �p, �t, and �e) are identical, therefore,they can be merged to form a fingerprinted (W) features.

Fig. 3 presents applications’ packet-level fingerprints fol-lowing Algorithm 3 (Step 1 to 4). For comparison, we alsoshow the non-fingerprinted version of the data for the samefeatures in Fig. 2. We suspect that classification algorithmswould suffer from performance issues while classifyingpacket-level features shown in Fig. 2, whereas the sameclassifier would perform optimally on the finger-printed fea-tures shown in Fig. 3. Moreover, we can clearly see thatrespective types of normal traffic are finely separated in thethree-dimensional feature space. Similarly, different attacktypes are also finely separated in the feature space.

B. Mean-Shift Clustering

Given W = { �p, �t, �e}, we employ the mean-shift (MS) clus-tering approach to cluster the transformed dataset W. MS isa non-parametric feature-space analysis technique for locating

Fig. 3. Packet-level features after application fingerprinting. It is evident fromthe figure that normal traffic types (HTTP, FTP, and DNS) are clearly separatedin the three-dimensional feature space. The DDoS traffic types (unknownfingerprints) are also clearly separated, but we require further analysis toavoid false alarms from classifiers, because we still see some normal trafficin DDoS traffic (red squares).

the maxima, i.e., the modes, of a density function given adataset, and can also be called a mode-seeking algorithm [30].This is an iterative approach, and starts with an initial estimatex , where a kernel function K (xi − x) is given. This functiondetermines the weights of nearby points for the re-estimationof the mean. Typically, a Gaussian kernel on the distance to thecurrent estimate is adopted, i.e., K (xi − x) = exp−c||xi −x ||2 .The MS clustering aims to partition the traffic flows into kclusters, i.e., C = {C1, C2 . . . , Ck}, where each cluster has itsown respective cluster parameters.

The transformed dataset W is input to the unsupervised MSclustering algorithm, which assigns each datapoint a clusterassignment variable {cn}K

n=1, where K < N . The clusterassignment variable cn indicates the cluster to which thatdatapoint belongs to. The datapoints from the same applicationwill have same cluster assignment variable (cn). MS clusteringwill result into K clusters with their respective cluster parame-ters. Note that the numbers of clusters obtained in each case,i.e., for normal and attack traffic, are expected to be greaterthan one. The reason for this is that for legitimate applications,the distribution is assumed to be multi-modal. Similarly, forattack traffic, if there is more than one attack then one clusteris created for each attack type.

C. Normal Profile Generation

Profiles for normal applications are generated based ontraining data. Algorithm 3 enumerates the step for generatingnormal profiles for applications. First, application fingerprintsare obtained (Step 1 - 4). The obtained fingerprinted fea-tures W are input to the MS clustering algorithm (Step 5).We assume that limited training data are available for normaltraffic applications such as HTTP, FTP, and DNS. Using thetraining data, the K clusters obtained from MS clustering arechecked in-turn for their similarity with the training data.Based on the similarity scores, the clusters are mapped toapplications. Next, clusters are categorized into two groups:known fingerprinted cluster set (K f p) and unknown finger-printed clusters (U f p). The clusters in K f p are used to generatethe profiles of normal applications (Step 6 - 9), whereas those


Algorithm 3 Application Fingerprint-Based Normal ProfileGeneration

Input: Xp = {xn}Nn=1

Result: K f p, U f p

Application fingerprinting:1: �p=getWeightedMeansVector(L P) using Algo. 12: �t=getWeightedMeansVector(IP) using Algo. 13: �e=getWindowBasedEntropyVector(TtlP ) using Algo. 24: W = { �p, �t, �e}

Clustering:/*Apply MS clustering */

5: {c1, . . . , cN } = MS-Clustering(W)/* K clusters (both legitimate and attack) are obtainedwith respective cluster parameters. */

Normal profiles generation:6: for all k → K do7: if the training data represent the kth cluster Ck then add

Ck cluster to K f p .8: else add Ck to U f p.9: Generate normal profiles for legitimate traffic using K f p.

10: return U f p for further processing.

in U f p are further analyzed to confirm the legitimacy of theapplications’ fingerprints.

Note that our dataset consists of both legitimate (variousbenign applications) and DDoS attack types (Slowloris andflooding) traffic. The clusters for which mapping is foundcomes in K f p set of clusters. Whereas, those clusters whichare not known may contain legitimate and/or attack traffic, andhence comes under the set of unknown clusters U f p, whichneeds further analysis. The clusters in U f p may consist oflegitimate and/or attack traffic. After analyzing the clustersin U f p, the decision about their legitimacy are made.

V. FLOW-LEVEL FEATURE-BASED ATTACK DETECTION

In this section, the unknown fingerprints U f p are fur-ther analyzed using flow-level features in addition to thepacket-level fingerprints, to decide whether the data ofthe fingerprinted clusters have been generated by a legiti-mate or attack application. The reason for this further analysisthe fact that certain network attacks, such as DDoS, exhibitsimilar characteristics in packet-level features to normal traf-fic [19], [26]. For instance, the behavior of flash eventsis quite similar to DDoS behavior in terms of packet-levelfeatures. However, because DDoS attacks are controlled bya botmaster, the correlation between DDoS flows is veryhigh [26], [27]. Thus, by exploiting this fact we propose aflow-level features-based approach to identifying attacking andnormal applications.

A. Flow-Level Feature Extraction FromUnknown Fingerprints

In this subsection, we investigate the flow-level features tofurther confirm the legitimacy of the unknown fingerprinted

clusters set U f p. Here, the flow features from U f p areextracted. The flow-level features provide a holistic view oftraffic flow behavior.

We consider the following extended packet- and flow-levelfeatures:

1) Flow-Duration (Fd): The flow duration plays an impor-tant role in detecting normal and attack traffic [20].Therefore, we consider this for our proposed method.It is represented here as Fd = { f d1, . . . , f dN }, whereN is the number of flows.

2) Ratio of Source and Destination Bytes (R f ): The ratioof packets between the source and destination hosts canbe used to detect the normal activity pattern betweenthe host and the web server [20]. We denote this asR f = { f r1, . . . , f rN }.

3) Packet-Level Fingerprinted Features: Here, we reuse thepacket-level fingerprinted features ( �p) obtained usingAlgorithm 1. The inter-arrival time distribution playsan important role in detecting misbehaving applications.For example, Slowloris is an example of a DDoS attackthat keeps the connection busy for a long time and makesthe server perform computationally intensive tasks inorder to bring the web server down. This feature isdenoted as �p = { f p1, . . . , f pN }.

4) Packet-Level Fingerprinted Features: Similarly, we reuseanother fingerprinted feature (�t) to quantify the behaviorof traffic flows in terms of randomness in the TTL ofpackets. This is represented as �t = { f e1, . . . , f eN }.

The dataset for the extended traffic features can be repre-sented as

Y = {y1, · · · , yN }, (4)

where yn = { f dn, f rn, f pn, f en}. (5)

Based on the features set Y, we intend to cluster traffic flowsto detect attack and legitimate applications. Let us take a real-world example to explain the motivation behind incorporatingflow-level features. Consider two flows (one legitimate andthe other an attack) fa and fb, initiated by two different hoststargeting the same destination host at TCP port 80. If packet-level features are analyzed, such as the packet inter-arrivalIP time and packet lengths L P for the two flows, they mayexhibit similar behavior, and the anomaly detector may let bothflows pass. However, if the flow duration Fd and the ratio ofsource to destination bytes R f are analyzed, the normal flowswould exhibit a significant variation compared to the attackflow. For instance, in the case of a DDoS attack, the variationin TTL values is negligible, because they are controlled bya single built-in program, whereas the normal traffic showssignificant variations in TTL values, because it is controlledby different applications, and each application sets these valuesaccording to their requirements. Similarly, packet inter-arrivaltimes are expected to be higher and less random in the caseof a Slowloris attack.

B. DPMM-Based Clustering

Here, the extended feature set Y is input into the proposedDPMM-based clustering method to identify legitimate/attack


Fig. 4. Graphical model of the DPMM. Nodes represent random variables,edges are dependencies, and plates are replications. The shaded node repre-sents the observations.

applications. Rather than relying on heuristics (e.g., as shownin Fig. 6, TTL values could be used to detect attack traffic)for attack application detection, we consider DPMM-basedclustering for detecting different types of normal and attacktraffic applications. The DPMM models are flexible in detect-ing unique patterns in a dataset using clustering. Our goal hereis to detect attacks of different types. Of course, in this studywe only restricted ourselves to Slowloris and flooding attacks,but in future other types of attack may show-up forming a newcluster. Under such situations, approaches relying on heuristicsmay fail. DPMM which comes under the umbrella of Bayesiannonparametric statistics are flexible in allowing the formationof new clusters as more data are observed. Thus, unknownattack types of attacks or normal traffic applications could beeffectively detected using DPMM.

In practice, we do not know how many traffic applica-tions (legitimate and attack) are active in a network, andtherefore we would like to determine this from the observeddata. Dirichlet processes can be employed to obtain a mix-ture model (DPMM) with infinite components (applications),which can be viewed as taking the limit of the finite mixturemodel for K → ∞. In the following, we briefly discussthe DPMM framework which is unsupervised and nonpara-metric model for clustering. However, for detailed overviewabout Dirichlet processes and respective models, please referto [22], [23], [31], [32], [40], and [41].

A graphical model of the DPMM is presented in Fig. 4.Here, α is the scalar hyperparameter of the DPMM, whichaffects the number of clusters obtained, and Zn is the clusterassignment variable such that the feature point yn belongs tothe kth cluster. The larger the value of α, the more clusters,and the smaller the value of α the fewer clusters. Note thatthe value of α indicates the strength of belief in Go, whereGo is the base distribution. A large value means that mostof the samples will be distinct, and have values concentratedaround Go. In addition, θk are the cluster parameters, wherek ∈ {1, 2, . . . }. The mixture distribution can be defined asp(yn) = ∑∞

k=1 πk p(·|δθ∗k), where the mixing weights are

denoted by πk and the mixing components (clusters) arerepresented by p(·|δθ∗

k). Here, δθ∗

kis employed as compact

notation for δ(θ = θ∗k ), which is a delta function that takes a

value of 1 if θ = θ∗k and 0 otherwise.

We represent the generative model for the DPMM using thestick-breaking process [40]. Consider two infinite collections

of independent random variables, Vki.i.d .∼ Beta(1, α) and

θ∗k

i.i.d .∼ Go, k = {1, 2, . . . }. The stick-breaking process ofG is given by

πk = Vk

k−1∏

j=1

(1 − Vj ), (6)

G =∞∑

k=1

πkδθ∗k, (7)

where V = {V1, V2, . . .} with V0 = 0. The mixing weights{πk} are given by breaking a stick of unit length into infinitelysmall segments. In the DPMM, the vector π represents theinfinite vector of mixing weights, and {θ∗

1 , θ∗2 , . . . } are the

atoms, which correspond to mixing components. Because Zn

is the cluster assignment random variable for the feature pointyl , the data for the DPMM are generated as follows:

1) Draw πk |α i.i.d .∼ Beta(1, α), k ∈ {1, 2, . . . }.2) Draw θ∗

ki.i.d .∼ G0, k ∈ {1, 2, . . . }.

3) For the feature point yn , do:(a) Draw Zn|{v1, v2, . . . } ∼ Multinomial(π).(b) Draw yn|zn ∼ p(yn|θ∗

zn).

Here, we restrict ourselves to a DPMM for which the observ-able data are drawn from a normal distribution, and where thebase distribution for the DP is the corresponding conjugatedistribution.

Because Dirichlet processes are nonparametric, we cannotuse the EM algorithm to infer the random variables {Zn}(which store the cluster assignments) for the proposed DPMMmodel, owing to the fact that EM is generally used forinferences in a mixture model, but here G is nonparametric,making the application of EM difficult. Hence, in order to esti-mate these assignment variables in the paradigm of Bayesiannonparametrics, a sampling-based approach based on Markovchain Monte Carlo (MCMC) is leveraged to cluster Y.

We propose using the collapsed Gibbs samplingapproach [31], which requires selecting a base distributionGo that is a conjugate prior of the generative distributionp(yn|θ∗

zn) in order to analytically solve and sample directly

from p(Zn|Z−n, Y). The posterior distribution under ourmodel is

P(Z ,�,π , α) ∝ P(Y|Z ,�)P(�|G0)

×N∏

n=1

P(zn |π)P(π |α)P(α) (8)

By integrating-out certain parameters, the posterior distribu-tion is given by [31] as

P(zn =k|Z−n, Y,�,π , α) ∝ P(yn |zn,�)P(zn |Z−n, α) (9)

For the first term in the above equation, we use the multivariateStudent-t distribution, i.e., tνn−2[ �μn,

�n (κn+1)κn (νn−2) ], because we

chose the inverse-Wishart as a conjugate prior for k and thenormal distribution for �μk , where the second term is called aChinese restaurant process, and is given by

P(zn = k|Z−n) =

⎧⎪⎨

⎪⎩

mk

n − 1 + α, if k ≤ K+,

α

n − 1 + α, if k > K+

(10)


TABLE II

DATASETS

Algorithm 4 Extended Traffic Feature-Based Normal andAttack Flow Detection

Input: Y = {yn}Nn=1, Entthr, and Ithr

Result: Attack detection

DPMM clustering:1: {C1, . . . , CK } = DPMM-Clustering(Y)

Attack and normal traffic detection:2: for all k →K do3: Ck,�t = Avg(Ck(�t))4: Ck,R f = Avg(Ck(R f ))5: Ck, �p = Avg(Ck( �p))6: if Ck,�t < Entthr and Ck,R f = 0 then7: if Ck, �p > Ithr then Label Ck as Slowloris attack.8: else Label Ck as flooding attack.9: else Label Ck normal traffic application and update the

corresponding normal traffic profiles.

where Z−n = Z/zn , K + is the number of classes containing atleast one data point, and mk = ∑N

n=1 I (zi = k) is the numberof data points in the class k. The steps involved in collapsedGibbs sampling are enumerated below:

1) Initialize the cluster assignments {zn} randomly.2) Repeat the following until convergence:

(a) Randomly select yn .(b) Fix all other zn for every n = n: Z−n .(c) Sample zn ∼ p(Zn|Z−n, Y) from (10).(d) If zn > K then update K = K + 1.

After convergence, the cluster labels Zn are assigned andthe cluster parameters are computed.

C. Clustering-Based Normal/Attack Traffic Identification

Algorithm 4 presents the steps required to further analyzeflows and detect whether they are normal or attack flows.The algorithm takes the feature set Y, unknown fingerprintedclusters set from Algorithm 3, the threshold for randomnessin flows’ TTL values, and the threshold for the flows’ packetinter-arrival times as input.

Thresholds (Entthr and Ithr) and the ratio of source todestination bytes (Ck,R f ) play an important role in detectingcertain types of attacks. Since we have attack dataset forDDoS, we compute thresholds from clusters formed by theattack traffic. We observe that most benign applications suchas VoIP, P2P, Game, TCP/UDP show reasonable variations inentropy in TTL (all applications’ entropy in TTL are greaterthan 1). Following this observation, we set the threshold the

TABLE III

ISCX DATASET DESCRIPTION

Entthr = 1.2. Similarly, due to the inherent nature of theSlowloris DDoS attack, the inter-arrival time between packetsare higher than that of benign applications, as shown in Fig. 6.Of course, traffic inter-arrival time for various benign trafficapplications could be different ranging from micro- to milli-seconds, however, for the Slowloris attack, the inter-arrivaltimes mostly lies above 4 seconds which is significantlygreater than all benign applications. Due to this, we set thethreshold Ithr = 4 following the cluster parameters of benignand attack traffic.

First, the DPMM-based clustering algorithm is applied toobtain clusters {cn}N

n=1. The clusters obtained from DPMMare analyzed to classify them as attack or normal traffic appli-cations. Then, for each cluster the averages of the followingthree features are computed: entropy in TTL values, ratioof source to destination bytes, and flow packet inter-arrivaltime (steps 3-5). If a cluster’s entropy in TTL is less than thethreshold and the ratio of source to destination bytes is zero,this means that the referenced cluster likely belongs to attacktraffic. The attack types are checked in step 7, i.e., to detectwhether the attack type is Slowloris or flooding (steps 6–9).If a cluster is labeled as normal traffic, then the normal profilegenerated in Section V is updated accordingly.

VI. EVALUATION

In this section, we perform extensive experiments to com-prehensively evaluate the proposed approach for normal profilegeneration and attack traffic identification. To evaluate the pro-posed approach for application classification, we manually setsome identified applications as unknown in the experiments.

A. Datasets

In this study, we evaluate the performance of the proposedapproach using real network traffic measurements, the ISCXintrusion detection evaluation dataset [34], CAIDA2007 [35],and CAIDA2016 [36]. The ISCX dataset consists of normaland attack traffic (see Table II). The normal dataset traces


TABLE IV

SUMMARY OF THE MEASURED TRAFFIC DATA

consist of real traffic for HTTP, SMTP, SSH, IMAP, POP3,and FTP. Each trace is exactly 24 hours long. The total ISCXdataset consists of 84.4 GB of data, comprising differentscenarios. Each scenario, along with the respective traffic size,is listed in Table III. In our experiments, we utilized thescenarios S1, S4, and S5 to evaluate the proposed approach.

The CAIDA2007 [35] dataset contains approximately onehour of anonymized traffic traces from a DDoS attack onAugust 4, 2007. This type of DDoS attack attempts toblock access to the targeted server by consuming computingresources on the server and consuming all of the bandwidthof the network connecting the server to the Internet. The totalsize of the dataset is 21 GB. These traffic traces recordedonly attack traffic to the target and responses from the target,whereas the legitimate traffic has been removed as far aspossible. The CAIDA2016 dataset [36] consists of anonymizedpassive traffic measurements from passive monitors in 2016.It contains traffic traces from the ’equinix-chicago’ high-speed monitor. Normal traffic datasets of CBR and VoIPtraffic measurements are from the WiBro network in Seoul,Korea [38]. Similarly, normal traffic application behaviorof P2P and the War-of-Warriors (WoW) game are obtainedfrom [37] and [39], respectively.

For multi-modal normal profile generation, we utilize datafrom both the CAIDA2016 and ISCX normal activity scenarios(S1 and S2 of Table III). For attack traffic datasets, we useHTTP-based DDoS (S4 from Table III) and IRC-based DDoSattack data (S5 from Table III) from ISCX. We also usethe DDoS dataset from the CAIDA2007 dataset. For theS4 and S5 scenarios in Table III, we consider the traffic frommalicious hosts targeting the server, whereas for normal trafficwe utilized traffic from all hosts to the targeted server. Finally,normal and attack datasets are mixed together to evaluate theproposed approach for traffic classification-based normal andattack traffic detection.

B. Implementation Considerations

The traffic composition of the ISCX traffic dataset is shownin Table IV. The dataset consists of three classes (HTTP: about87%, FTP: 6%, attack: 7%). We conducted the experiments ona dual core i7-4790 CPU with a speed of 3.60 GHz in eachcore and installed a memory of 16 GB.

1) Fingerprinting Parameter Setting: Because we employa histogram method for fingerprinting applications, we needto set the number of bins F and window sizes W . We setF = 50 for all experiments. The reason for fixing this for all

TABLE V

FINGERPRINTING PARAMETERS

experiments is that we analyzed from real traces that F = 50can fully represent the variations in packet-level features.However, we set W according to each application’s proportion,i.e., for different applications W is set depending on the sizeof that application with respect to other applications in thewhole dataset. Table V presents the number of instances ofeach application with the respective window sizes used in ourexperiments.

2) Hyperparameter Setting for DPMM Clustering: In oursimulations, we follow the generative model in (6) and (7)to generate data with initial parameters from hyperparame-ters G0. In DPMM, we need to set the hyperparameters αand G0 = {�μ0, κ0,�

−10 , ν0}, as shown in Fig. 4. Here,

α encodes the number of clusters, as implied by (10). Whenα is larger, the probability of assigning an observation to anunrepresented class is greater than for a represented class.Hence, the value of α is set according to the belief in thenumber of clusters in the dataset. In our experiments, we setthis to 1. For the generative model hyperparameters G0,we need to set the corresponding conjugate hyperparameters ofa base distribution [41]. We see that the mean vector of a classfollows the Gaussian distribution with mean �μ0 and covariancematrix k/κ0. Thus, the two required hyperparameters are�μ0 and κ0, which reflect the mean locations of clusters.If κ0 is greater, then the clusters are located close to eachother, and conversely if κ0 is smaller, then the clusters willhave a larger variation in k/κ0, and hence the clusters aresparser. Consequently, �μ0 and κ0 are selected accordinglybased on the prior knowledge about the observations. Thehyperparameters of the inverse-Wishart distribution, which isthe conjugate prior of the multivariate normal, are �−1

0 and ν0.Here, �−1

0 is the scale matrix for the distribution, and ν0is the degree of freedom. The hyperparameters are set toG0 = {�−1

0 , ν0, �μ0, κ0} = {Identity(3), 4, Zeros(3, 1), 0.5}.For details, we refer the reader to [41].

C. Application Fingerprinting Analysis

Here, we demonstrate the efficiency of the proposed appli-cation fingerprinting method, and the corresponding classi-fication results on the fingerprints. Unlike directly applyinga classification algorithm on the non-fingerprinted packet-level features, which may result in a higher misclassificationrate, we first take the packet-level features and perform theproposed fingerprinting method to obtain applications’ finger-prints, and then these fingerprints are input into the classifica-tion algorithm (MS) in order for it to perform optimally withminimal misclassifications.


Fig. 5. Fingerprints for various traffic applications.

Fig. 5 shows the fingerprints generated for various traf-fic applications. Note that the fingerprints are finely sepa-rated in the three-dimensional feature space, which leads tooptimal classification results. It is observed that HTTP andgam3 (WoW) traffic show significant variations in their trafficpatterns. For instance, in Fig. 5 there seem to be two clustersfor the WoW game (purple-stars) traffic. This variation in thegame traffic is due to game traffic dynamics: it may sometimesproduce misclassifications when input to a classifier.

We leverage the Kullback–Leibler (KL) divergence to quan-tify the distances among various fingerprints generated fromapplications. To quantify the efficiency of our fingerprintingmethod, we compute the KL divergence for both the finger-printed and the non-fingerprinted application data. The KLD isa measure of how one probability distribution, such as a trafficapplication, diverges from a second. The KL divergence forthe multivariate case (packet-level features) is represented asfollows:

KLD = 1

2(trace(−1

2 1) + (�μ2 − �μ1)T −1

2 (�μ2 − �μ1)

− ln(det1

det2− D)),

where is a covariance matrix for a cluster with D fea-tures, and �μ represents the cluster mean. Fig. 6 shows KLdivergences and the corresponding classification accuracies forboth the fingerprinted and non-fingerprinted applications. Thelarger the KL divergence between two distributions (clusters),the easier it is to separate them. From Fig. 6, the KLdivergence for fingerprinted applications is seen to be higherthan for those that are non-fingerprinted. The correspondingclassification accuracy against each KL divergence for pairsof clusters (applications) is shown to be significantly higherthan for the proposed fingerprinting method. We observe aclassification accuracy of up to 99.8% for CBR and WoWtraffic. This is because of the fact that the KL divergence(separation between two clusters) is large between these. Forfurther verification, we observe that the distance betweenthe CBR (red-diamond) and the WoW (purple-star) clustersin Fig. 5 is significantly large (around 4), which results in ahigher classification accuracy (99.8%). However, the KL diver-gence between the FTP (red-circle) and the P2P (green-star)clusters is comparatively low (2.51), owing to which the

Fig. 6. KL divergence and accuracy for proposed application fingerprintingand non-fingerprinting of applications.

classification accuracy is also low (75.6%), but it is still sig-nificantly higher than in the corresponding non-fingerprintedcase (52.8%).

D. Flow-Level Feature-Based Traffic Identification

In this subsection, we present a performance evaluation ofthe proposed method in terms of correctly identifying attackand normal traffic applications. It should be noted that the coreof Algorithm 4 is DPMM clustering. Therefore, we presenthere the flow-level traffic identification accuracy for DPMM,and then we compare the performance with four classifica-tion approaches. For comparison, we take variational Bayesinference (VB), mean-shift (MS), Gaussian mixture modelclustering (GMM), and K-means. The experiments performedin this subsection consist of a dataset containing 2,198 data-points including normal and attack traffic. There are total eighttraffic applications, i.e., six normal and two attack traffic.

The variational Bayes (VB) inference approach is basedon variational methods, which convert inference problemsinto optimization problems [40]. The main idea that governsvariational inference is that it formulates the computation of amarginal or conditional probability in terms of an optimizationproblem, which depends on the number of free parameters,i.e., variational parameters. We use the same experimentalsetting used in [40]. The prior knowledge for the numberof clusters in set to 10, i.e., K = 10. The α = 1 whichstates that the clusters are equiprobable. All the distributionsare assumed to be Gaussian in optimization-based varia-tional inference. Here, the variational parameters give themarginal or conditional probabilities of interest. Unlike theproposed DPMM approach, which is based on a Markov-based sampling approach, the VB approach is based on anoptimization approach.

Gaussian mixture model (GMM)-based clustering is a prob-abilistic model for representing normally distributed clusterswithin the overall data [32]. In general, mixture models do notrequire knowledge of which cluster a data point belongs to,allowing the model to learn the cluster centers automatically.Because clustering assignments are not known, this constitutesa form of unsupervised learning. Since GMM-based clusteringis parametric in nature, i.e, the number of clusters need to beprovided as prior, therefore, we set the number of cluster toK = 8. This dictates that the classifier should partition theprovided dataset into 8 clusters. Even though, we provide the


TABLE VI

ATTACK AND NORMAL TRAFFIC APPLICATION IDENTIFICATION RESULTS

Fig. 7. Traffic application identification accuracy/misclassification rate fornormal and attack traffic for the proposed DPMM approach compared withvariational Bayes (VB), Gaussian mixture model (GMM), mean-shift (MS),and K-means.

number of clusters in advance, Still GMM underperforms ascompared to the DPMM.

Finally, the K-means algorithm is a well-known clusteringapproach, which takes the dataset and number of clustersK as input and partitions the input dataset into K clusters,while each cluster has its own cluster center [29]. Similar toGMM-based clustering, we have set the number of clustersto K = 8 while performing experiments because K-means isalso a parametric clustering approach.

Fig. 7 presents the overall traffic application identificationaccuracies and misclassifications. By the clustering accuracy,we mean the overall accuracy in correctly detecting normal(HTTP, FTP, and DNS) and DDoS attack application traffic.In other words, this refers to the total number of correctlyidentified data points over the total number of data points.From Fig. 7, we observe that the proposed DPMM trafficclustering accuracy is significantly higher 97.5% than those ofthe other approaches. However, the accuracy of the VB clus-tering approach lies at around 91.8%, whereas all the otherapproaches struggle to correctly detect the traffic applications.The MS, GMM, and K-means approaches suffer from lowdetection rates and higher misclassifications. The misclassifi-cation rates for all of these approaches lie below 60%. Overall,the proposed DPMM approach performs significantly betterin detecting normal traffic. It is evident from Fig. 7 that theproposed DPMM misclassification rate is around 2.5% for alltraffic applications, whereas for VB it is around 8.2%. Table VIprovides comprehensive results regarding the accuracy forvarious traffic applications.

Fig. 8 illustrates the normal and aggregated attack trafficdetection accuracies. It is evident from Fig. 8 that the proposedDPMM approach outperforms all other approaches in accu-rately detecting normal traffic applications. However, VB andother approaches struggle to detect HTTP traffic. Note thatin Figs. 7 to 8, the results relate to the identification the clusterset U f p of unknown traffic applications from Algorithm 3.

Fig. 8. Normal and aggregated attack traffic identification of the proposedDPMM approach, and the VB, MS, GMM, and K-means approaches.

Fig. 9. Traffic application identification accuracy comparison for variationalBayes (VB), Gaussian mixture model (GMM), mean-shift (MS), and K-means.

TABLE VII

NORMAL TRAFFIC APPLICATION IDENTIFICATION ACCURACY

We observed that Algorithm 4, which is based on DPMM,performs significantly better than the other approaches indetecting normal and attack traffic applications, with an overallaccuracy of over 97%.

Fig. 9 shows the per-application accuracy results for var-ious algorithms. It is evident that the proposed DPMM-based approach accurately identifies normal traffic (HTTP,FTP, and DNS) with 100% accuracy. However, the attacktraffic is classified with very few misclassifications. Overall,the proposed DPMM outperforms all other algorithms, with97.5% accuracy results. Note that the VB approach strugglesto identify normal traffic applications. However, GMM andK-means perform worse in detecting Slowloris attacks. TheMS, GMM, and K-means approaches perform better in identi-fying DDoS attacks, but at the same time traffic from Slowlorisis mislabeled as flooding attack traffic.

Fig. 10 illustrates attack-only traffic detection accuracies.We see that both the DPMM and VB approaches performwell in detecting attack traffic. Even VB performs well in


Fig. 10. Attack-only traffic detection accuracy of proposed the DPMMapproach and the VB, MS, GMM, and K-means approaches.

detecting Slowloris attacks compared to the proposed DPMM.However, VB struggles to detect normal traffic applicationsand mislabels them as attack traffic. Owing to this, the overallaccuracy of DPMM is significantly better than that of theVB approach. From Fig. 10, GMM and K-means performworse in detecting Slowloris attacks. This is due to the fact thatSlowloris is a stealthy attack, and it is considerably challengingfor anomaly detectors to precisely detect such attacks.

E. Flow-Level Feature-Based Normal Traffic Identification

Here, we demonstrate the normal application identificationaccuracy for the cluster set U f p of unknown fingerprints.The proposed DPMM-based clustering method outperformsall other approaches in classifying various “unknown” finger-printed applications, as shown in Table VII. The overall classi-fication accuracy from Algorithm 4 for various applications is96.5%, whereas all other approaches struggle to identify someapplications. For instance, GMM performs highly effectivelyon many applications but is unable to identify WoW game traf-fic. The reason for this is the dynamic behavior of game appli-cation over time. However, the proposed approach identifiesall applications with an accuracy of between 71% and 100%.It is also observed that all approaches, i.e., DPMM, VB, MS,GMM, and K-means, classify the CBR traffic with 100%accuracy. The reason for this lies in the fact that the CBRtraffic is finely separated in the three-dimensional featurespace, as shown in Fig. 5, and after fingerprinting the KLdivergence is higher between CBR and other applications.

F. Packet- and Flow-Level Features Based Fingerprinting

Here, we investigate the performance of our approach basedon the fingerprints generated from the packet- and flow-level traffic features. In this set of experiments, we mergepacket- and flow-level features to generate fingerprints fordifferent applications, and then analyze the performance usingDPMM clustering approach. We are interested to observe theperformance of our approach when application fingerprints aregenerated from both the packet- and flow-level features ratherthan fingerprints generated using only packet-level features.

We used total eight traffic applications, six were normaltraffic and two of them were attack traffic. The normal trafficapplications include HTTP, UDP, VoIP, Game, P2P, and TCP,whereas attack traffic applications are DDoS attacks. From theexperimental results, we observe that DPMM achieves 66.6%accuracy in detecting normal traffic applications, whereasthe detection accuracy for DDoS attacks are 70% and 92%,

respectively. The overall accuracy of the proposed approachin detecting normal and attack traffic lies at 70.25%. Thisproves our claim in Section III-B that by merging packet- andflow-level features and applying classification algorithm onthe merged feature may result in misclassifications, i.e., trafficfrom different applications may result in a same cluster.

VII. CONCLUSION

In this paper, we proposed a novel framework for gen-erating normal traffic profiles based on application finger-printing. Moreover, we proposed a Bayesian nonparametrictraffic application clustering approach to address the prob-lem of unknown application detection. The proposed methodintroduced two novel approaches. First, to fingerprint trafficapplication behavior based on packet-level traffic featuresand derive a multi-modal probability distribution to generatenormal traffic profiles. Second, based on the Bayesian nonpara-metric clustering approach, the unknown traffic applicationsare identified either as DDoS attack traffic or normal traffic.We had shown that the proposed method classifies unknownnormal and DDoS attack traffic applications with accuraciesof 100% and 97%, respectively. In this work, we analyzed andsuccessfully detected HTTP, FTP, and DNS applications forthe case of normal traffic applications, and DDoS (floodingusing IRC botnet or Slowloris) attack applications for theattack traffic scenario.

In future work, we plan to explore and include flash eventsin our proposed framework. It should be noted that flashevents are quite similar in behavior to DDoS attacks. It wouldbe worthwhile studying the behavior of flash events anddistinguishing them from DDoS attacks while utilizing theproposed framework.

ACKNOWLEDGMENTS

The authors would like to thank all the anonymous review-ers for their valuable feedback.

REFERENCES

[1] C. Pettey. (2015). The Internet of Things and the enterprise,Gartner, Stamford, CT, USA. Accessed: Feb. 15, 2018. [Online].https://www.gartner.com/smarterwithgartner/the-internet-of-things-and-the-enterprise/

[2] S. Hilton. (Oct. 26, 2016). Dyn analysis summary of Friday October 21attack. Gartner. [Online]. Available. https://dyn.com/blog/dyn-analysis-summary-of-friday-october-21-attack/

[3] B. Krebs. (Oct. 2016). Who makes the IoT things under attack?Krebs Security. Accessed: Mar. 25, 2018. [Online]. Available: https://krebsonsecurity.com/2016/10/who-makes-the-iot-things-under-attack/

[4] M. Crotti, M. Dusi, F. Gringoli, and L. Salgarelli, “Traffic classificationthrough simple statistical fingerprinting,” ACM SIGCOMM Comput.Commun. Rev., vol. 37, no. 1, pp. 5–16, Jan. 2007.

[5] M. H. Bhuyan, D. K. Bhattacharyya, and J. K. Kalita, “A multi-stepoutlier-based anomaly detection approach to network-wide traffic,” Inf.Sci., vol. 348, pp. 243–271, Jun. 2016.

[6] B. Anderson and D. McGrew, “Machine learning for encrypted malwaretraffic classification: Accounting for noisy labels and non-stationarity,”in Proc. 23rd ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining,Apr. 2017, pp. 1723–1732.

[7] A. H. Hamamoto, L. F. Carvalho, L. D. H. Sampaio, T. Abrão,and M. L. Proença, Jr., “Network anomaly detection system usinggenetic algorithm and fuzzy logic,” Expert Syst. Appl., vol. 92,pp. 390–402, Feb. 2018.

[8] G. Fernandes, Jr., L. F. Carvalho, J. Rodrigues, and M. L. Proença, Jr.,“Network anomaly detection using IP flows with principal componentanalysis and ant colony optimization,” J. Netw. Comput. Appl., vol. 64,pp. 1–11, Apr. 2016.


[9] V. Matta, M. Di Mauro, and M. Longo, “DDoS attacks with randomizedtraffic innovation: Botnet identification challenges and strategies,” IEEETrans. Inf. Forensics Security, vol. 12, no. 8, pp. 1844–1859, Aug. 2017.

[10] M. E. Ahmed and H. Kim, “DDoS attack mitigation in Internet of Thingsusing software defined networking,” in Proc. IEEE 3rd Int. Conf. BigData Comput. Service Appl., Apr. 2017, pp. 271–276.

[11] Z. Tan, A. Jamdagni, X. He, P. Nanda, and R. P. Liu, “A systemfor denial-of-service attack detection based on multivariate correla-tion analysis,” IEEE Trans. Parallel Distrib. Syst., vol. 25, no. 2,pp. 447–456, Feb. 2014.

[12] U. Ben-Porat, A. Bremler-Barr, and H. Levy, “Vulnerability of networkmechanisms to sophisticated DDoS attacks,” IEEE Trans. Comput.,vol. 62, no. 5, pp. 1031–1043, May 2013.

[13] J. François, I. Aib, and R. Boutaba, “FireCol: A collaborative protectionnetwork for the detection of flooding DDoS attacks,” IEEE/ACM Trans.Netw., vol. 20, no. 6, pp. 1828–1841, Dec. 2012.

[14] T. P. Fries, “Classification of network traffic using fuzzy clusteringfor network security,” in Proc. Ind. Conf. Data Mining, Jul. 2017,pp. 278–285.

[15] B. Stone-Gross et al., “Your botnet is my botnet: Analysis of abotnet takeover,” in Proc. 16th ACM Conf. Comput. Commun. Secur.,Nov. 2009, pp. 635–647.

[16] S. Kim, S. Lee, G. Cho, M. E. Ahmed, J. P. Jeong, and H. Kim,“Preventing DNS amplification attacks using the history of DNS querieswith SDN,” in Proc. Eur. Symp. Res. Comput. Secur. (ESORICS),Sep. 2017, pp. 135–152.

[17] C. Y. Cho, J. Caballero, C. Grier, V. Paxson, and D. Song, “Insights fromthe inside: A view of Botnet management from infiltration,” in Proc.3rd USENIX Conf. Large-Scale Exploits Emergent Threats, Botnets,Spyware, Worms, More (USENIX LEET), Apr. 2010, p. 1.

[18] V. L. Thing, M. Sloman, and N. Dulay, “A survey of bots used fordistributed denial of service attacks,” in Proc. Int. Inf. Secur. Conf.,May 2007, pp. 229–240.

[19] J. Jung, B. Krishnamurthy, and M. Rabinovich, “Flash crowds anddenial of service attacks: Characterization and implications for CDNsand Web sites,” in Proc. 11th Int. Conf. World Wide Web, May 2002,pp. 293–304.

[20] M. E. Ahmed, H. Kim, and M. Park, “Mitigating DNS query-basedDDoS attacks with machine learning on software-defined networking,”in Proc. IEEE Mil. Commun. Conf. (MILCOM) Oct. 2017, pp. 11–16.

[21] M. E. Ahmed, J. B. Song, and Z. Han, “Mitigating malicious attacksusing Bayesian nonparametric clustering in collaborative cognitive radionetworks,” in Proc. IEEE Global Commun. Conf. (GLOBECOM),Dec. 2014, pp. 999–1004.

[22] A. Ali, M. E. Ahmed, M. J. Piran, and D. Y. Suh, “Resource optimiza-tion scheme for multimedia-enabled wireless mesh networks,” MDPISensors, vol. 14, no. 8, pp. 14500–14525, Aug. 2014.

[23] M. E. Ahmed, J. S. Kim, R. Mao, J. B. Song, and H. Li, “Distributedchannel allocation using kernel density estimation in cognitive radionetworks,” ETRI J., vol. 34, no. 5, pp. 771–774, Oct. 2012.

[24] Y. Gu, A. McCallum, and D. Towsley, “Detecting anomalies in net-work traffic using maximum entropy estimation,” in Proc. 5th ACMSIGCOMM Conf. Internet Meas. Oct. 2005, p. 32.

[25] A. Scherrer, N. Larrieu, P. Owezarski, P. Borgnat, and P. Abry, “Non-Gaussian and long memory statistical characterizations for Internettraffic with anomalies,” IEEE Trans. Depend. Sec. Comput., vol. 4, no. 1,pp. 56–70, Jan./Mar. 2007.

[26] S. Behal and K. Kumar, “Detection of DDoS attacks and flash eventsusing novel information theory metrics,” Comput. Netw., vol. 116,pp. 96–110, Apr. 2017.

[27] S. Yu, W. Zhou, W. Jia, S. Guo, Y. Xiang, and F. Tang, “Discriminatingddos attacks from flash crowds using flow correlation coefficient,” IEEETrans. Parallel Distrib. Syst., vol. 23, no. 6, pp. 1073–1080, Jun. 2012.

[28] M. E. Ahmed, D. I. Kim, J. Y. Kim, and Y. Shin, “Energy-arrival-awaredetection threshold in wireless-powered cognitive radio networks,” IEEETrans. Veh. Technol., vol. 66, no. 10, pp. 9201–9213, Oct. 2017.

[29] D. MacKay, Information Theory, Inference and Learning Algorithms.Cambridge, MA, USA: Cambridge Univ. Press, 2003.

[30] Y. Cheng, “Mean shift, mode seeking, and clustering,” IEEE Trans.Pattern Anal. Mach. Intell., vol. 17, no. 8, pp. 790–799, Aug. 1995.

[31] F. Wood and M. J. Black, “A nonparametric Bayesian alternative to spikesorting,” J. Neurosci. Methods, vol. 173, no. 1, pp. 1–12, Aug. 2008.

[32] B. S. Everitt and D. J. Hand, Finite Mixture Distributions. London, U.K.:Chapman Hall, 1981, pp. 790–799.

[33] J. Zhang, C. Chen, Y. Xiang, W. Zhou, and A. V. Vasilakos, “An effectivenetwork traffic classification method with unknown flow detection,”IEEE Trans. Netw. Service Manage., vol. 10, no. 2, pp. 133–147,Jun. 2013.

[34] A. Shiravi, H. Shiravi, M. Tavallaee, and A. A. Ghorbani, “Towarddeveloping a systematic approach to generate benchmark datasets forintrusion detection,” Comput. Secur., vol. 31, no. 3, pp. 357–374,May 2012.

[35] (2007). The CAIDA UCSD DDoS Attack. [Online]. Available:http://www.caida.org/data/passive/ddos20070804_dataset.xml

[36] (2016). CAIDA Dataset. [Online]. Available: https://www.caida.org/data/

[37] The Kaist/Wibro Dataset (V. 2008-06-04). Accessed: Apr. 20, 2018.[Online]. Available: https://crawdad.org/kaist/wibro/20080604/

[38] The Snu/Wow_Via_Wimax Dataset (V. 2009-10-19). Accessed:Apr. 22, 2018. [Online]. Available: https://crawdad.org/snu/wow_via_wimax/20091019/

[39] The Snu/Bittorrent Dataset (V. 2011-01-25). Accessed: Apr. 25, 2018.[Online]. Available: https://crawdad.org/snu/bittorrent/20110125/

[40] D. Blei and M. I. Jordan, “Variational inference for Dirichlet processmixtures,” Int. Soc. Bayesian Anal., vol. 1, no. 1, pp. 121–144, 2006.

[41] K. P. Murphy, “Conjugate Bayesian analysis of the Gaussiandistribution,” Univ. Brit. Columbia, Vancouver, BC, Canada,Tech. Rep., 2007. [Online]. Available: https://www.cs.ubc.ca/~murphyk/Papers/bayesGauss.pdf

Muhammad Ejaz Ahmed received the M.S. degreein information technology from the National Uni-versity of Sciences and Technology, Islamabad,Pakistan, in 2011, and the Ph.D. degree from theDepartment of Electronics and Radio Engineer-ing, Kyung Hee University, South Korea, in 2014.He is currently with the Department of Software,Sungkyunkwan University, and also with the Data61,Commonwealth Scientific and Industrial ResearchOrganization, Sydney, NSW, Australia. His researchinterest focuses on network security, malwaredetection, and applied machine learning.

Saeed Ullah received the B.S. degree in informationtechnology from IBMS/CS, Peshawar, in 2006, andthe M.S. degree in information technology fromthe National University of Sciences and Technologyin 2010. He is currently pursuing the Ph.D. degreewith the Department of Computer Science and Engi-neering, Kyung Hee University, South Korea. Hisresearch interests include multimedia communica-tion, scalable video streaming, routing, in-networkcaching, and future Internet.

Hyoungshick Kim received the B.S. degreefrom the Department of Information Engineering,Sungkyunkwan University, in 1999, the M.S. degreefrom the Department of Computer Science, KAIST,in 2001, and the Ph.D. degree from the ComputerLaboratory, University of Cambridge, in 2012. He iscurrently an Assistant Professor with the Depart-ment of Computer Science and Engineering, Collegeof Software, Sungkyunkwan University. After com-pleting his Ph.D., he was a Post-Doctoral Fellowwith the Department of Electrical and Computer

Engineering, The University of British Columbia. He previously worked atSamsung Electronics as a Senior Engineer from 2004 to 2008. He also servedas a member of DLNA and Coral standardization for DRM interoperabilityin home networks. His current research interest is focused on usable securityand security engineering.

statistical application fingerprinting for ddos attack mitigation · 2019-12-26 · ahmed et al.:...

Documents