botyacc: uniﬁed p2p botnet detection using behavioural...

Botyacc: unified P2P botnet detectionusing behavioural analysis and graph analysis

Shishir Nagaraja

School of Computer ScienceUniversity of Birmingham

Edgebaston, Birmingham B15 2TT, [email protected]

Abstract. We propose a novel technique for detecting P2P botnets. Detection isbased on two working principles. First, we exploit a fundamental property of bot-net design: peer-to-peer connectivity topologies are fundamental to botnet survivabil-ity. Second, we use traffic-flow pattern analysis to capture traffic similarity within abotnet. Our work unifies graph-theoretic detection with behavioural detection into asingle technique. We carried out evaluation over live P2P botnet traffic and show thatthe resulting algorithm can localise the majority of bots with low false-positive rate.

1 Introduction

The detection and isolation of peer-to-peer (P2P) botnets is an ongoing problem.P2P architectures are attractive as they offer low end-to-end routing delays and pro-vide robustness against botnet response mechanisms by decentralising importancethroughout the network.In response to the proliferation of P2P botnets, many researchers have proposed theuse of machine learning techniques. Essentially, these are partitioning tools whichconvert a dataset into clusters of similar data points under some definition of simi-larity. However, the context of statistical botnet detection fundamentally differs fromnon-security applications: the context is adversarial and the attacker controls the dataof interest. Therefore, partitioning algorithms leveraging traffic similarity require spe-cial statistical properties.Specifically, that cluster boundaries are precise — approximate boundaries are notsufficient. Otherwise, botnets can exploit this weakness to “blend-in” with legitimatetraffic clusters. We also require that the cluster definition is robust — the property thatresists the addition of botnet points to non-botnet clusters. Current botnet detectiontechniques do not offer these properties.To enable precise and robust characterisation of the legitimate data subspace (clus-ters), our approach is to leverage the fundamental design characteristic of the botnet:its P2P communication architecture. Botmasters use P2P architectures for increasedrobustness against targeted counter-attacks on the botnet, however they are also apoint of detection. However, naively flagging all P2P networks as malicious is not theway forward.Our approach is to unify two well understood principles of botnet detection (P2Pconnectivity and traffic similarity) into a single algorithm underlying out detectiontechnique. This results in high detection accuracy as well as evasion resistance prop-erties. At the core of our technique is a novel Markovian diffusion process defined

over input traffic traces, that leverages patterns in connectivity as well as flow statis-tics. Evading detection against our approach may be hard. First, detection is basedon a fundamental property of botnet operation; structured P2P topologies are a pre-requisite for botnet robustness. Second, we exploit the attacker’s lack of knowledge ofthe precise form and structure of legitimate traffic. To be clear, we are not proposingheuristics.

2 Architecture

The need to perform efficient accounting, traffic engineering and load balancing, de-tection of malicious and disallowed activity, and other factors has led network op-erators to pursue infrastructures to monitor traffic across multiple vantage points.Internally, enterprises run intrusion detection systems to collect more fine-grained in-formation about protocols and bit patterns occurring in packets while ISPs run moni-toring infrastructures to collect information about flow-level traffic volumes.Our architecture consists of the following parts.Monitor: First, traffic monitors are responsible for observing and sampling trafficinformation from the data-plane, and building a compact representation that is usedfor analysis and detection. These monitors may run at end-hosts, or on routers withinthe network using monitoring techniques such as Cisco IOS’s NetFlow [13] or sFlow,the Openflow standard. By default, NetFlow and sFlow sample traffic by processingone out of every 200 to 500 packets. However, advances in counter architectures [28]enable efficient tracking of the entire traffic flows in ISP networks without need forsampling. For enterprises, several products under the name of Security and Informa-tion Event Management (SIEM) systems now seek to store full traffic trace informa-tion. The constant threat of attacks suffered by modern networks has led operatorsto pursue infrastructures to monitor for anomalous behavior across multiple vantagepoints.Aggregator: Second, an aggregator component periodically receives observed com-munication traces from individual monitors, and merges them together to compute anetwork-wide communication trace dataset. This dataset contains the overlay topol-ogy corresponding to all pairs of intercommunicating hosts observable across theset of monitors. It runs an algorithm that analyzes the communication traces. It thenattempts to separate the dataset into two (possibly overlapping) subsets: the botnettrace, and the non-botnet communication traces. Bots (hosts that form the botnetcommunication graph) are then output as a set of suspect hosts. This list may thenbe sent to the set of clients that are subscribers to the service. The list may be usedto install blacklists into routers, to configure intrusion detection, firewall systems,and traffic shapers; or as “hints” to human operators regarding which hosts should beinvestigated as being bots. The aggregator may optionally append a likelihood thateach suspect IP address has engaged in a certain activity, so that clients can individ-ually determine at what threshold to block traffic. Aggregators may be combined ina hierarchical fashion to further reduce control overhead. In other words, low-levelaggregators can collect information from a subset of networks and hosts, and then inturn send their results to a higher-level aggregator.Honeynet: Third, the network defendor obtains botnet communication traces froma honeynet. Such traces are available from third-party sources or by building a smallmalware testbed. The honeynet is seeded with malware relevant to the defendor. Forinstance, an oil and gas company, they might be particularly interested in targetedmalware attacks. Once the relevant malware is seeded into the malware, it is allowed

to connect to its control servers. The resulting traffic including C&C traffic is recordedand forms the seed input to the inference algorithm. IP communication headers andsummary information for each traffic flow is recorded and used by the detection al-gorithm.Inference mechanism: Fourth, traffic traces from the aggregator and the honeynetis piped to our inference technique. The algorithms underlying our technique are ableto partition the traffic into multiple botnet and non-botnet flow partitions.

3 The problem

Our goal is to separate botnet C&C communication from legitimate network traffic.Consider a communication graph G = (V,E) with V representing the set of hostsobserved in traffic traces, and an edge e∈ E representing a traffic flow. Each edge e isa k-dimensional vector where k is a system parameter. Consider one or more botnetgraphs Gb = (H,M) embedded within the communication graph, where H ⊆V is theset of bots (infected hosts) and M ⊆ HxH ⊆ E are the corresponding edges (botnetflows). The objective of the inference algorithm is to detect subgraphs Gb whilstminimising false-negative rate and false-positive rate.From a graph-theoretic perspective, this is an edge-partitioning problem. It is muchmore challenging to solve than vertex-partitioning problems solved using communitydetection algorithms and small-cut detection techniques from graph theory literature:there is no bottleneck cut separating botnet and legitimate edges.

4 Inference technique

4.1 The methodology

Botnets create unique patterns in network traffic. These patterns manifest themselvesin a number of ways which can be traced to botnet architecture. The choice of com-munication topology interconnecting bots influences resilience to bot-takedown, aswell as anonymity of messages on the botnet C&C channel. A structured peer-to-peer topology offers better anonymity of C&C message origins than a centralisedtopology. These are well known results from anonymity literature. It is well under-stood that structured (expander) topologies leads to optimal robustness against theremoval of bots. That botnets are making use of the security benefits of decentralisedcommunication architectures is not surprising and well understood within the indus-try.A second source of patterns is similarity across traffic from multiple bots. Bots tendto have similar lifecycles of reconnaisance and initial compromise, followed by theestablishment of a C&C (command and control) channel, which is inturn followed byattacks such as data-exfiltration or service denial attacks.The use of structured peer-to-peer topologies, similarity of traffic flow patterns, andcollaboration involving a large number of infected hosts, leads to the emergence ofdistinct patterns. Since the background Internet demonstrates different flow patternsand graph structure than botnets, it is possible to distinguish between botnets andlegitimate activity by using structural features of network traffic.It is therefore natural to consider detection approaches that analyse the connectiv-ity patterns (who-talks-to-whom) alongside patterns in traffic flows ( who-talks-like-whom). Researchers have so far considered both these approaches in isolation. We

consider how detection can simultaneously exploit the combined power of these ap-proaches. According to our experimental evaluation, this yeilds a technique whosedetection accuracy and false-positive rate is greater than either approach used in iso-lation.Our inference algorithm is a stochastic diffusion process over ’similar’ traffic flows.It is based on random walks on graphs. Random walks allow us to reason about graphtopology and have been heavily used in the development of graph partitioning tech-niques. To incorporate the notion of edge ’similarity’, we apply theoretical tools fromeuclidean geometry. Each traffic flow is a vector whose scalar elements specify theCartesian coordinates of a point with respect to a set of axes – one axis per element.The ensemble of points resulting from considering traffic flows constitutes a multi-dimensional geometric surface with a lot of structural information embedded withinit.Our inference algorithm constructs geometric surfaces whose structure depends onthe communication graph as well as traffic-flow information. Our inference algorithmis a stochastic diffusion process over ’similar’ edges. We start by representing traffictraces into a communication graph. We define a special random walk over this graph.The novelty of the walk is that state transition (choice of outgoing edge) dependson the incoming edge of a random walk step. This is done to incorporate the notionof edge ’similarity’ — the walk has a bias towards similar flows. Flow similarity isdefined using Euclidean distance in a high dimensional setting where each traffic flowis a vector whose scalar elements specify the Cartesian coordinates of a point withrespect to a set of axes – one axis per element. The ensemble of points resulting fromconsidering traffic flows constitutes a multi-dimensional geometric surface with a lotof structural information embedded within it.

4.2 Step1: constructing the dual graph

From captured traffic traces, we construct a communication graph G where each edgee∈ E(G) is a traffic flow represented by a k-dimensional vector and whose nodes rep-resent computers. This graph only contains topology information. We then constructa new graph that which is influenced by communication topology and the geometryof traffic-flow vectors. We achieve this by creating a dual graph.To find the dual of the communication graph, we convert edges (traffic flows) intonodes. We then connect pairs of nodes (traffic flows) whose which are locally similar:flows must transition adjacent IP addresses (share a common node in G) and demon-strate flow-similarity ( flow vectors must be less than a threshold distance apart).The intuition behind this step is that random walks on the dual graph will achievethe equivalent of entering and exiting nodes over closely related flows in the originalcommunication graph. Note that this would not be normally possible because randomwalks are memoryless. Whereas we wish to study diffusion effects of walks over sim-ilar edges rather than different edges in the original communication graph — this isone of the primary design features that will reduce the false-positive rate problem wediscussed in the motivation sub-section above.The dual of G is a weighted graph D(G). Each edge in G is a node in D(G) therefore|V (D(G))|= |E(G)|. An edge between two nodes in D(G) is constructed as follows:

– edge-adjacency: For a edge est between nodes s, t in G, the set of adjacent edgesis the set of all edges connecting s and x, or, t and y: Sest = {est ,exs,ety}.

– geometrical distance: Each edge e in G is represented by a k-dimensional vector(w1, . . . ,wk). The geometrical distance between a pair of edges ei and e j is givenby:

Wi j =

{e−

||ei−e j ||t if||ei− e j||2 ≤ ε

0 otherwise(1)

, where the norm is the Euclidean norm in R k.– Each edge e in G is a node e in the dual of G, namely D(G). We place an edge

with weight Wi j between two nodes ei and e j in D(G), if they satisfy the edge-adjacency property above and are geometrically close enough (Wi j 6= 0).

4.3 Step2: PartitioningNow that we have a graph-geometric representation (D(G)) of the traffic, our nexttask is to separate subgraphs corresponding to different traffic characteristics. Thegeometric space within which traffic points reside is represented as the graph, and weexplore the local and global properties of surfaces using random walks (as explainedearlier).

●

●●

●

●

●

●●

●

●

●

0

1

2

3

4

5

6

7

8

9

10

(a) graph G

●

●

●

●● ●

●

●

●

●

●

●●

●●●

●

●

●

●0

1

2

3

4 5

6

7

8

9

10

11

12

13

1415

16

17

18

19

(b) Dual D(G)

Fig. 1. A sample graph and its dual; vertex ids are reset in the dual

Constructing the dual allows us to partition a communication into subgraphs withsimilar traffic flow behaviour and subgraphs that have a different expansion propertiesthan the background graph they are embedded within. Consider the toy example ofedge-partitioning a graph consisting of a set of nodes connected using one set of edgeswithin a ring structure and using a second set of edges as a star structure as shown inFig. 1(a). Without taking geometry into account (flow-similarity), computing the dualgraph gives us Fig. 1(b), where star-edges have been converted into a clique subgraphthat is weakly connected to a subgraph containing nodes that were edges constitutinga ring in the original graph. Now, using random walks over the dual-graph, we canpartition the ring-edges and the star-edges using the relative expansion properties ofthese two subgraphs in the dual.Traffic data is represented as a graph with individual geometric surfaces representedas subgraph communities within a single connected component. A surface corre-sponds to a subset of V (DC (G)) that is richly intra-connected but sparsely connectedwith the rest of the graph. To partition traffic by similarity, we consider the algebraicconnectivity properties of the graph. of each surface and locate gaps between densesurfaces.

Partitioning technique To find gaps that naturally partition the data, we find theLaplacian over the graph dual. Laplacian operator is efficient at finding gaps betweengeometric clusters.The standard technique for detecting gaps using the Laplacian operator consists ofconsidering the adjacency matrix A and the graph edges, with weights w1,w2, . . . ,wnon the edges, the Laplacian matrix is defined as L = AIwAT . Here Iw is a diagonalmatrix with the weights placed along the diagonal i.e. Iii = wi = di, where di is thedegree of node i. We then find the eigenvalues of the Laplacian matrix. There arestandard techniques for computing eigenvalues, for instance the well known Lanczosalgorithm which scales as O(n log n).A partition for graph G = (V,E) is defined as a partition of V into legitimate and mal-ware subgraphs L, and M, such that the number of edges across the gap gap(L,M)/(|L||M|)is minimised. In such a scenario, the second smallest eigenvalue (λ2) of L, yields alower bound on the optimal cost of the gap is (1−λ2). The eigenvector (v2) corre-sponding to the second eigenvalue, bisects the graph into only two clusters based onthe sign of the corresponding vector entry. Division into a larger number of clustersis achieved by repeated bisection. To prevent repetitive bisection from using trivialgaps we use the well known conductance metric.

Quality metric The quality of a partition is measured by its conductance, the ratioof the number of its external connections to the number of its total connections.We let d(i) denote the degree of vertex i. For S ⊂ V (DC (G)) , we define δ(S) =∑i∈S d(i) as the volume of S. So,

δ(V (DC (G))) = 2|E(DC (G))|

. Let EE(S,V (DC (G))\S) be the set of edges connecting a vertex in S with a vertexin V (DC (G))\S). We define the conductance of a set of vertices S, written φ(S) as

phi(S) =EE(S,V (DC (G))\S)

min(δ(S),δ(V (DC (G))\S))

The conductance of DC (G) is then given by:

minS∈V (DC (G))

φ(S)

. To avoid obtaining trivial partitions, the conductance of a subgraph is normalised bythe size of the partition.

Generalising the approach The Laplacian operator is applicable for linear con-texts. However, since the botnet context is adversarial the attacker can architect bottraffic to behave in a non-linear manner. For instance, the attacker can engineer bottraffic to follow a non-linear geometric shape such as a sphere or a curved line. Insuch a case, a linear operator would make errorneous judgements. Since the data isadversary controlled the assumption of linearity would be incorrect. Thus, we use thelaplace-beltrami operator which will allow correct distance measurements even whenthe geometrical subspace is non-linear.Let X denotes the set of functions defined on the vertices of D(G). The applica-tion of laplace-beltrami operator on a function results in another function in X , i.eL f ∈ X∀ f ∈ X since L is a function of functions: from X to X . An eigenfunctionof the laplace-beltrami operator is a function f such that L f = λ f where λ is the

eigenvalue of L and represents the scaling of f . By computing the eigenfunctions weobtain a compressed list of features that is expressed as a linear combination of the(larger number of) input features. This approach of using eigenfunctions for featurereduction prior to inference is well known within the machine learning community.We leverage this as a building block in our algorithm.

– To apply the laplace-beltrami operator, we compute L = D−W , where D is thediagonal matrix corresponding to W . L is a symmetric and positive matrix.

– Compute the eigenvectors and eigenvalues of L f = λ f ordered in the increasingorder of the eigenvalues λ0 ≤ λ1 ≤ λ2 . . .λk−1.

– Ignore the first eigenvector f0 = (1, . . . ,1) with eigenvalue λ0 = 0.– Each vertex i of D(G) is now expressed in terms of the m new features. The

modified graph is referred to as DC (G). Vertices of the modified graph are nowavailable as V (D(G))i = ( f1(i), . . . , fm(i)).

– Each edge between a pair of vertices in DC (G) is refreshed with an edge weightthat corresponds to the new set of features. Each edge between vertices i and jis refreshed as follows:

W ′i j =

{e−

||vi−v j ||t if||vi− v j||2 ≤ ε

0 otherwise(2)

, where the norm is the Euclidean norm in R m.

5 Noise toleranceOur detection method leverages the attackers limited knowledge about the locationand structure of the legitimate surface to bound statistical noise injected by an adap-tive botmaster. All points, legitimate traffic, noise traffic, and botnet traffic, are part ofthe dual graph. As described earlier, each traffic point constitutes a vertex. An undi-rected link exists between two vertices if they are within a ε threshold distance ofeach other. Notice that here the link indicates proximity.A link may exist between a noise point and a legitimate point if the points are withinclose proximity. Such a noise link allows graph diffusion between the legitimate andbotnet surfaces. Each noise link connects a noise pack to vertices within a legitimatetraffic surface.To successfully evade detection, the attacker must succeed in placing a large numberof noise points (noise pack) in close proximity of another surface, all points within thenoise pack must be in close proximity of legitimate points rather than just a few. Thisrequires knowledge of the location of a majority of the points in that surface. Ourdesign leverages three important facts to limit the number and size of noise packs:a) legitimate surface graphs tend to have high algebriac connectivity (the smallestpositive eigenvalue of its Laplacian matrix). b) large amounts of noise (compared tothe number of noise links) decreases algebriac connectivity.

Capping the number of noise links: The evasion resistance property of our detec-tion algorithm relies on limiting the number of noise links (m). There are primarilytwo scenarios in which botnet malware might attempt to increase m:

– No knowledge of legitimate subspace: Malware sprays noise randomly in thedata space in the hope that some of these points will be near the legitimate sur-face. This is quite difficult to perform because the data space is large (theoreti-cally infinite) whereas the legitimate surface is located on a much much smallersubspace. Besides the introduction of such noise is easy to detect and removewith a well designed filter (such as a Gaussian filter).

– Partial knowledge of legitimate subspace: A second possibility is that malwarehas some awareness of legitimate surface points. For instance, this informationmay be gained by analysing legitimate traffic on the infected machine. It canadd noise in the proximity of known legitimate points to create links betweenthe botnet surface and the legitimate surface. However, in doing so, the algebriacconnectivity characteristics are disturbed by the small set of noise links. Thenoise based on partial knowledge introduces a gap between the legitimate andbotnet surface. This is true, unless the noise surface is in the proximity of asignificant majority of points within the legitimate surface. This is very hard toaccomplish unless the attacker has full knowledge of the legitimate traffic; high-level trends or summary information do not give information regarding structureand location of the legitimate surface, full traffic traces would be required.

6 Results

We evaluated our algorithm in two different experimental settings. We apply our algo-rithm on real botnet traces within an enterprise setting and measure its effectiveness.

Malware testbed: In order to obtain botnet traffic flows we created a testbed of25 servers within a test network connected to the Internet. The computers was thenseeded with samples from three peer-to-peer versions from well known botnet fam-ilies: Zeus, Miner, and Spyeye. All three botnets have moved from centralised C&Cservers to entirely P2P communication. Zeus and Spyeye are designed for stealingbanking information while Miner steals Bitcoin credentials.On each testbed computer, we instantiated 70 copies of a malware sample (at a time)within a hypervisor, i.e 1750 instances from each family, or a total of 5250 malwareinstances. The testbed allows the bot to connect with other bots in the wild whichenables us to closely observe the actions of the bot and its interactions with otherbots. The result is a lot of network traffic which will be attack traffic by definition. Weexercised due diligence to prevent our testbed from being used as an attack platform.In particular, all probing, scanning, spam was blocked while DDOS attacks wererate-limited at the very least. However, command and control (C&C) incoming andoutgoing traffic was allowed as this was essential for our study.We set up traffic monitors on the backbone router at a university campus networkusing an electrical signal duplicator unit that works at up to 20Gbps. As opposedto port-mirroring, this approach allows us to capture packets at the line rate withoutinducing the effects of packet sampling. The traffic rate is typically 2.5–8Gbps.We developed an efficient network flow capture tool that processes packet traces andgenerates flows. A flow is record of communication between a pair of hosts and isrepresented by a tuple containing a number of fields as given below. We considerUDP and TCP traffic. In the case of UDP traffic, the TCP fields listed are zeroed outas are optional TCP fields.Each flow record is a tuple structure containing two parts; inter-flow values that arecommon to all flow records occurring within a ten-minute time period, and flow spe-cific fields. The entire set of fields comprising a flow record are given below:

– Inter-flow statistics (fields) computed across the flows: traffic volume, duration,distribution of packets per flow, distribution of flows per period, distribution ofpackets per flow, throughput distribution, distribution of inter-flow arrival timesaveraged distribution of inter-packet arrival times. Distributions are computedover a time interval of ten minutes.

– Flow fields: tcp/udp.source-port, tcp/udp.destination port, IP version, IP headerlength, ip.tos — precedence, ip.tos — delay, ip.tos — throughput, ip.tos — relia-bility, ip.tos — reserved, ip.tos — total length, ip.flags,ip.fragmentoffset, ip.ttl , ip.protocol, Entropy of ip.id# distribution, Entropy oftcp.seq# distribution, Entropy of tcp.ack# distribution, tcp.offset, tcp.reserved,tcp.flags,tcp.maximum-segment-size, tcp.echotimedata

In the above list, each sample distribution is represented by the corresponding his-togram. The first bin corresponds to P(X < x) ≤ 5%, the second bin corresponds toP(X < x)≤ 15% and so on.

Algorithm application We now consider the application of our algorithm on areal-world dataset.To access live traffic, we captured network traffic at a universitygateway for a period of one month between March and April 2012. This datasethas 113,576 unique source IP addresses and 11,643,993 traffic flows. This includes432,257 embedded botnet flows from seeded malware. This corresponds to a com-munication graph GE containing both malware and non-malware edges.The first step of our algorithm is to create the dual of GE , namely D(GE). At this stageeach edge (flow) becomes a node and nodes become edges, therefore flow-vectorsare now associated with each node. An edge is constructed between two nodes ifthe Euclidean distance between the corresponding flow-vectors is less than a certainthreshold ε. We used ε = .0025. This value controls the runtime of the generation ofthe dual. Our inference algorithm is not sensitive to high values of ε (leading to adenser graph), since the diffusion effects of the subsequent random walk process is

controlled by the non-linear kernel function: e−||ei−e j ||

t see Eqn. 1. We chose t = 1, buthigher values will produce a sharper decay. Our choice of ε leads to O(logE(D(GE)))edges per node in the dual. This step leads to the to embedding of information fromthe communication graph topology and the geometry of network traffic, within thedual graph. Figure 4(a) shows a rendering of the dual with vertices represented byblue points and edges represented by the distance between vertices; the number ofedges is too high to be represented graphically. Figures 4(b) through 4(f) show thedual graphs corresponding to the other five weeks of enterprise traffic with embeddedbotnet traffic.

(a) (b) (c) (d)

Fig. 2. Visual representation (top three dimensions) of network traffic after dimensionality reduction,at the end of Step 2 of our algorithm.

The second step of our algorithm is dimensionality reduction. This step prevents thebotnet from altering traffic patterns over time in order to “throw-off” the detectionsystem. Thus the compressed feature set selected by the algorithm can vary from

across time. Feature selection is carried out in an unsupervised manner. The com-pressed feature list is given by the ordered eigenvalues of the laplacian of the dualgraph computed at the end of the previous step. The first eigenvalue is zero by defini-tion and this is ignored. The eigenvectors corresponding to the eigenvalues representthe new mapping of traffic data points as a function of the compressed feature list.The embedding of the various geometrical surfaces within this compressed space isshown in figure 3.

(a) (b) (c) (d)

Fig. 3. Visual representation of the results of our inference algorithm showing isolated botnet traffic.

Thus far, our algorithm has only partitioned traffic into different surfaces. The botnetflows are partitioned into respective surfaces. After three iterations of the partitioningalgorithm, we obtain a subgraph (surface) of size 432,256 nodes, containing 423,906nodes corresponding to botnet flows, and 8,350 nodes corresponding to non-botflows.At this stage, our validation metric indicates that the sub-graph has a graph conduc-tance of about 0.9 (In all other scenarios, the graph conductance is less than 0.5, sowe can safely set our threshold of the graph conductance test to be 0.5).

#Malicious flows #gateway-flows Detection% % FPWeek1 3368 1211736 99.98 0.019Week2 8836 1392755 99.93 0.037Week3 3231 1109264 97.95 0.082Week4 8349 1312952 98.09 0.041Week5 8217 1130120 98.21 0.030

Table 1. Zeus in enterprise traffic – detection and error rates of inference


Table 2. Spyeye in enterprise traffic – detection and error rates of inference

We evaluated the performance of detection on a weekly basis through ourdataset. Each week, we collected gateway traces and combined it with thatweek’s botnet traces. The previous week’s data was discarded from the graphand dual generation. The honeynet seed traces were also fresh, thus corre-sponding traces from each week were input to our detection algorithm.


Table 3. Miner in enterprise traffic — detection and error rates of inference

To evaluate performance, we are concerned with the false positive rate (thefraction of non-bot nodes that are detected as bots) and the false negativerate (the fraction of bot nodes that are not detected). The results of botnetdetection for Zeus, Spyeye, and Miner are shown in Table 1, Table 2, andTable 3 respectively.Detection rates ranged between 97% and 99% for Zeus and Spyeye. ForMiner, the detection rate was a bit lower at around 95% on the average. Im-portantly, for all three peer-to-peer botnets, the false-positive rate was wellbelow 0.1%.The high detection and low false-positive rates are better than state-of-the-artalgorithms. It shows that the combination of traffic-flow features and graph-structure information holds good potential in designing reliable algorithmsfor botnet detection.While our approach is not perfectly accurate, we envision it may be of usewhen coupled with other detection strategies (e.g., previous work on botnetdetection [14, 12], or if used to signal “hints” to network operators regardingwhich hosts may be infected.

7 Discussion

As we have demonstrated, starting with a certain definition of botnet be-haviour — traffic produced from the malware testbed, the analysis of en-terprise and core Internet traffic can be effective at identifying botnets. Wenow discuss the significance of our results and related insights.Our inference technique is a Markov diffusion process over a weighted graph.Our motivation for the use of a diffusion process stems from the insight thatbotnet traffic has specific geometry. This geometry is intrinsic meaning thatit remains invariant over time, yeilding a stable shape. The diffusion processis used to define a metric over the shape, which allows us to reason about agiven traffic dataset. We project traffic datapoints into a graph and identifybotnet shapes from it. These shapes are resistant to noise induced by day-to-day changes in botnet activity as well as traffic from other applications.The graph underpinning the diffusion process is constructed using both com-munication topology information and communication flow information. SinceP2P topologies are a fundamental design requirement of the botnet, it is hardfor the attacker to make changes in order to evade detection. Without P2Ptopologies underlying botnet communication, the C2 channel would not berobust enough to withstand take down attempts. Thus the attacker ends uphaving to make a choice between robustness or stealth, they cannot haveboth properties even though botnets need both properties.

Dynamic feature selection: One of the central aspects of our inferencealgorithm is its capability of dynamic feature detection. Instead of using astatic heuristic-driven definition of which features are indicative of botnettraffic, we extract this from the dataset using the output traffic from our mal-ware testbed. From our experiments, it seems that the algorithm works ratherwell. However we do not offer an argument in support of a firm bindingbetween the fundamental properties of a botnet and our feature selection al-gorithm. Therefore, we believe that while dynamic feature selection is morerobust to botnet evolution than static heuristics, it is important to understandthat there is no ‘intelligence’ in our feature selection algorithm — it simplyselects those features that capture most of the information contained in thedataset. Hence it is essential that a real-world deployment of this algorithmmust allow for operator assisted feature selection, for instance by vetting theoutput of the automated selection algorithm.

8 Related Work

Bots are unique amongst networked malware in that they collectively main-tain communication structures across nodes to resiliently distribute com-mands from a command and control node. The ability to coordinate andupload new commands to bots gives the botnet owner vast power when per-forming criminal activities, including the ability to orchestrate surveillanceattacks, perform DDoS extortion, sending spam for pay, and phishing. Thisproblem has worsened to a point where modern botnets control hundreds ofthousands of hosts and generate revenue of millions of dollars per year fortheir owners [7].Encryption can make the detection of malware traffic much more difficult,especially if the system uses widespread protocols such as HTTP. One ap-proach is then to attempt to decrypt all packets and then perform signaturedetection on the decrypted contents, as is done by Rossow et al. [24]. Theytake advantage of the fact that in many cases the encryption used is very sim-ple, and often the key for encryption is hardcoded into the malware binary.They keys are fetched by reverse engineering, and then the payloads can bedecrypted, and signature-based detection applied. The obvious downside tothis method is that it requires the labour intensive reverse engineering step.Further to this, Rafique et al. [22] proposed a system for large-scale auto-matic signature generation. The system uses network traces collected fromsandboxes and produces signatures for groups of similar malware, coveringnumerous protocols. This system is able to identify numerous malware ex-ample with a high rate, and experiences a low false positive rate due to thespecificity of the signatures generated. The signatures are designed to be ex-ported to intrusion detection systems such as Snort for on-line detection.Server Detection: Nelms et al. [17] propose ExecScent, a system for iden-tifying malicious domains within network traffic. The system works by cre-ating network traces from known malware samples to create signatures, thatcan then be compared with network traffic. The signatures are not just based

upon the domain names, but also the full HTTP requests associated withthem. How this system is unique, however, is that the signatures are tailoredto the network that they will be used on based upon the background networktraffic. This step is extremely useful at reducing the level of false positivesby exploiting the fact that different networks will exhibit different brows-ing behaviour (for example a car manufacturer is unlikely to visit the samewebsites as a hospital).

8.1 Non-Signature Based Methods

We now describe related work in non-signature based detection methods.None of the techniques we discuss in this section, have any evasion resistanceproperties due to fundamental properties of the detection mechanism.BotMiner [9] detects infected hosts without previous knowledge of botnets.In this system, bots are identified by clustering hosts that exhibit similar com-munication and (possible) malicious activities. The clustering allows hosts tobe groups according to the botnet that they belong to as hosts within the samebotnet will have similar communication patterns, and will usually performthe same activities at the same time (such as a DDoS attack).There are also schemes that combine network- and host-based approaches.The work of Stinson et al. [25] attempts to discriminate between locally-initiated versus remotely-initiated actions by tracking data arriving over thenetwork being used as system call arguments using taint tracking methods.Following a similar approach, Gummadi et al. [10] whitelist application traf-fic by identifying and attesting human-generated traffic from a host whichallows an application server to selectively respond to service requests. Fi-nally, John et al. [15] present a technique to defend against spam botnets byautomating the generation of spam feeds by directing an incoming spam feedinto a Honeynet, then downloading bots spreading through those messagesand then using the outbound spam generated to create a better feed.

Server Detection: DNS There has been a large amount of work that attemptsto provide a detection mechanism that can identify domains associated withmalware at the DNS level. DNS is used by a large amount of malware thatmakes use of a centralised command and control structure.One proposed detection method is to make use of the reputation of domainnames to decide if they are related to malicious activities [1]. In this sys-tem (Notos), domains are clustered in two ways. First, they are clusteredaccording to the IP addresses associated with them. Secondly, they are clus-tered according to similarities in the syntactic structure of the domain namesthemselves. These clusters are then classified as malicious or not based upona collection of whitelists and blacklists: domains in a cluster that containsblacklist domains are likely to be malicious themselves. This system is runon local DNS servers and can achieve a true positive rate of 96% and anlow false positive rate. In a further piece of work from the same authors asNotos, the idea is vastly expanded to use the global view of the upper DNS

hierarchy. In this new system (Kopis) [2], a classifier is built that, instead oflooking at the domains’ IP and name, looks at the hosts that make the DNSrequests. They leverage the fact that malware-related domains are likely tohave an inconsistent, varied pool of requesting hosts, compared to a legit-imate domain which will be much more consistent. They also look at thelocations of the requesters: requesters inside large networks are given higherweighting as a large network is more likely to contain infected machines.When tested, this system was actually able to identify a new botnet based inChina, which was later removed from the internet.DNS is also used in another way by malware controllers that we have notyet mentioned. One feature of DNS is DNS blacklists (DNSBL). These areused by spam filters to block emails from known malicious IPS. The mal-ware controllers will often query these blacklists for IPs under their controlto test their own networks [21]. The behaviour of a botmaster performingDNS lookups for his own hosts will differ from legitimate use of DNSBLs.For example, a malicious host performing DNS lookups on behalf of thecontroller will perform lots of queries, but will not be queried itself, while alegitimate service will receive incoming queries. This behaviour is relativelyeasy to detect by simply looking for queries that exhibit this behaviour.Paxson et al [18] attempt to provide a detection mechanism that leveragesthe amount of information transmitted over a DNS channel in order to detectsuspicious flows. The system allows for a upper bound to be set, any DNSflow that exceeds this barrier is flagged for inspection. The upper bound canbe circumvented by limiting flows, but this has an impact the amount of dataexfiltration/command issuing that can occur. The system looks primarily atdata included within domain names, but also looks at inter-query timings andDNS packet field values, both of which can provide low capacity channels.Perdisci et al [19] apply clustering to domains so they are grouped accord-ing to overlap in the returned IP addresses. By then comparing the clustersto previously labelled data, they can then be classified as flux or non-flux,revealing domains that make use of the same network.

8.2 Graph-based approaches

Several works [4, 12, 11, 29, 14] have previously applied graph analysis todetect botnets. The technique of Collins and Reiter [4] detects anomalies in-duced in a graph of protocol specific flows by a botnet control traffic. Theysuggest that a botnet can be detected based on the observation that an at-tacker will increase the number of connected graph components due to asudden growth of edges between unlikely neighbouring nodes. While it de-pends on being able to accurately model valid network growth, this is a pow-erful approach because it avoids depending on protocol semantics or packetstatistics. However this work only makes minimal use of spatial relationshipinformation. Additionally, the need for historical record keeping makes itchallenging in scenarios where the victim network is already infected whenit seeks help and hasn’t stored past traffic data, while our scheme can be usedto detect pre-existing botnets as well. Illiofotou et al. [12, 11] also exploit

dynamicity of traffic graphs to classify network flows in order to detect P2Pnetworks. It uses static (spatial) and dynamic (temporal) metrics centered onnode and edge level metrics in addition to the largest-connected-component-size as a graph level metric.More recently, Botgrep [16] presented a scheme that searches for expandergraphs to discover P2P graphs within ISP traffic. The theoretical componentof the algorithm presented is our work is much more stronger. Botgrep doesnot consider traffic flow categorisation and therefore would end up with highfalse-positive rates when its core assumption is broken — high-degree nodesshould not be infected and have incoming or outgoing botnet traffic flows.In the operational context of a NAT (Network Address Translator), the traf-fic of hundreds of computers would be aggregated into a single IP address.Such NAT installations are getting rather popular: mobile broadband ISPsuse carrier-NATs where thousands of mobile consumers are behind a NATrun by the ISP, and each user is on a separate port. Our inference algorithm,will be able to operate in such deployment contexts very well since it com-bines flow clustering with structured graph analysis; even if graph structureis obscured by the NAT the inference algorithm can still leverage non-linearsubspace analysis over traffic flow data to isolate botnet traffic.Further, as compared with other graph-based and behaviour-analysis schemes,we have shown (see Fig. 3) that there is more to application traffic than mereclustering: there are intricate geometrical surfaces corresponding to applica-tion traffic characteristics. Indeed our algorithm is quite generic and we hopethat our results will encourage other researchers to apply our technique toother traffic classification problems.

9 Conclusion

The ability to localise bot-infected hosts at Internet scales represents both avery challenging problem, and one with a very high reward. In this work, wehave approached the problem of botnet detection with a security-by-designapproach: detection evasion is based on the attacker’s detailed knowledge oflegitimate traffic traces. Thus detection is based on the fundamental proper-ties of defense. We are not proposing a new heuristic technique.Our approach works by leveraging patterns within the communication graphas well as within network traffic between Internet hosts from a set of traffic-monitoring vantage points, and then exploiting the intrinsic non-linear ge-ometry of traffic in order to distinguish traffic flows that are part of the bot-net. Behavioural analysis approaches (involving machine learning) are com-monly criticised in the security community for assuming a static traffic pro-file of the botnet in the form of a feature list. As a first step towards beingable to operate in an environment where the botnet evolves in response to thedetection mechanism, we adopt the notion of dynamic feature selection.Compared with results from previous work using graph-theoretic or behaviouralanalysis approaches, our techniques accomplish better results. This is not sur-prising since they exploit intuitions from both. However, our techniques do

not achieve perfect accuracy, but they achieve a low enough false positiverate to be of substantial use, especially when combined with other comple-mentary techniques. Finally, we do not attempt to address the challengingproblem of botnet response. Future work may leverage our inferred botnettopologies by dropping crucial links to partition the botnet, based on thestructure of the botnet graph.

References

1. M. Antonakakis, R. Perdisci, D. Dagon, W. Lee, and N. Feamster. Building adynamic reputation system for dns. In Proc. of the USENIX Security Symposium,2010.

2. M. Antonakakis, R. Perdisci, W. Lee, N. Vasiloglou, and D. Dagon. DetectingMalware Domains at the Upper DNS Hierarchy. In Proc. of the USENIX SecuritySymposium, 2011.

3. P. Barford and V. Yegneswaran. An Inside Look at Botnets, volume 27 of Ad-vanced in Information Security, chapter 8, pages 171–192. Springer, 2006.

4. M. P. Collins and M. K. Reiter. Hit-list worm detection and bot identification inlarge networks using protocol graphs. In C. Krugel, R. Lippmann, and A. Clark,editors, RAID, volume 4637 of Lecture Notes in Computer Science, pages 276–295. Springer, 2007.

5. M. P. Collins, T. J. Shimeall, S. Faber, J. Janies, R. Weaver, M. De Shon, andJ. Kadane. Using uncleanliness to predict future botnet addresses. In IMC ’07:Proceedings of the 7th ACM SIGCOMM conference on Internet measurement,pages 93–104, New York, NY, USA, 2007. ACM.

6. E. Cooke and F. Jahanian. The zombie roundup: Understanding, detecting, anddisrupting botnets. In Steps to Reducing Unwanted Traffic on the Internet Work-shop, 2005.

7. J. Franklin, V. Paxson, A. Perrig, and S. Savage. An inquiry into the nature andcauses of the wealth of internet miscreants. In ACM conference on Computer andcommunications security, pages 375–388, New York, NY, USA, 2007. ACM.

8. F. C. Freiling, T. Hoz, and G. Wichereski. Botnet tracking: Exploring a root-cause methodology to prevent distributed denial-of-service attacks. In EuropeanSymposium on Research in Computer Security, 2005.

9. G. Gu, R. Perdisci, J. Zhang, and W. Lee. BotMiner: Clustering Analysis ofNetwork Traffic for Protocol- and Structure-Independent Botnet Detection. InProc. of the USENIX Security Symposium, 2008.

10. R. Gummadi, H. Balakrishnan, P. Maniatis, and S. Ratnasamy. Not-a-Bot (NAB):Improving Service Availability in the Face of Botnet Attacks. In NSDI 2009,Boston, MA, April 2009.

11. M. Iliofotou, M. Faloutsos, and M. Mitzenmacher. Exploiting dynamicity ingraph-based traffic analysis: Techniques and applications. In ACM CoNext, 2009.

12. M. Iliofotou, P. Pappu, M. Faloutsos, M. Mitzenmacher, G. Varghese, andH. Kim. Graption: Automated detection of P2P applications using traffic disper-sion graphs (TDGs). In UC Riverside Technical Report, CS-2008-06080, 2008.

13. C. S. Inc. Cisco IOS Netflow. http://www.cisco.com/en/US/products/ps6601/products_ios_protocol_group_home.html.

14. M. Jelasity and V. Bilicki. Towards automated detection of peer-to-peer botnets:On the limits of local approaches. In USENIX Workshop on Large-Scale Exploitsand Emergent Threats (LEET), 2009.

15. J. P. John, A. Moshchuk, S. D. Gribble, and A. Krishnamurthy. Studying spam-ming botnets using botlab. In NSDI’09: Proceedings of the 6th USENIX sympo-sium on Networked systems design and implementation, pages 291–306, Berke-ley, CA, USA, 2009. USENIX Association.

16. S. Nagaraja, P. Mittal, C.-Y. Hong, M. Caesar, and N. Borisov. BotGrep: FindingP2P bots with structured graph analysis. In USENIX Security Symposium, pages95–110, 2010.

17. T. Nelms, R. Perdisci, and M. Ahamad. ExecScent: Mining for New C&C Do-mains in Live Networks with Adaptive Control Protocol Templates. In Proc. ofthe USENIX Security Symposium, 2013.

18. V. Paxson, M. Christodorescu, M. Javed, J. Rao, R. Sailer, D. Schales, M. P.Stoecklin, K. Thomas, W. Venema, and N. Weaver. Practical comprehensivebounds on surreptitious communication over dns. In Proceedings of the 22NdUSENIX Conference on Security, 2013.

19. R. Perdisci, W. Lee, and N. Feamster. Behavioral Clustering of HTTP-BasedMalware and Signature Generation Using Malicious Network Traces. In Proc.of the USENIX Symposium on Networked Systems Design & Implementation,2010.

20. M. Rajab, J. Zarfoss, F. Monrose, and A. Terzis. A multifaceted approach tounderstanding the botnet phenomenon. In Internet Measurement Conference,2006.

21. A. Ramachandran, N. Feamster, and D. Dagon. Revealing Botnet MembershipUsing DNSBL Counter-Intelligence. In Proc. of the USENIX Workshop on Stepsto Reducing Unwanted Traffic on the Internet (SRUTI), 2006.

22. M. Z. Raque and J. Caballero. FIRMA: Malware Clustering and Network Signa-ture Generation with Mixed Network Behaviors. In Proc. of the Symposium onRecent Advances in Intrusion Detection (RAID), 2013.

23. K. Rieck, G. Schwenk, T. Limmer, T. Holz, and P. Laskov. Botzilla: Detectingthe “Phoning Home” of Malicious Software. In Proc. of the ACM Symposium onApplied Computing (SAC), 2010.

24. C. Rossow and C. J. Dietrich. ProVex: Detecting Botnets with Encrypted Com-mand and Control Channels. In Proc. of the Conference on Detection of Intru-sions and Malware & Vulnerability Assessment (DIMVA), 2013.

25. E. Stinson and J. C. Mitchell. Characterizing bots’ remote control behavior.In W. Lee, C. Wang, and D. Dagon, editors, Botnet Detection, volume 36 ofAdvances in Information Security, pages 45–64. Springer, 2008.

26. G. Stringhini, M. Egele, A. Zarras, T. Holz, C. Kruegel, , and G. Vigna. B@bel:Leveraging Email Delivery for Spam Mitigation. In Proc. of the USENIX Secu-rity Symposium, 2012.

27. G. Stringhini, T. Holz, B. Stone-Gross, C. Kruegel, , and G. Vigna. BotMagnifier:Locating Spambots on the Internet. In Proc. of the USENIX Security Symposium,2011.

28. Q. Zhao, J. Xu, and Z. Liu. Design of a novel statistics counter architecture withoptimal space and time efficiency. In ACM SIGMETRICS, June 2006.

29. Y. Zhao, Y. Xie, F. Yu, Q. Ke, Y. Yu, Y. Chen, and E. Gillum. Botgraph: Largescale spamming botnet detection. In NSDI, 2009.

A Appendix1

In the following figure, we show a two-dimensional visual of botnet traffic. The longlines are a result of DNS fast-flux.

(a) (b) (c)

(d) (e) (f)

Fig. 4. Two dimensional representation of network traffic with embedded Zeus traffic, before featureselection. This figure shows the dual-graph D(G) of the traffic dataset.

botyacc: uniﬁed p2p botnet detection using behavioural...

Documents