jaal: towards network intrusion detection at isp...

Jaal: Towards Network Intrusion Detection at ISP ScaleA. Aqil, K. Khalil, A. Atya, E. Paplexakis, S. Krishnamurthy, KK. Ramakrishnan

T. Jaeger

P. Yu, A. Swami

1

University of California Riverside

Penn State University

US Army Research Lab

Is IDS Needed at ISP Scale?Increasing number of network attacks

Distributed, span entire WANsUnnoticed until too late

Mirai botnetMirai exploits vulnerable devices spread across the internet to launch DDoSSep 2016: Krebs on Security (620 Gbit/s), OVH (1Tbit/s)Oct 2016: multiple attacks on Dyn, affected Twitter, Github, Airbnb, Netflix, othersNov 2016: Liberia’s internet infrastructure

2

Is IDS Needed at ISP Scale?Simple two step attack: scan then flood

Hardcoded default passwords control vulnerable devices (scanning a large set of IP addresses)Compromised devices also repeat the scanLaunch coordinated attack on targets at the bot master signal

Inherently difficult to detectScanning activity observable only at ISP level“DDoS prevention works best deep in the network, where the pipes are the largest and the capability to identify and block the attacks is the most evident”

Bruce Schneier, security expert3

ISP Scale IDS is ChallengingState of the art NIDS (e.g., Snort, Bro) are effective

But expect to inspect all packetsWorks only at enterprise scale

Problematic at ISP scale: Multiple ingress/egress pointsTo create global view required for analysis, information collected from multiple vantage points needs to be aggregated

Challenge: how to aggregate? 4

Aggregation Approach ICopy and forward to central engine

Simple, but lead to performance degradation

5

Jaal CoNEXT ’17, December 12–15, 2017, Incheon, Korea

SYN FloodDistributed SYN FloodSockstressPort Scan (nmap)Distributed SSH Brute Force

TPR

0

0.2

0.4

0.6

0.8

1.0

FPR0 0.2 0.4 0.6 0.8 1.0

n = 1000, r = 12

(a) k = 100


TPR

0

0.2

0.4

0.6

0.8

1.0

FPR0 0.2 0.4 0.6 0.8 1.0

n =1000, r = 12

(b) k = 200


TPR

0

0.2

0.4

0.6

0.8

1.0

FPR0 0.2 0.4 0.6 0.8 1.0

n = 1000, r = 12

(c) k = 400

Figure 4: ROC curves for various attacks. Batch size = 1000, rank = 12, varying k, Trace 1.


TPR

0

0.2

0.4

0.6

0.8

1.0

FPR0 0.2 0.4 0.6 0.8 1.0

n = 2000, k = 500

(a) r = 10


TPR

0

0.2

0.4

0.6

0.8

1.0

FPR0 0.2 0.4 0.6 0.8 1.0

n = 2000, k = 500

(b) r = 12


TPR

0

0.2

0.4

0.6

0.8

1.0

FPR0 0.2 0.4 0.6 0.8 1.0

n = 2000, k = 500

(c) r = 15

Figure 5: ROC curves for various attacks. Batch size = 2000, k = 500, varying rank, Trace 1.

Communication overheadTrue positive rate

Perc

enta

ge

40

60

80

100

FPR10 20 30 40 50 60 70 80 90 100

Increase in TPR and communication overhead

Figure 6: The increase in TPR and commu-nication overhead as the acceptable FPR isincreased.

Avg decrease in throughputWorst decrease in throughput Drop in accuracy

Perc

enta

ge d

ecre

ase

0

50

100

Percentage of traffic replicated0 10 20 30 40 50 60 70 80 90 100

Performance degradation as traffic replication increases

Figure 7: Performance degradation as thepercentage of tra�c that is replicated in-creases.

Unchecked infectionsRemaining infected devices after Jaal

Num

ber o

f inf

ecte

d de

vice

s

0

50

100

150

Time (s)0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34

Number of Infected devices vs time

Figure 8: Mirai Attack: Unchecked infec-tions versus infected devices shut o� upondetected by Jaal.

GreedyRobin HoodRandom

Load

Rat

io (m

in/m

ax)

0

0.2

0.4

0.6

0.8

1.0

Groups0 2 4 6 8 10 12 14

Load for different monitoring groups - Topology 1

Figure 9: The load across di�erentmonitorgroups for topology 1. Figure 10: Themagnitude of the sin-

gular values for a packet matrix ofn = 1000.

Figure 11: Percentage savings vs.the batch size for �xed error rates.

use the popular Nmap tool [16] to conduct scans. Nmap has a listof ports that it scans and we simply use those defaults.

Testbed and con�guration:We implement Jaal on an in houseSDN testbed that consists of 5 Dell PowerEdge 730 Servers (22 cores,256GB memory, 12 ports) and two HP 3800 series switches (SDNenabled each with 48 ports) and 2 Arista 7280E switches (each with72 ports). Note that we use SDN for the purposes of evaluating Jaal;it is not a necessary platform for the deployment of Jaal.

To represent the network, we use two realistic RocketFuel topolo-gies [55] in our evaluations viz., Abovenet (which we call topology

1) and Exodus (which we call topology 2). Topology 1 has 367routers and topology 2 has 338 routers. We create the topologiesby instantiating the desired number of open vSwitch instances andconnecting them using virtual links to match the con�guration ofthe chosen topology. Our virtual topology can handle Gigabit tra�ce�ectively, where the underlying SDN switches have 10GBE ports.

8.1 Detection Accuracy and OverheadOverall results: In a nutshell, our experiments show that for all theattacks considered, Jaal achieves an average true positive rate (TPR)

70% Tput loss

Aggregation Approach IISample and forward to central engine

Already used by ISPs for heavy-hitter detectionEfficient but achieves poor detection accuracy for general attacks

6

Attack Reservoir SamplingDistributed SYN Flood 54%

Sock Stress 60%

SSH Brute Force 42%

Aggregation Approach IIICreate sketches and forward to central engine

Targeted measurement approachStrong resource/accuracy guarantees

Lacks generality: need one sketch for every measurement task

For TCP/IP header, need 218 different sketches to capture all possible measurements

7

Jaal Design Goals

Design an ISP-scale NIDS that:

Can detect wide array of attacks requiring global view, using signatures similar to Snort’s

Focus on TCP/IP header-based attacks

Does not require copying and forwarding raw packets (minimizes bandwidth overhead)

8

Jaal Overview

9


We point out here that it is well known that modern open sourceDPI-based IDSs cannot cope well with high tra�c volumes [32, 42].In fact, they do not fail gracefully and with tra�c rates greater than20 Gbps, packet losses of over 50% are experienced regularly; thiscan seriously a�ect the accuracy in detecting attacks (can resultin approximately a 50% loss in detection accuracy). The only wayaround the loss in accuracy is to provision IDS clusters for peakload which can result very high costs and even with that, a largewastage of computation resources at o�-peak times.

Current methods such as sampling/sketching are inade-quate. One may argue that the heavy workloads due to copyingraw packets, can be overcome by using state of the art packet sam-pling techniques [30]. In fact, ISP’s typically employ rudimentarysampling techniques like NetFlow [37] to obtain a coarse view ofnetwork dynamics. However, while sampling can help in heavy-hitter detection, it results in poor accuracy with respect to �ne-grained features needed for detecting a wide variety of attacks[46, 49], especially in cases where a information with respect toa large number of successive packets is required (e.g., DDoS). Inparticular, sampling fails to capture correlations across packets.

An alternative technique to sampling is sketching [43]. Whilesketching provides strong resource/accuracy guarantees, it is in-herently a targeted measurement scheme in which one needs toconstruct a sketch for every measurement task (e.g., counting thenumber of unique IP addresses or measuring the entropy of IP ad-dresses in a batch of packets). Unlike the summarization techniqueswe develop in Jaal, modern sketching techniques are also restrictedto single dimensional measurements [44]. This lack of generalityimplies that sketching does not scale. The sketching technique in[44], which is arguably the most general sketch possible today, al-lows a single sketch to be used for multiple measurement tasks.However, it is still limited to a single dimension.

More concretely, consider the source IP address as the dimensionof interest. By using the technique in [44], one can create a sketchthat captures the number of unique addresses seen. This sketch canbe used to detect heavy hitters, membership testing and entropy es-timation [44]. However, even this sketch will only be able to answerqueries about the source IP address. Intrusion detection signaturesrequire correlations across packet dimensions. A simple exampleis a distributed SYN �ood attack. To keep track of the source IPand the SYN �ag, one would need a sketch that tracks these two�elds. However, such a sketch can now no longer be used to answerqueries about the IP or SYN �ag alone. There is no way to decouplethose two header �elds in the sketch. Thus, to create sketches todetect attacks based on any combination of TCP/IP headers (18header �elds), a need a total of 218 individual count-min sketches atevery monitor. Assuming each count-min sketch to be 500KB [44],this would translate to 128GB of information being transferred byeach monitor to a central location per measurement epoch, leadingto a prohibitively large communication cost. In addition, such abrute-force combinatorial sketching method would not leveragecorrelations between header �elds which may not be known to usa-priori and would only emerge through the actual tra�c gener-ated. Our proposed summarization scheme is able to identify andleverage those correlations automatically, by virtue of low-rankapproximation.

Packet Filtering

Summatization

Flow Assign.

Inference

NIDS

Summaries

Assignments

Decision

Load

Monitor

Packet Filtering

Summatization

Monitor

Info.

Load

Info.

Rules

Summaries

Assignments

Figure 1: Jaal architecture.

Key idea:We envision that by extracting lightweight in-network“packet summaries” that retain the information needed by a NIDS,from tra�c that �ows through monitors, we can solve the challengeof scale while retaining performance and accuracy. This is in essencethe key idea in building our framework Jaal.

Threat Model: While Jaal can handle all attacks that a NIDSlike Snort can handle, we limit the scope of our evaluations to onlytransport layer attacks. This is because, together they constitute themost widely seen attacks in the wild [4]. We omit attacks that willrequire us to examine the payload given that these days payloadsare often encrypted.

An IDS such as Snort also handles signatures relating to lower-layer attacks such as ARP scans. Since such attacks are local (linklevel and thus do not require global knowledge), we argue that theycan be detected using a local "lower layer attack detector" in eachlocal area network (LAN) independently. Thus, we do not try todetect these with Jaal.

We assume that monitors have already been placed in the net-work and their locations are static. Flexible monitor placement andmanagement is beyond the scope of this work.

3 SYSTEM OVERVIEWOur goal is to build a NIDS at ISP scale, with capabilities similar tothat of today’s modern NIDS (e.g., Snort [50] and Bro [48]) deployedin enterprises, based on the key idea described in § 2. Evidencecollected in the network and transferred to a detection enginemust consist of concise yet informative summaries (instead of rawpackets), and the detection engine must be able to process thesesummaries and provide inferences just as with Snort. One of thedriving principles behind designing Jaal was generality. We aim forthe techniques we develop to be equally amenable to detecting allattacks. Therefore, we make no assumption on the utility of anypacket header �eld and treat all header �elds as equally important.We follow modern IDS systems design which provide pipelines forquerying any header �eld. We choose Snort as our baseline since itis the most popular IDS used today [21].

We meet this goal by designing a modular ISP scale NIDS Jaal,which comprises three modules viz., summarization, inference, and�ow assignment (see Fig. 1).

Summarization module: This module runs on monitors in thenetwork. Each monitor extracts headers of traversing packets andconstructs a “summary” of these headers using a lower dimensional

I- Monitors:• Filter target flows• Process packet

batches, create summaries

II- Inference engine:• Collects summaries• Performs pattern

matching

III- Flow assignment:• Assigns flows to monitors• Load balancing

SummarizationGoal: produce a representative summary of packets

Enables high accuracy detection of attacks using general signatureLight weight, low BW overhead

10

all flows all flows

assigned flows

packets batch

batch summary

fields mode

packets mode

Summarization (cont.)Two step summarization

SVD to reduce fields modeClustering to reduce packets mode

11

batch

summarySVD k-means

p r

nk

centroids counts

CoNEXT ’17, December 12–15, 2017, Incheon, Korea A. Aqil et al.

representation. The summary is then forwarded to the analysis andinference module.

Inference module: Jaal contains a centrally located inferencemodule which, given a set of Snort-like, signature-based rules, trans-forms the rules into a format suitable for use with the summaries.Equivalent rules are proposed (again applied on summaries) fordetecting attacks that cannot be detected via simple comparisonswith packet signatures. Packet summaries collected from monitorsare compared with these new rules to detect threats.

Flow assignment module: The �ow assignment module seeksto assign �ows to monitors such that each �ow is monitored onlyonce and the maximum load across the monitors is minimized.This is important both for correct operation of the NIDS as well asfor saving bandwidth. The problem is mapped onto a constrainedload balancing problem and solved using a simple yet e�ectiveapproximation algorithm.

A detailed description of each module in Jaal follows in each ofthe next three sections.

4 PACKET SUMMARIZATIONThe goal of packet summarization is to produce a representativesummary of a batch of packets that: (a) enables the analysis anddetection of a wide class of attacks, and (b) allows Jaal to achievehigh detection accuracy with low communication overhead.

Henceforth, we refer to the packets and the packet �elds as thedi�erent modes of the data that we are summarizing. The numberof dimensions in those modes (i.e., the number of packets that havepassed through amonitor and the number of �elds in each packet) islarge; thus our goal is to reduce the dimension of both modes whilepreserving correlations between the dimensions of both modalities.

To support our goal, we employ dimensionality reduction, a fam-ily of techniques that approximate a dataset with high-dimensionalmodes, using a modi�ed dataset with signi�cantly reduced dimen-sions, while minimizing the approximation error. This reductionresults in a more compact, yet high �delity data representationwhich respects correlations in both modes of the data. We proposea practical two-step approach where each step focuses on e�cientlyreducing the dimensionality of one of the two modes of the data. Atthe end of the process, we create what we call in-network packetsummaries. Onemay envision a single-step approach to reduce bothmodes simultaneously; however this objective is computationallyhard from an optimization point of view. Our approach can achieveany desired accuracy, by trading o� communication cost. Specif-ically, the design parameters, determining the level of reduction,control the tradeo�s between the costs (i.e., the size of summariessent to the inference module) and the detection accuracy.

4.1 Packet �ltering and NormalizationWe assume that there are monitors in the network to aid intrusiondetection. Each such monitor �lters and processes packets from�ows (speci�ed by a four tuple viz., source and destination IP ad-dresses and port numbers) that are assigned to it (assignment of�ows to monitors is done by a central engine; this is discussed in§ 6). Transport and network layer headers are then bu�ered untilthe number of packets in the bu�er, regardless of which �ows theybelong to, is equal to some pre-determined threshold n. We call this

a batch of size n. Each batch of packet headers is organized in amatrix X with dimensions n ⇥ p. Each row represents the p �eldsin the (TCP and IP) headers of a given packet.

Before we construct packet-summaries, we create a normalizedversion of X denoted by X. In Jaal, packet header �elds are pro-cessed as vectors, and each header �eld is mapped to an entry in thevector. A measure of distance is used to decide whether two packetsare similar. Since the raw magnitudes of the header �elds valuesvary greatly, normalization of these values is needed to ensure thatthere is no bias towards �elds that vary over a larger range. For ex-ample, consider a vector with only the TCP SYN �ag and the sourceIP address �elds. Without normalization, the distance computationswill be heavily dominated by variations in the IP address �eld. Thus,for any header �eld with value x � 0, we apply the transformationx = x

max(x ) , where max(x ) is the maximum possible value for thatheader �eld (x ). Thus, we have 0 x 1,8x .

4.2 Dimensionality reduction: �elds modeWe �rst apply Singular Value Decomposition (SVD), a dimensional-ity reduction technique, on the packet �elds in X to reduce its rank.By reducing the rank, we form a smaller-size representation of Xwhich provably approximates the original matrix well [40]. First, Xis decomposed to

X = U�VT , (1)

where the columns of U are the left singular values of X, columnsof V are the principle axes (or directions) of data in X, and VT is thetranspose of V. � is a diagonal matrix with the singular values �i ofX in descending order. The number of non-zero singular values isthe rank of X, and the sum of squares of �i represents the amountof data variation in X.

In practice, many data matrices exhibit a latent rank much lowerthan the actual rank measured by the number of non-zero singularvalues, or rather, by the number of linearly independent columns inthe matrix. Intuitively, the reason is that, in practice, many columns(i.e., header �elds of the packets we store in the matrix) are notexactly linearly dependent but they may be highly correlated andthus, approximately linearly dependent. When this happens, wecan reduce the size of the data by approximating the matrix usingits latent rank, which is smaller than the observed rank.

Thus, in order to reduce the size of data sent for analysis andinference, while minimizing the approximation error (in the leastsquares sense), a lower rank representation of X is produced bykeeping only the largest r p singular values �i of X and settingthe remaining ones to zero, to get

Xp = U�pVT . (2)

It is provable that Xp is the optimal rank-r approximation of X,in the sense of minimizing the Frobenius norm of (X � Xp ) [36].Here, multiple �i s may have small values, indicating that the latentrank of the matrix is smaller than the observed one. In that case,the dimensionality of X can be reduced, while still retaining a highpercentage of the information in the data by, for example, removingthe smallest �i s such that the sum of the squares of the retainedsingular values is at least 90% of the sum of the squares of allsingular values.

fields mode

packets mode

eliminate small singular values

Inference

12


In (2), all matrices have similar dimensionality as their counter-parts in (1). However, since the last p�r of singular values are set tozero, the corresponding columns in U and V can now be removed.This new representation is the truncated SVD representation of Xand is equivalent to Xp . It can be written as follows:

Xr = Ur �rVTr , (3)

where the dimensions of Ur , �r ,Vr are now (n ⇥ r ), (r ⇥ r ) and(p ⇥ r ), respectively. In our system, the design parameters helpdecide which representation, i.e., Xr or Xp , is more e�cient asdiscussed at the end of § 4.3.

4.3 Dimensionality reduction: packets modeNext, we seek to further reduce the size of the summary producedby each monitor via an even more compact representation of thepackets in the reduced rank space. To do so, we seek to reduce the di-mensionality across packets. We pose the problem to one similar tosignal quantization, where the goal is to identify a set of representa-tive values of a digitized signal which minimizes the approximationerror. Thus, we seek to �nd a set of representative packets R thatcan approximate Xp (or Xr ). Ideally, R is constructed such thatpackets that are similar are mapped to the same representation. Letthe size of the set of representative packets be |R | = k . Formally,our problem can be posed as:

minR,B2 {0,1}n⇥k T RS

kXTr � RBT k2F , (4)

where the RS constraint requires each row of B to sum up to 1,and the columns of matrix R are “centroids”, e�ectively containingthe packets R, and matrix B is an assignment of each packet toa centroid. We use the Frobenius norm (i.e., Euclidean distancebetween a packet and its centroid) because (i) we have no priorknowledge about the distribution of packet vectors and, (ii) theproblem admits very e�cient approximations under this norm.

The above problem is known as Vector Quantization or K-meansclustering and it is NP-hard in the above form [24]. However, thereexist very e�cient approximations when using the Frobenius normas the loss function, with the most widely used being Lloyd’s al-gorithm [45]. In Jaal, we employ the “k-means++ algorithm” [26],an improvement upon Lloyd’s algorithm that seeks to �nd a goodinitialization for the algorithm to converge faster on a good localminimum. We use this clustering algorithm since it is guaranteed to�nd a solution that isO (logk )-competitive to the optimal clusteringsolution, and is known for fast convergence.

Note that R has k packet representatives; each represents agroup (i.e., cluster) of similar rows in Xp . Increasing k increasesresolution (more representatives) but increases the communicationcost as well. We study how varying k a�ects detection accuracy fordi�erent attacks in § 8.

We propose two methods for processing and sending summariesto the inference engine. In the �rst, we build a combined summary,by applying the clustering algorithm on Xp . The output consists ofk centroids Xp , one for each cluster, as well as clustering metadata.The latter contains a membership counts vector c, and is appendedto Xp to form Sm1 , which is then sent to the analysis and inferencemodule. It is easy to see that the number of elements of Sm1 is thusk (p + 1), for each update from monitorm.

TranslatorNIDS

question

Similarity

SummarySm1 or Sm

2Config.⌧d, ⌧c

Alert,

Q

Postprocessor

Alert

h, ⌧vConfig.

QEstimator

Aggregator Sa

Rules

vectors

q

Inference Engine

Feedback

Figure 2: Inference Module.

In the second method, we create what we call a split summary.Here, we apply the clustering algorithm on Ur . The output consistsof the k centroids Ur as well as the corresponding metadata c ande. The summary in this case is the collection Sm2 = {Ur , �rVTr , c}.From (3), and since �r is diagonal and can be sent as a vector ofsize r , the number of elements in Sm2 is r (k + p + 1) + k .

Note that the information compiled in Sm1 is equivalent to thatin Sm2 , but is represented in a di�erent format. More importantly,as discussed above the communication cost is di�erent with eachmethod depending on the design parameters r and k . In particular,when r (k +p + 1) +k < k (p + 1), our system employs the split sum-mary method and sends Sm2 to the analysis and inference engine;otherwise, Sm1 is sent.

5 ANALYSIS AND INFERENCEJaal includes a centralized inference module. While this notion ofcentral inference is similar to that with Snort or Bro, the inferencesare based on summaries instead of raw packets. In brief, the in-ference module contains a translator that converts traditional IDSrules (speci�cally Snort rules) to a format that can be used withsummaries. The inputs to this translator are “aggregated summaries”that are single consolidated representations of the summaries re-ceived from all monitors; they characterize all �ows traversing thenetwork in a given period of time. Then, the consolidated summaryis checked against each transformed rule to detect attacks. Fig. 2shows the di�erent blocks in the inference module.

5.1 Aggregating summariesSummaries generated by the di�erent monitors need to be aggre-gated to get a global view. There are two ways to fetch summariesfrommonitors. In the �rst, monitors periodically send summaries tothe controller. In the second, when a monitor accumulates a batchsize (n) of packets, it constructs and ships the summary of this batchto the controller. At this point, the controller requests every othermonitor to send its summary. In response, all monitors except thosewith fewer than nmin packets, send their summaries. Summarizinginformation using less than nmin packets incurs accuracy penaltiesbecause clustering and SVD generally do not perform well whendata dimensionality is small. However, as shown in § 8, nmin isvery small and not of concern in high speed networks.

Let the number of available monitors in the network beM . Theaggregated summary, Sa =

fXa | ca

g, with centroids Xa and mem-

bership counts ca , is composed by concatenating the summariescollected from all monitors in a tall matrix format. In particular,when monitorm sends its summary in the form Sm1 , it is appended

Individual summaries

Collect individual summaries (push or pull)Transform NIDS rules (normalization, marking irrelevant fields)Estimate similarity

Feedback: request finer grained summary to improve performanceEstimate variance (e.g. port scans, DDoS)

Flow AssignmentRequirements:

Cover all flowsEach flow is processed by exactly one monitor (for correct operation)Balance load to the extent possibleSimple/Fast algorithm

Challenge:Flows can start/terminate at any time, vary in packet ratePacket rates unknown a priori

Solution: Model as constrained online load balancing problemSimple greedy algorithm, (empirically) close to optimal

13

EvaluationImplemented on in-house high performance SDN-testbedTwo Realistic RocketFuel topologies (~350 routers)

Complex topologies created by instantiating Open vSwitchesinstances connected via virtual links

Two ISP backbone traces from MAWI group as background traffic + inject malicious traffic

Five different attacks:DoS: SYN floodDDoS: distributed SYN floodPort scans: distributed port scansBrute forcing: distributed SSH brute forcingSockstress

14

most common

attack classes

Evaluation (cont.)

r = 14 retains most information in fields moden ≥ 600, k ≥ 0.2n, r ≥ 12 enables high detection accuracy

15

98% average TPR @ 9% FPR and only 35% BW overhead (with feedback)Summarization parameters n, k, rset by studying ROC curvesEach point has a different n: batch size, r: rank, k: centroids

Evaluation (cont.)Simulating Mirai progression

Scanning on ports 23, 2323Randomly select a source node + 150 vulnerable nodesJaal detects the scan with 95% accuracy

16



TPR

0

0.2

0.4

0.6

0.8

1.0

FPR0 0.2 0.4 0.6 0.8 1.0

n = 1000, r = 12

(a) k = 100


TPR

0

0.2

0.4

0.6

0.8

1.0

FPR0 0.2 0.4 0.6 0.8 1.0

n =1000, r = 12

(b) k = 200


TPR

0

0.2

0.4

0.6

0.8

1.0

FPR0 0.2 0.4 0.6 0.8 1.0

n = 1000, r = 12

(c) k = 400

Figure 4: ROC curves for various attacks. Batch size = 1000, rank = 12, varying k, Trace 1.


TPR

0

0.2

0.4

0.6

0.8

1.0

FPR0 0.2 0.4 0.6 0.8 1.0

n = 2000, k = 500

(a) r = 10


TPR

0

0.2

0.4

0.6

0.8

1.0

FPR0 0.2 0.4 0.6 0.8 1.0

n = 2000, k = 500

(b) r = 12


TPR

0

0.2

0.4

0.6

0.8

1.0

FPR0 0.2 0.4 0.6 0.8 1.0

n = 2000, k = 500

(c) r = 15

Figure 5: ROC curves for various attacks. Batch size = 2000, k = 500, varying rank, Trace 1.

Communication overheadTrue positive rate

Perc

enta

ge

40

60

80

100

FPR10 20 30 40 50 60 70 80 90 100

Increase in TPR and communication overhead

Figure 6: The increase in TPR and commu-nication overhead as the acceptable FPR isincreased.

Avg decrease in throughputWorst decrease in throughput Drop in accuracy

Perc

enta

ge d

ecre

ase

0

50

100

Percentage of traffic replicated0 10 20 30 40 50 60 70 80 90 100

Performance degradation as traffic replication increases

Figure 7: Performance degradation as thepercentage of tra�c that is replicated in-creases.

Unchecked infectionsRemaining infected devices after Jaal

Num

ber o

f inf

ecte

d de

vice

s

0

50

100

150

Time (s)0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34

Number of Infected devices vs time

Figure 8: Mirai Attack: Unchecked infec-tions versus infected devices shut o� upondetected by Jaal.

GreedyRobin HoodRandom

Load

Rat

io (m

in/m

ax)

0

0.2

0.4

0.6

0.8

1.0

Groups0 2 4 6 8 10 12 14

Load for different monitoring groups - Topology 1

Figure 9: The load across di�erentmonitorgroups for topology 1. Figure 10: Themagnitude of the sin-

gular values for a packet matrix ofn = 1000.

Figure 11: Percentage savings vs.the batch size for �xed error rates.

use the popular Nmap tool [16] to conduct scans. Nmap has a listof ports that it scans and we simply use those defaults.

Testbed and con�guration:We implement Jaal on an in houseSDN testbed that consists of 5 Dell PowerEdge 730 Servers (22 cores,256GB memory, 12 ports) and two HP 3800 series switches (SDNenabled each with 48 ports) and 2 Arista 7280E switches (each with72 ports). Note that we use SDN for the purposes of evaluating Jaal;it is not a necessary platform for the deployment of Jaal.

To represent the network, we use two realistic RocketFuel topolo-gies [55] in our evaluations viz., Abovenet (which we call topology

1) and Exodus (which we call topology 2). Topology 1 has 367routers and topology 2 has 338 routers. We create the topologiesby instantiating the desired number of open vSwitch instances andconnecting them using virtual links to match the con�guration ofthe chosen topology. Our virtual topology can handle Gigabit tra�ce�ectively, where the underlying SDN switches have 10GBE ports.

8.1 Detection Accuracy and OverheadOverall results: In a nutshell, our experiments show that for all theattacks considered, Jaal achieves an average true positive rate (TPR)

ConclusionISP scale NIDS is needed in the face of large scale attacksState of the art NIDS inadequate at ISP scaleJaal presents a major step forward in developing ISP scale NIDS

Uses dimensionality reduction and clustering Centralized pattern matching on packet summariesAchieves high detection accuracy at low bandwidth overhead

17

Thanks!

18

jaal: towards network intrusion detection at isp...

Documents