Mitigating the Insider Threat using High-dimensional Search and Modeling
Presenter: Eric van den Berg, [email protected], July 13, 2005
Team: Shambhu Upadhyaya, Hung Ngo (SUNY Buffalo); Muthu Muthukrishnan, Raj Rajagopalan (Rutgers)
DARPA IPTO Program:
Self Regenerative Systems (SRS)
Program Manager: Lee Badger
PI: Eric van den Berg
SRS PI meeting July 2005 – 2
Project overview
Project goal: to build a system that defends critical services and resources against insiders, which
– Correlates large numbers of sensor measurements
– Synthesizes appropriate pro-active responses
What is done today?
– Reactive systems: Detect attacks late in cycle
– Anomaly detection systems: Few streams for correlation
– Human-based systems: not scalable
– Collateral damage may be large
Project overview (continued)
Technical Approach
– Large network of sensors, to let insider trigger alerts
– High dimensional network state description using sensor alerts
– Search engine finds top-K past states similar to sensor snapshot
– Insider modeler and analyzer tool used to identify attack points, train search engine, guide sensor placement
– Response engine to analyze impact on critical services and synthesize reconfiguration response
Technical Challenges
– Testing SVD-based search technology in a new domain
– New ‘Insider analyzer’ key-challenge graph problem is hard
– Training search engine, labeling and annotating states
Project overview (continued)
Quantitative Metrics to measure success and overheads
– False alarm / detection rate
– Test detection for novel variations of known attacks
Major Achievements to date
– Initial prototype for sensor network
– Initial prototype for SVD-based search engine
– Initial prototype for Insider modeler and analyzer tool
– First test with independent ‘insiders’
Schedule chart. Tasks (milestones): Design (document), Prototyping (software), Testing (report). Periods: Jul-Dec 04, Jan-Jun 05, Jul-Dec 05
Architecture
Diagram labels: Sensor (x4); Normalizer; Filter; Aggregator; Network State Repository; Search Engine; High-Dimensional Search; Response Engine; Insider Modeler and Analyzer; Post-processing; NETWORK; Host Scans; Audit Scans; State Data; Traffic Measurements; Reconfiguration; Top K List; Refined Queries; Organizational data; Labels and filters for states
Insider analyzer and modeler
Insider threat manifests in two forms:
– Insider abuse while staying within legitimate privileges
– Insider abuse while exceeding assigned privileges
Focus on an insider's view of an organization: hosts, reachability and access control
A new threat model called a “key challenge graph”
– Similar to attack graphs, less emphasis on details
– Allows static analysis of insider threat
More in papers
Model Component    Abstraction
Vertex             Hosts, People
Edge               Connectivity, Reachability
Key                Information, Capability
Key Challenge      Access Control
Starting Vertex    Location of insider
Target Vertex      Actual target
Cost of Attack     Threat analysis metric
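As a rough illustration of how these model components fit together, the sketch below computes a minimum cost of attack on a toy graph. The edge format (a required key, one cost if the insider holds it and a higher cost otherwise), the host names, and the costs are all assumptions made for illustration; this is not the MAPIT data model. It runs Dijkstra's algorithm over (vertex, keys held) states, which is exact but exponential in the number of keys, so it only suits very small graphs.

```python
import heapq
from itertools import count


def min_attack_cost(edges, keys_at, start, target):
    """Cheapest path cost from start to target, or None if unreachable.

    edges:   {vertex: [(next_vertex, key, cost_with_key, cost_without_key)]}
    keys_at: {vertex: set of keys the insider picks up on arrival}
    """
    start_keys = frozenset(keys_at.get(start, ()))
    tie = count()  # tie-breaker so the heap never compares key sets
    heap = [(0, next(tie), start, start_keys)]
    best = {(start, start_keys): 0}
    while heap:
        cost, _, vertex, keys = heapq.heappop(heap)
        if vertex == target:
            return cost
        if cost > best.get((vertex, keys), float("inf")):
            continue  # stale heap entry
        for nxt, key, with_key, without_key in edges.get(vertex, ()):
            step = with_key if key in keys else without_key
            new_keys = keys | frozenset(keys_at.get(nxt, ()))
            new_cost = cost + step
            if new_cost < best.get((nxt, new_keys), float("inf")):
                best[(nxt, new_keys)] = new_cost
                heapq.heappush(heap, (new_cost, next(tie), nxt, new_keys))
    return None


# Hypothetical example: the direct route is expensive without the password,
# but a detour through the helpdesk host yields the key first.
edges = {
    "desk": [("fileserver", "admin_pw", 1, 50), ("helpdesk", None, 2, 2)],
    "helpdesk": [("fileserver", "admin_pw", 1, 50)],
}
keys_at = {"helpdesk": {"admin_pw"}}
print(min_attack_cost(edges, keys_at, "desk", "fileserver"))  # prints 3
```

The detour costs 2 + 1 = 3 versus 50 for the direct route, which is the kind of non-obvious attack path a static analysis is meant to surface.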
Insider modeler and analyzer: MAPIT tool architecture
Diagram labels: Network entity rules; Cost Rules; Network topology; Vulnerabilities; Authentication mechanism; Social Eng. Awareness; MAPIT Engine; Key challenge graph; Sensitivity analysis; Defense centric analysis
Sensors to detect insider attacks
Detect changes from user ‘normal behavior’
– Profile anomaly detector
– Statistical sequential change point detection
– Future: biometrics, e.g. keystroke dynamics?
Detect access to target resources
– Pluggable Authentication Module, File integrity checker
Other useful sources:
– web, audit logs (e.g. internal website searches)
– network intrusion detectors (signature, anomaly)
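One textbook form of sequential change point detection is the one-sided CUSUM test, sketched below. The slides do not say which statistic the project uses, so the function name, the parameters (`target_mean`, `slack`, `threshold`) and the sample values are assumptions for illustration only.

```python
def cusum_alarm(samples, target_mean, slack, threshold):
    """Return the index at which an upward shift is flagged, or None.

    The running sum grows only while samples exceed target_mean + slack,
    and is reset to zero whenever it would go negative.
    """
    running = 0.0
    for index, value in enumerate(samples):
        running = max(0.0, running + (value - target_mean - slack))
        if running > threshold:
            return index
    return None


# Hypothetical per-minute count of file accesses by one user.
accesses = [10, 11, 9, 10, 10, 20, 21, 19, 22]
print(cusum_alarm(accesses, target_mean=10, slack=2, threshold=15))  # prints 6
```

The alarm fires on the second elevated sample rather than the first, which trades a short delay for fewer false alarms on isolated spikes.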
Network traffic anomaly detector
Streaming data model
– Large data volume and speed: in backbone 1 billion packets/hour/router
– Large data domain: IPv4: 2^32 addresses, IPv6: 2^128
Consequences:
– Can scan data (at most) once
– Need small-space structure to summarize data
   Hard to store O(n) data points when n=2^32
   Cannot store at 2^128
Idea: build synopsis data structure for IP-packets
– CM-sketches, deltoid group-testing
Detect attacks based on changes in traffic volume
– Currently: traffic to destination IP address (likely targets)
Can detect attacks exhibiting large changes in packet distribution
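A minimal Count-Min sketch, assuming string destination addresses and salted BLAKE2b hashes, shows how per-destination packet counts can be summarized in fixed space after a single pass. The class layout, the `width` and `depth` defaults, and the `volume_change` helper are illustrative choices, not the project's implementation, and the deltoid group-testing step that finds changed addresses without querying each one is not shown.

```python
import hashlib


class CountMinSketch:
    """Fixed-size summary of per-key counts; estimates never undercount."""

    def __init__(self, width=2048, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, key):
        # One hash function per row, obtained by salting BLAKE2b with the row number.
        digest = hashlib.blake2b(
            key.encode("utf-8"), digest_size=8, salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, key, count=1):
        for row in range(self.depth):
            self.rows[row][self._index(row, key)] += count

    def estimate(self, key):
        # Collisions can only inflate a counter, so the minimum is the best guess.
        return min(self.rows[row][self._index(row, key)] for row in range(self.depth))


def volume_change(previous, current, dest_ip):
    """Estimated change in packets sent to dest_ip between two time windows."""
    return current.estimate(dest_ip) - previous.estimate(dest_ip)


# Two consecutive time windows with a hypothetical destination address.
previous, current = CountMinSketch(), CountMinSketch()
for _ in range(10):
    previous.add("192.0.2.50")
for _ in range(500):
    current.add("192.0.2.50")
print(volume_change(previous, current, "192.0.2.50"))  # prints 490
```

Memory use is `width * depth` counters regardless of how many distinct addresses appear, which is the property that matters for a 2^32 or 2^128 address domain. With many keys the two estimates are each upper bounds, so their difference is only approximate.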
Example: Network anomaly detector
Based on week 2 of 1999 MITLL data
– from inside sniffer
Traffic volume based anomaly detection
– Ipsweep, portsweep, phf, httptunnel, etc.
Detects targets of all four above attacks
– Also flags additional large changes (~1%) that are not attacks
Search engine to filter out non-attacks
Sensor alert message format
We use IDMEF (Intrusion Detection Message Exchange Format) to transmit and store sensor alerts
– Between sensors and database
– Between search engine and response engine
Alert storage in MySQL database with IDMEF-based schema
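To give a feel for what an IDMEF alert carries, the sketch below parses a simplified message with Python's standard XML library. The element and attribute names follow the IDMEF specification as later published in RFC 4765, but the sample omits the XML namespace and several required fields, and the sensor name, addresses and `parse_alert` helper are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free IDMEF-style alert (hypothetical values).
SAMPLE = """<IDMEF-Message>
  <Alert messageid="abc123">
    <Analyzer analyzerid="pam-sensor-01"/>
    <CreateTime>2005-07-13T10:01:25Z</CreateTime>
    <Source><Node><Address category="ipv4-addr">
      <address>192.0.2.10</address></Address></Node></Source>
    <Target><Node><Address category="ipv4-addr">
      <address>192.0.2.50</address></Address></Node></Target>
    <Classification text="Login failure"/>
  </Alert>
</IDMEF-Message>"""


def parse_alert(xml_text):
    """Extract the fields a database row or state vector would need."""
    alert = ET.fromstring(xml_text).find("Alert")
    return {
        "messageid": alert.get("messageid"),
        "analyzer": alert.find("Analyzer").get("analyzerid"),
        "time": alert.findtext("CreateTime"),
        "source": alert.findtext("Source/Node/Address/address"),
        "target": alert.findtext("Target/Node/Address/address"),
        "classification": alert.find("Classification").get("text"),
    }


print(parse_alert(SAMPLE))
```

A flat record like this maps naturally onto one row of an alert table, with the analyzer and classification fields identifying the sensor-alert type.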
Network state description
Network state is constructed from sensor alerts:
– Accommodate heterogeneous sensor types
– Account for different sensitivity of sensor types
– Tolerate possibly delayed or missing, ‘out of order’ alerts
Alerts are mapped to a high-dimensional vector for search
– Coordinates correspond to different sensor-alert types
– Some possibilities for mapping values:
   Total number of sensor alerts of given type in (sliding) time window
   Indicator: sensor alert occurred in (sliding) time window
Network state is labeled:
– With Classification e.g. ‘Normal’, ‘Insider’
– With Response for Response Engine
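The two mapping options above (count and indicator) can be sketched as follows. The alert representation as `(timestamp, alert_type)` pairs, the alert-type names, and the window length are assumptions for illustration.

```python
from collections import Counter


def state_vector(alerts, alert_types, now, window, indicator=False):
    """Map alerts inside the sliding window (now - window, now] to a vector.

    alerts:      iterable of (timestamp, alert_type) pairs, in any order
    alert_types: fixed list defining the coordinate order of the vector
    """
    counts = Counter(
        alert_type for timestamp, alert_type in alerts
        if now - window < timestamp <= now
    )
    if indicator:
        return [1 if counts[t] else 0 for t in alert_types]
    return [counts[t] for t in alert_types]


# Hypothetical alert types and timestamps (seconds); arrival order is irrelevant.
types = ["pam_fail", "file_change", "traffic_spike"]
alerts = [(140, "file_change"), (100, "pam_fail"), (10, "traffic_spike"), (130, "pam_fail")]
print(state_vector(alerts, types, now=150, window=60))                  # prints [2, 1, 0]
print(state_vector(alerts, types, now=150, window=60, indicator=True))  # prints [1, 1, 0]
```

Because the vector depends only on which alerts fall inside the window, alerts that arrive late or out of order are still counted once they are stored, and a missing alert only lowers one coordinate.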
High-dimensional search engine
Goal: Find historical documented network states most similar to the current network state snapshot
Output: Top-K list of ranked/prioritized similar states
Ranking can be based on similarity metric, or
– potential impact, e.g. attack ‘risk’
Impact of historical network states is documented,
– impact of current state analyzed with Response engine
Search engine reduces search space dimensionality
– Using Singular Value Decomposition, or random projection
Similar states found by nearest neighbor search
– distance metric: e.g. cosine similarity, Euclidean distance
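The search step can be sketched with the random projection option, chosen here over SVD only to keep the example dependency-free. Function names, the Gaussian projection, the reduced dimension, and the toy history are assumptions; the distance is one minus cosine similarity, so lower means more similar.

```python
import math
import random


def random_projection_matrix(dim_in, dim_out, seed=0):
    """Gaussian random matrix that maps dim_in coordinates down to dim_out."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(dim_out)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(dim_in)] for _ in range(dim_out)]


def project(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat an all-zero state as unrelated to everything
    return 1.0 - dot / (norm_a * norm_b)


def top_k(history, query, k, matrix):
    """Return the k (distance, label) pairs closest to the query state."""
    reduced_query = project(matrix, query)
    scored = [
        (cosine_distance(project(matrix, vector), reduced_query), label)
        for label, vector in history
    ]
    scored.sort(key=lambda pair: pair[0])
    return scored[:k]


# Hypothetical labeled history of six-dimensional network states.
history = [
    ("Normal", [1, 0, 0, 0, 0, 0]),
    ("IP sweep", [0, 5, 1, 0, 0, 0]),
    ("Exploit", [0, 0, 0, 3, 1, 0]),
]
matrix = random_projection_matrix(dim_in=6, dim_out=3)
results = top_k(history, query=[0, 5, 1, 0, 0, 0], k=2, matrix=matrix)
print(results)
```

In practice the historical states would be projected once and stored, so only the query needs projecting at search time. A brute-force scan like this is linear in the history size; an index structure would be needed for large repositories.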
Ranking via alert correlation
Combine alert information from network and host sensors
Segment alert state vector to reflect activity by host and user
Reinforce or weaken ‘attack’ hypothesis
Useful as component to detect or visualize specific attack patterns (moving from host to host)
SVD-based anomaly detection
Statistical Methods using ideas from Principal Component Analysis (PCA)
– Imagine alarm vectors come from multivariate normal distribution
– Compute sample mean, covariance / correlation matrix for training data
– Eigenvalue decomposition of covariance matrix to separate data into normalized independent components
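The steps above can be written compactly as follows. The symbols are introduced here for illustration: $x_1,\dots,x_n$ are the training alarm vectors, $x$ is a new alarm vector, and $T^2$ is one standard score for checking fit under the normality assumption, not necessarily the statistic used in the project.

```latex
\begin{align*}
\hat{\mu} &= \frac{1}{n}\sum_{i=1}^{n} x_i, &
\hat{\Sigma} &= \frac{1}{n-1}\sum_{i=1}^{n} (x_i-\hat{\mu})(x_i-\hat{\mu})^{\top}
             = V \Lambda V^{\top}, \\
y &= \Lambda^{-1/2} V^{\top} (x-\hat{\mu}), &
T^2 &= \lVert y \rVert^2 = (x-\hat{\mu})^{\top} \hat{\Sigma}^{-1} (x-\hat{\mu}).
\end{align*}
```

Here $V$ holds the eigenvectors and $\Lambda$ the eigenvalues of the sample covariance, so the components of $y$ are uncorrelated with unit variance. This assumes $\hat{\Sigma}$ is invertible; directions with zero eigenvalue would have to be dropped or handled separately. If the alarm vectors really are multivariate normal and $n$ is large, $T^2$ is approximately chi-square distributed with as many degrees of freedom as retained components, which suggests one way to pick a threshold.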
SVD-based anomaly detection (cont.)
Test new vector of alarms
– Check for alarms not in training data
– Check for fit to training distribution
Status
– Code ready
– Still to determine thresholds
How far to use normality assumptions vs. switching to nonparametric methods
Early detection of insider attacks
How to represent time evolution in multi-stage attacks?
Like learning attacks from documented historical network states, we can also document attack precursors or attack stages
– Full attack now represented as a sequence of network state vectors
– Robust against slow attacks: no explicit dependence on time
– Would like to make ‘precursor’ annotation (semi-) automatic
Approaches to automatic precursor annotation
– Temporal precursors
– Spatial precursors
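One way to read "no explicit dependence on time" is that only the order of matched stages matters, not the gaps between them. The sketch below assumes each snapshot has already been given a label (for instance the label of its closest historical state) and counts how many documented stages have appeared in order; the function name and the stage labels are illustrative.

```python
def matched_stages(observed_labels, attack_stages):
    """Count how many leading attack stages occur, in order, in the observations.

    Unrelated labels between stages are skipped, so a slow attack with long
    quiet periods matches exactly like a fast one.
    """
    position = 0
    for label in observed_labels:
        if position < len(attack_stages) and label == attack_stages[position]:
            position += 1
    return position


# Hypothetical stage sequence and a slow, partially completed attack.
stages = ["IP sweep", "probe", "sadmind exploit", "DDoS install", "DDoS start"]
observed = ["normal", "IP sweep", "normal", "normal", "probe", "normal"]
print(matched_stages(observed, stages))  # prints 2
```

A count of 2 out of 5 would mark the current state as a precursor of the full attack, which is the signal needed for early detection.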
Impact Analysis using Response Engine
Building upon Smart Firewalls technology from Dynamic Coalitions program; Response Engine
– Has overview of current network configuration
– Logically validates Policies, expressed in terms of end-to-end service availability
– Generates candidate reconfigurations to comply with Policies as much as possible
In this project
– Detected attack type and location is translated into its effect on the stated policies and current network configuration
– E.g. Server failure due to a Denial of Service attack
Response Engine can analyze the impact of both the attack and its candidate responses on the availability of critical resources
– E.g. Analyze impact of vulnerability exploit: how widespread is the vulnerability?
Administrator can push response into the network
Response using policy-based architecture
Diagram labels: Policy; Topology; Response Engine; High-level Policy Configuration; Summarized Configuration; Control & Monitor; Device-level Policy Configuration; Detailed Configuration; Security Policy Adaptors; Routers & Switches; Correlated Alerts
First test results
First system test by independent insiders
Goal: extract operations-sensitive military information
Four volunteer ‘insiders’ given
– existing account information
– starting location and
– nature of target
Result: 3 out of 4 attackers detected
Program goal: delay / thwart 10% of insider attacks
Next / future steps
Show effectiveness against wide range of attacks
Measure false positive rate
Adapt detection system to heterogeneous environments
SVD-based search on Test alert set
Attacks from MITLL Scenario Specific datasets
Alerts from NC-State (TIAA)
– Generated by RealSecure IDS on MITLL attack data sets
Scenario 1:
1. IP sweep
2. Probe for sadmind vulnerability
3. Break-in via sadmind exploit
4. Installation of mstream DDoS software
5. Launch DDoS attack
Scenario 2:
1. Probe of DNS server via HINFO query
2. Break-in via sadmind exploit
3. FTP upload of mstream DDoS software and attack script
4. Initiate attack on other hosts
5. Launch DDoS attack
Top-7 most similar states, lower = more similar
Figure: inside1 against inside1. x-axis: query time (11/10/2001 3:21 to 7:40); y-axis: distance (-0.2 to 1.6). Series: normal 1, IP sweep, probe, sadmind exploit, DDoS install, DDoS start, normal 2
Most similar states and attack phases
Figure: inside1 against inside1. x-axis: query time (11/10/2001 3:21 to 7:40); y-axis: distance (0 to 0.35). Series: normal 1, IP sweep, probe, sadmind exploit, DDoS install, DDoS start, normal 2
Scenario 2 tested on Scenario 1 history
Figure: inside2 against inside1. x-axis: query time (11/9/2001 15:21 to 17:31); y-axis: distance (-0.2 to 1.6). Series: normal 1, IP sweep, probe, sadmind exploit, DDoS install, DDoS start, normal 2
Most similar states in Scenario 1
Figure: inside2 against inside1. x-axis: query time (11/9/2001 15:21 to 17:31); y-axis: distance (0 to 0.5). Series: normal 1, IP sweep, probe, sadmind exploit, DDoS install, DDoS start, normal 2