The Mathematics and Algorithmics of Process Detection George Cybenko Dartmouth College Hanover NH 03755 USA gvc@dartmouth.edu IPAM 27-7-2005 Cybenko


Page 1: The Mathematics and Algorithmics of Process Detection George Cybenko Dartmouth College Hanover NH 03755 USA gvc@dartmouth.edu IPAM 27-7-2005 Cybenko

The Mathematics and Algorithmics of Process Detection

George Cybenko
Dartmouth College
Hanover NH 03755 USA
gvc@dartmouth.edu

IPAM 27-7-2005 Cybenko


Acknowledgements

Research Support: DARPA, DHS, ARDA, ISTS, I3P, AFOSR, Microsoft

Active Members

George Bakos, Alex Barsamian, Marion Bates, Vincent Berk, Wayne Chung, Valentino Crespi (Cal State LA), George Cybenko, Ian deSouza, Annarita Giani, Doug Madory, Glenn Nofsinger, Yong Sheng, William Stearns

Alumni

Naomi Fox (UMass, Ph.D. student), Hrithik Govardhan (Rocket), Robert Gray (BAE Systems), Diego Hernando (UIUC, Ph.D. student), Guofei Jiang (NEC Research), Alex Jordan (BAE Systems), Han Li (China), Josh Peteet (Greylock Partners), Chris Roblee (LLNL)


Outline

• Background and basics
• Software and Applications
• Theory
• Summary


An Example of a Process

[Figure: a “process” model with two states, 1 and 2; state 1 is labeled a and state 2 is labeled b]

Two states: { 1 , 2 }. Two observables: { a , b }.

Legal transitions between states are depicted by arrows.

When occupying a state, the process emits an observable.

All states are initial/start states and there are no terminal states.

Some legal sequences of observables: abbab , bababbb, abbb

Some illegal sequences of observables: aa , baab

Further reading: Automata Theory, Regular Languages, etc


A More Complex Process

[Figure: another “process” model with three states; states 1 and 3 are labeled a , c and state 2 is labeled b]

Three states: { 1 , 2 , 3 }. Three observables: { a , b , c }.

Some legal sequences of observables: abab , babaccab, ab

Some illegal sequences of observables: bb , baabb

Problem: Given a sequence of observations, is it legal? If so, which states could the process be in?

Solution:
1 Read the first observable and mark the states that emit that observable.
2 Read an observable, z.
3 New marked states = (states reachable from old marked states) intersected with (states that could have emitted z).
4 If there are no new marked states, the sequence is illegal; else go to 2.
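The state-marking procedure can be sketched in a few lines of Python. This is a minimal illustration, not the author's code. It uses the two-state model from the earlier slide; the transition sets (1 to 2, 2 to 1, 2 to 2) are an assumption inferred from the legal and illegal example sequences, and the names `TRANSITIONS`, `EMITS` and `mark_states` are hypothetical.

```python
# Two-state model from the earlier slide. The arrows are an assumption inferred
# from the examples: "aa" is illegal (no 1 -> 1), "abbb" is legal (2 -> 2 exists).
TRANSITIONS = {1: {2}, 2: {1, 2}}
EMITS = {1: {"a"}, 2: {"b"}}


def mark_states(observations, transitions=TRANSITIONS, emits=EMITS):
    """Return the states the process may occupy after the sequence.

    An empty set means the sequence is illegal.
    """
    marked = set(emits)  # every state is a start state
    first = True
    for z in observations:
        emitters = {s for s, symbols in emits.items() if z in symbols}
        if first:
            # Step 1: mark the states that emit the first observable.
            marked = emitters
            first = False
        else:
            # Step 3: reachable states intersected with possible emitters.
            reachable = set().union(*(transitions[s] for s in marked))
            marked = reachable & emitters
        if not marked:
            return set()  # Step 4: no marked states, illegal sequence
    return marked


if __name__ == "__main__":
    for seq in ["abbab", "bababbb", "abbb", "aa", "baab"]:
        print(seq, "legal" if mark_states(seq) else "illegal")
```

Each step costs at most one pass over the transition sets, so the check is linear in the length of the observation sequence.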


Two Simple Processes

[Figure: Model Instance A with states A1 and A2; A1 is labeled a and A2 is labeled b]

aabb is a legal observation sequence

A1 B1 A2 A2 , A1 B1 A2 B2 , B1 A1 B2 B2 , ... are all legal state sequences

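[Figure: state-sequence labels in the diagram: A1 A2 A2 , A1 A2 , A1 B1 B1 B2 B1 B2 B2; one path is marked “a track” and one group of paths is marked “a hypothesis”]

We can reduce this to a single process....

[Figure: Model Instance B with states B1 and B2; B1 is labeled a and B2 is labeled b]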


Multiple Process Representation

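[Figure: Model Instance A (states A1, A2) and Model Instance B (states B1, B2); in each, the first state is labeled a and the second is labeled b]

Transition matrix of one model instance:

$$M = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}$$

Product model for two instances (written M x M on the slide):

$$M \otimes M = \begin{pmatrix} 0 \cdot M & 1 \cdot M \\ 1 \cdot M & 1 \cdot M \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \\ 1 & 1 & 1 & 1 \end{pmatrix}$$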

If the observation sequence is aaaaaa and multiple copies of the model are allowed, then we get a product model of size $2^n$ (n copies of the two-state model).
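The growth of the product model can be checked with a short sketch. It assumes that the slide's "M x M" is the Kronecker (tensor) product of the 0/1 transition matrix, which matches the block pattern shown; `kron` and `product_model` are hypothetical helper names, written without external libraries.

```python
from functools import reduce

# Transition matrix of one two-state model instance (row = from, column = to).
M = [[0, 1],
     [1, 1]]


def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]


def product_model(m, copies):
    """Transition matrix of `copies` independent instances of model m."""
    return reduce(kron, [m] * copies)


if __name__ == "__main__":
    for row in kron(M, M):
        print(row)
    for n in range(1, 7):
        print(n, "copies ->", len(product_model(M, n)), "product states")
```

The number of product states doubles with every added copy, which is the exponential cost of reducing multiple process detection to single process detection.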


Multistage Process Model

[Figure: state diagram with states Start/Normal, Scanned, Infected, Data Access and Exfiltration; arrows are marked as either potential malicious activity or potential normal activity]


Extensions: Hidden Markov Model (HMM)

Add probabilities to the three-state model:

p(a|1) = 0.8 , p(c|1) = 0.2
p(b|2) = 1
p(a|3) = 0.8 , p(c|3) = 0.2

[Figure: the three-state model with transition probabilities 1, 0.8, 0.5, 0.2 and 0.5 on its arrows, unrolled into k copies of the states, one copy per time step t=0, t=1, t=2, t=3, ..., t=k-2, t=k-1]

Take logs of probabilities so this is a shortest path problem and can use dynamic programming (Viterbi algorithm).
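A minimal Viterbi sketch for this three-state model follows. The emission probabilities are the ones on the slide. The transition table `TRANS` is an assumption: the slide's figure gives the values 1, 0.8, 0.5, 0.2 and 0.5, but which arrow carries which value did not survive extraction, so the assignment below is purely illustrative. The uniform start distribution is also an assumption.

```python
import math

STATES = [1, 2, 3]
# Emission probabilities from the slide.
EMIT = {1: {"a": 0.8, "c": 0.2}, 2: {"b": 1.0}, 3: {"a": 0.8, "c": 0.2}}
# ASSUMPTION: illustrative assignment of the slide's transition values to arrows.
TRANS = {1: {2: 0.8, 3: 0.2}, 2: {1: 1.0}, 3: {1: 0.5, 3: 0.5}}
# ASSUMPTION: every state is an equally likely start state.
START = {s: 1.0 / len(STATES) for s in STATES}


def cost(p):
    """Negative log probability; zero probability becomes infinite cost."""
    return -math.log(p) if p > 0 else math.inf


def viterbi(obs):
    """Return (most likely state path, its probability), or (None, 0.0)."""
    # best[s] = (cost of the cheapest path ending in s, that path)
    best = {s: (cost(START[s]) + cost(EMIT[s].get(obs[0], 0.0)), [s])
            for s in STATES}
    for z in obs[1:]:
        new_best = {}
        for s in STATES:
            emit_cost = cost(EMIT[s].get(z, 0.0))
            candidates = [
                (c + cost(TRANS[r].get(s, 0.0)) + emit_cost, path + [s])
                for r, (c, path) in best.items()
            ]
            new_best[s] = min(candidates, key=lambda t: t[0])
        best = new_best
    total, path = min(best.values(), key=lambda t: t[0])
    if math.isinf(total):
        return None, 0.0
    return path, math.exp(-total)


if __name__ == "__main__":
    print(viterbi("abab"))
    print(viterbi("bb"))
```

Summing negative logs turns the product of probabilities into path length, so the most likely state sequence is the shortest path through the k copies of the state set.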


Hidden Process Models

Underlying (hidden) state spaces: Model 1 ... Model n

[Figure: state diagrams for Model 1 through Model n; states are labeled with the observables they emit: {a, b}, {a, c}, {c, d}, {e}, {f, c}, {c, d}, {h}, {f, g}]

Observations related to state sequences:
a b c d a b b a d a c f h c c g d g

Observations are interleaved:
a b c c f h d c c a b g d b a g d a

Observations missed, noise added, unlabelled (this is what we see):
a b a c f k h d c b g d b k h a g d a


Terminology and Summary

Processes have states.

The states are hidden.

States emit observables that are possibly not unique to a state.

Observables are not labeled, can be noisy and might be dropped.

Multiple processes might be instantiated.

The problem is to determine which processes are possible and which states those processes can be in.

Multiple process detection can be reduced to single process detection at the expense of exponential growth.

Tracks are associations of observations to processes.

Hypotheses are consistent tracks that explain all the observables.


Discrete Source Separation Problem (viz. Blind Source Separation, “Cocktail Party” Problem)

Catalog of Processes/Models

Process/Model Example:
• 3 states + transition probabilities
• n observable events: a, b, c, d, e, …
• Pr( state | observable event ) given/known

Observed event sequence:
….abcbbbaaaababbabcccbdddbebdbabcbabe….

[Figure: the observed sequence annotated with “A Track” and “A Hypothesis”]

Which combination of which process models “best” accounts for the observations? This is what we want to compute. Events not associated with a known process are “anomalies”.


A Simple Example of Process Detection

NETWORK WORM MODEL (NW) (a,b,c,d ICMP traffic levels)

States:       A       B       C           D
Observables:  { a }   { b }   { b , c }   { c , d }

ROUTER FAILURE MODEL (RF)

States:       E       F
Observables:  { a }   { b }

E,F = 0
repeat
    read event e
    if e==a then E
    if E and e==b then F
until F

Two models; states have different semantics; sets of observables intersect. What is the “diagnosis”?

• a,b,c,d are events that can be observed
• states A, B, C, D, E, F are hidden
• observe a sequence of events

Sequence    Hypotheses
ab          NW | RF
abab        (NW & NW) | (RF & NW) ...
ababc       (NW & RF) | (NW & NW)
ababcc      NW & NW

• Which process or combination of processes explains the observed events?


Detecting a Process Using Rules

WORM MODEL (a,b,c,d ICMP traffic levels)

States:       A       B       C           D
Observables:  { a }   { b }   { b , c }   { c , d }

A,B,C,D = 0
repeat
    read event e
    if e==a then A
    if A and e==b then B
    if B and (e==b or e==c) then C
    if C and (e==c or e==d) then D
until D

ROUTER FAILURE MODEL

States:       E       F
Observables:  { a }   { b }

E,F = 0
repeat
    read event e
    if e==a then E
    if E and e==b then F
until F

What does “ab” mean? (Process ambiguity)
What does “ac” mean? (Missed detections)
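The two rule sets can be run side by side to see both problems. The sketch below is an illustration under one stated assumption: each rule is tested against the flags as they stood before the current event, so a single event cannot cascade through several rules (a literal top-to-bottom reading of the pseudocode would set both B and C on the first b). It also reads the whole sequence instead of stopping at D or F. `run_rules`, `WORM` and `ROUTER` are hypothetical names.

```python
# Each rule: (flag required beforehand or None, accepted events, flag to set).
WORM = [
    (None, {"a"}, "A"),
    ("A", {"b"}, "B"),
    ("B", {"b", "c"}, "C"),
    ("C", {"c", "d"}, "D"),
]
ROUTER = [
    (None, {"a"}, "E"),
    ("E", {"b"}, "F"),
]


def run_rules(rules, events):
    """Return the set of flags raised after reading all events."""
    flags = set()
    for e in events:
        before = frozenset(flags)  # ASSUMPTION: rules see pre-event flags only
        for needs, accepted, sets_flag in rules:
            if (needs is None or needs in before) and e in accepted:
                flags.add(sets_flag)
    return flags


if __name__ == "__main__":
    # "ab": the router model completes (F) while the worm model is at B.
    print(sorted(run_rules(WORM, "ab")), sorted(run_rules(ROUTER, "ab")))
    # "ac": the worm model stalls at A because the b event was missed.
    print(sorted(run_rules(WORM, "ac")))
```

On “ab” both models make progress, so the rules alone cannot say which process is running; on “ac” the worm rules never advance past A even though a worm with one missed observation would produce exactly that sequence.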


Rules for Process Disambiguation

WORM MODEL (a,b,c,d ICMP traffic levels)

States:       A       B       C           D
Observables:  { a }   { b }   { b , c }   { c , d }

A,B,C,D = 0
repeat
    read event e
    if e==a then A
    if A and e==b then B
    if B and (e==b or e==c) then C
    if C then (E=0, F=0)
    if C and (e==c or e==d) then D
until D

ROUTER FAILURE MODEL

States:       E       F
Observables:  { a }   { b }

E,F = 0
repeat
    read event e
    if e==a then E
    if E and e==b then F
until F

Cannot decide which process is instantiated until more data arrives.


Rules for Missed Detections

WORM MODEL (a,b,c,d ICMP traffic levels)

States:       A       B       C           D
Observables:  { a }   { b }   { b , c }   { c , d }

A,B,C,D = 0
repeat
    read event e
    if e==a then A
    if A and e==b then B
    if A and e==c then C,D
    if A and e==d then D
    if B and (e==b or e==c) then C
    if C then (E=0, F=0)
    if C and (e==c or e==d) then D
    if D then (E=0, F=0)
until D

This clearly does not scale and does not lead to manageable sets/systems of rules.


Complexity of Rule-Based Systems for Multiple Process Detection

• m process models, each with n states
• Potentially as few as mn state transitions in the original models
• Potentially need to add:
    • $O(m^2 n^2)$ rules for disambiguation
    • $O(m n^2)$ rules for missed detections
• These are “overhead” processing steps that can be done generically, not by the decision tree or rule set
• Process Query System software handles this overhead processing


Approaches to Detecting Processes

Aristotelian - Traditional information retrieval is based on specification of a query in terms of Boolean expressions based on record fields, e.g. SQL ( name = “smith” & age > 20 & age < 40 ), rule-based logics, decision trees, etc.

Newtonian - Next generation process detection requires retrieval based on specification of a set of discrete, dynamic processes, e.g. descriptions of a Hidden Markov Model, Hidden Petri Net, weak models, FSMs, attack trees, etc.

Main Concept: Move from an Aristotelian to a Newtonian Paradigm.


Examples of Process Detection Problems

• Is there an unusual pattern of computer network events, host activities, system calls, etc? (Network and computer security)
• Is a complex infrastructure (telecom, electricity, financial networks) operating normally or in a failure mode? (Critical Infrastructures)
• Is my software operating normally? (Autonomic computing)
• What biological pathways/processes are engaged? (Molecular Biology)
• Is there an unusual pattern of document accesses within an enterprise document control system? (Insider Threat Detection)
• Does a group of unusual transactions constitute a threat? (Homeland Security)
• Has the physical border/perimeter been breached? (National and industrial physical intrusion detection)
• Is there a large ground vehicle convoy moving towards our position? (Tactical military)
• What’s going on around me? (Human Cognitive Processing)

IMPORTANT: All are “adversarial” situations, not cooperative, so the observations are not necessarily labeled for easy identification and association with a process!


Related Disciplines

                                | “Weak” Models         | Hidden Markov Models           | Linear State Space Systems                           | Multiple Target Tracking
Underlying Model                | Finite State Machines | Markov Chains, Shannon Channel | x’ = Ax + Bu, y = Cx + Dv, with u, v Gaussian noise | Any
Algorithms for Single Processes | State marking         | e.g. Viterbi algorithms        | Kalman Filtering                                     | Not applicable
Multiple Processes              | Process Query Systems | Process Query Systems          | Process Query Systems                                | Multiple Hypothesis Tracking (MHT)


Software and Applications

• Sensor networks
• Airborne plume detection
• Cyber security
• Server pool management
• Dynamics of social networks*
• Genomics and biological pathways*
• Human situation awareness*

*In process or planned.


Process Query Systems (PQS)

Process Query Systems solve the Discrete Source Separation Problem in a generic way:

• inputs
    • a sequence of unlabelled observations (stream, logfiles, etc)
    • a collection of process models
• outputs
    • estimates of which processes produced those observations
    • estimates of which states those processes are in

Basic theory and technology has been developed by the PQS team at Dartmouth.

Now being applied to a variety of applications.


Algorithms/Operations of PQS

Recursive in Time

1 Build or Learn Models
2 Subscribed Data Arrives
3 Update Tracks Within Hypotheses (Viterbi / Kalman / NDFA, etc) and Create New Hypotheses
4 Manage Hypotheses (MHT)
5 Evaluate Solutions and Process Outputs

[Figure: a Hypothesis Pool containing Hypothesis 1 through Hypothesis n, each made up of tracks, shown at each numbered step]


Software: Process Query System

One platform, many applications

[Figure: a Generic Process Query System (PQSnet.net) at the center, surrounded by application areas: DISCUS, Vehicle Tracking, Computer Security, Cyberlog Analysis, Attacks on utilities, Plume detection, Sensor networks, Robust Server Pooling; sponsor labels ARDA, DARPA and DHS appear next to the applications]


The COBOL and pre-PQS Analogy

Pre-SQL Programs (interwoven logic):
    …
    application logic statement 1;
    application logic statement 2;
    file management statement 1;
    record management statement 1;
    file management statement 2;
    record management statement 2;
    application logic statement 3;
    record management statement 3;
    file management statement 3;
    application logic statement 4;
    …

Post-SQL Programs = Application logic (user responsibility) + Database management system (system responsibility)

Application logic:
    …
    application logic statement 1;
    application logic statement 2;
    SQL statement 1;
    application logic statement 3;
    SQL statement 2;
    application logic statement 4;
    …

Database management system:
    …
    file management operation 1;
    record management operation 1;
    file management operation 2;
    record management operation 2;
    record management operation 3;
    file management operation 3;
    …

Current Process Detection Programs (interwoven logic):
    …
    model logic statement 1;
    model logic statement 2;
    sensor access statement 1;
    state estimate statement 1;
    sensor access statement 2;
    state estimate statement 2;
    model logic statement 3;
    sensor access statement 3;
    state estimate statement 3;
    model logic statement 4;
    …

PQS-based Programs = Model description (user responsibility) + Process query system (system responsibility)

Model description:
    …
    model description statement 1;
    model description statement 2;
    model description statement 3;
    model description statement 4;
    …

Process query system:
    …
    sensor access statement 1;
    state estimate statement 1;
    sensor access statement 2;
    state estimate statement 2;
    sensor access statement 3;
    state estimate statement 3;
    …


Computer Security Example (V. Berk and N. Fox)

Funded by ARDA and DHS


Network Security

Objective: Detect, disambiguate, and predict the course of concerted network attacks in an enterprise class network.

Why:
• Problem domain demands the power of PQS
• Hundreds of “processes” occurring at once
• Lots of missed observations and noise
• All commercial technology focuses on collection and presentation of data
• Existing correlation efforts very weak at best


Goal of PQS in network monitoring

Create a system that quickly and accurately correlates related activity.

Assist a security analyst in deciding:
• What activity is irrelevant.
• What activity needs attention and further investigation.


SENSORS INTEGRATED

SENSOR   | DESCRIPTION
DIB:s    | Dartmouth ICMP-T3 Bcc: System
CovChan  | Timing Covert Channel Detection
Snort    | Signature Matching IDS
IPtables | Linux Netfilter firewall, log based
Samba    | SMB server - file access reporting
Weblog   | IIS, Apache, SSL error logs, …
US-agent | Userspace host monitoring agent
Tripwire | Host filesystem integrity checker

SCOPE (third column of the slide's table): Global, Network, Host


Multistage Process Model

[Figure: state diagram with states Start/Normal, Scanned, Infected, Data Access and Exfiltration; arrows are marked as either potential malicious activity or potential normal activity]


PQS-Net Testbed at Dartmouth

www.pqsnet.net

[Figure: network diagram of the PQS-Net testbed. Labeled zones and hosts: Internet, DMZ (WWW, Mail), WS, Dartmouth, ISTS, WinXP/LINUX targets. Address labels: 192.168.24.192/26 and 172.18.12.32-38. Sensors feeding PQS: US-Agent, CovChan, IPTables, Snort, DIB:s, SaMBa.]

Attack Hosts:
• Skaion
• Custom Exploits
• Core Impact™
• Normal Traffic
• Covert Channels
• Worms


PQS-Net supply chain

[Figure: sensors → sensor data → Tier 1 Tracker → attack steps → Tier 2 Tracker → attack sequences and scores → Analyst’s front-end]

Tier 1 Models
• Focus on individual host status
• Report on status changes

Tier 2 Models
• Focus on correlating host activity
• Report chains of events

Tier 1 Output
Mon Feb 21 20:06:17 2005 000000 131.58.63.160 (hostile) recon on 100.10.20.4 SNORT 469 proto: 1
Mon Feb 21 20:30:24 2005 000000 138.158.170.45 (hostile) attacked 100.10.20.4 ERRORLOG 400 proto: 6 dport: 443

Tier 2 Output
Hypothesis 1, Score: 0.8
Hypothesis 2, Score: 0.2
[Figure: event chains drawn under the two hypotheses: A scans B; A scans B; B scans E; B attacks E]


Example Scenario

[Figure: network diagram showing the Internet and hosts A, B, C, D and E]

Tier1 Alerts and Indicators

A scans B
Snort:
02/21-20:06:17.904500 [**] [1:469:1] ICMP PING NMAP [**] [Classification: Attempted Information Leak] [Priority: 2] {ICMP} 131.58.63.160 -> 100.10.20.4

C attacks B (success)
SSL error log (host 100.10.20.4):
[Mon Feb 21 20:30:24 2005] [error] mod_ssl: SSL handshake failed (server www.osis.gov:443, client 138.185.170.45) (OpenSSL library error follows)
[Mon Feb 21 20:30:24 2005] [error] OpenSSL: error:1406908F:lib(20):func(105):reason(143)


Example Cont’d

[Figure: network diagram showing hosts B, D and E]

Tier1 Alerts and Indicators

B scans D
02/21-20:31:17.528602 [**] [1:1807:2] WEB-MISC Chunked-Encoding transfer attempt [**] [Classification: Web Application Attack] [Priority: 1] {TCP} 100.10.20.4:34074 -> 100.10.20.169:80

B attacks D (fails)
100.20.1.169 - - [21/Feb/2005:08:31:22 -0500] "GET /default.idq?AAAAAAAAAAA………..AAAAAAA HTTP/1.1" 404 1287 "-" "-"

B scans E
02/21-20:32:01.622465 [**] [1:1807:2] WEB-MISC Chunked-Encoding transfer attempt [**] [Classification: Web Application Attack] [Priority: 1] {TCP} 100.10.20.4:34076 -> 100.10.20.170:80

B attacks E (succeeds)
100.20.1.170 - - [21/Feb/2005:08:32:06 -0500] "GET /default.idq?AAAAAAAAAAA………..AAAAAAA HTTP/1.1" 200 1287 "-" "-"


Fish Tracking (Kinematic Tracking)
A. Jordan, W. Chung, V. Crespi

Funded by DARPA and DHS


Real time Fish Tracking

Objective: Track the fish in the fish tank

Why:
• Very strong example of the power of PQS
• Fish swim very quickly and erratically
• Lots of missed observations
• Lots of noise
• Classical Kalman filters don’t work (non-linear movement and acceleration)
• “Easier” than getting permission to track people (we mistakenly thought)


Fish Tracking Details

• 5 Gallon tank with 2 red Platys named Bubble and Squeak
• Camera generates a stream of “centroids”: for each frame a series of (X,Y) pairs is generated.
• Model describes the kinematics of a fish: the model evaluates if new (X,Y) pairs could belong to the same fish, based on measured position, momentum, and predicted next position. This way, multiple “tracks” are formed, one for each object.
• Model was built in under 3 days!!!
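The slide does not give the kinematic model itself, so the following is only a sketch of the general idea of associating a new centroid with a track by predicted position. The constant-velocity prediction, the distance gate and the names `predict_next`, `associate` and `gate` are all assumptions made for illustration, not the PQS fish model.

```python
import math


def predict_next(track):
    """Constant-velocity guess: last position plus the last displacement."""
    if len(track) < 2:
        return track[-1]  # no velocity estimate yet
    (x1, y1), (x2, y2) = track[-2], track[-1]
    return (2 * x2 - x1, 2 * y2 - y1)


def associate(track, centroids, gate=15.0):
    """Return the centroid closest to the prediction, or None if none is
    within `gate` pixels (a hypothetical threshold)."""
    px, py = predict_next(track)
    best, best_dist = None, gate
    for (cx, cy) in centroids:
        dist = math.hypot(cx - px, cy - py)
        if dist <= best_dist:
            best, best_dist = (cx, cy), dist
    return best


if __name__ == "__main__":
    track = [(0, 0), (10, 0)]  # moving right at 10 pixels per frame
    print(associate(track, [(21, 1), (50, 50)]))  # (21, 1) extends the track
    print(associate(track, [(50, 50)]))           # None: likely another object
```

A centroid that falls outside every track's gate would start a new track, which is how one track per object can emerge from unlabeled (X,Y) pairs.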


Autonomic Server Monitoring (C. Roblee, V. Berk)

Funded by DHS, ARDA


Autonomic Server Monitoring

Objective: Detect and predict deteriorating service situations

Why:
• Another strong example of the power of PQS
• Software and hardware are buggy and vulnerable
• Hot market, large profits for “The ONE” application
• Very ambiguous observations
• Sys-admins also want vacation


The Environment

• Hundreds of servers and services
• Various non-intrusive sensors check for:
    • CPU load
    • Memory footprint
    • Process table (forking behavior)
    • Disk I/O
    • Network I/O
    • Service query response times
    • Suspicious network activities (e.g. Snort)
• Models describe the kinematics of failures and attacks: the model evaluates load balancing problems, memory leaks, suspicious forking behavior (like /bin/sh), service hiccups correlated with network attacks…


Server Compromise Model: Generic Attack Scenario

[Figure: timeline t0 t1 t2 t3 t4 with observations o1, o2, o3 followed by a response (SIGKILL)]

1. Snort NIDS sensor output

...
Nov 21 20:57:16 [10.0.0.6] snort: [1:613:7] SCAN myscan [Classification: attempted-recon] [Priority: 2]: {TCP} 212.175.64.248 -> 10.0.0.24
...

2. Monitored host sensor output (system level)

Current system record for host 10.0.0.24 (10 records):
Average memory over previous 10 samples: 251.000
Average CPU over previous 10 samples: 0.970

| time       | mem used | CPU load | num procs | flag |
|------------|----------|----------|-----------|------|
| 1101094903 | 251      | 0.970    | 64        |      |
| 1101094911 | 252      | 0.820    | 64        |      |
| 1101094920 | 251      | 0.920    | 64        |      |
| 1101094928 | 251      | 0.930    | 64        |      |
| 1101094937 | 251      | 0.870    | 65        |      |
| 1101094946 | 251      | 0.970    | 65        |      |
| 1101094955 | 251      | 0.820    | 65        |      |
| 1101094964 | 253      | 1.220    | 65        | !    |
| 1101094973 | 255      | 1.810    | 65        | !    |
| 1101094982 | 258      | 2.470    | 65        | !    |

3. PQS Tracker Output

Last Modified: Mon Nov 21 21:01:03
Model Name: server_compromise1
Likelihood: 0.9182
Target: 10.0.0.24
Optimal Response: SIGKILL proc 6992


Experimental Results:

[Figure: two panels, “Successful Requests” and “System Memory Consumed”. The memory panel plots % System Memory Used (0 to 100) against Time (s) (0 to 500). Legend: No Tracking, Tracking. Annotations: 210,000 requests serviced; 380,000 requests serviced.]


Theory

Process Query System frameworks offer a principled approach that enables
• understanding how distinguishable models (attack and failure) are
• developing a notion of processes that are “trackable,” given models and sensing infrastructure (i.e. a “sampling theory”)


Hypothesis Growth

Figure annotations (time axis): an individual path is a “track”, i.e., one process instance; consistent tracks form a “hypothesis”.

A “hypothesis” is a consistent assignment of events to processes and/or states (i.e., each event is assigned to only one process instance).

Given a set of “hypotheses” for an event stream of length k-1, update the hypotheses to length k to explain the new event.

NP-complete in general. Need to prune the pool of hypotheses, keeping the most suitable.
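The update-and-prune step described on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the PQS implementation: events are plain integers, the process model (`follows_counter`), the scoring rule (`fewer_tracks`) and all function names are hypothetical, and pruning simply keeps the top `keep` hypotheses.

```python
from typing import Callable, List, Tuple

Track = Tuple[int, ...]            # events assigned to one process instance
Hypothesis = Tuple[Track, ...]     # a consistent assignment of all events so far


def update_hypotheses(
    hypotheses: List[Hypothesis],
    event: int,
    consistent: Callable[[Track, int], bool],
    score: Callable[[Hypothesis], float],
    keep: int,
) -> List[Hypothesis]:
    """Extend every hypothesis with one new event, then keep the best `keep`."""
    extended: List[Hypothesis] = []
    for hyp in hypotheses:
        # Option 1: the event starts a new process instance (a new track).
        extended.append(hyp + ((event,),))
        # Option 2: the event continues exactly one existing track.
        for idx, track in enumerate(hyp):
            if consistent(track, event):
                extended.append(hyp[:idx] + (track + (event,),) + hyp[idx + 1:])
    # Pruning: rank by score and discard everything past the top `keep`.
    extended.sort(key=score, reverse=True)
    return extended[:keep]


def follows_counter(track: Track, event: int) -> bool:
    """Hypothetical process model: a track is a counter that increments by one."""
    return event == track[-1] + 1


def fewer_tracks(hyp: Hypothesis) -> float:
    """Hypothetical score: prefer explanations with fewer process instances."""
    return -float(len(hyp))


def track_stream(events: List[int], keep: int) -> List[Hypothesis]:
    """Process an event stream one event at a time, starting from the empty hypothesis."""
    pool: List[Hypothesis] = [()]
    for e in events:
        pool = update_hypotheses(pool, e, follows_counter, fewer_tracks, keep)
    return pool
```

With two interleaved counters, `track_stream([1, 10, 2, 11, 3], keep=5)[0]` separates the stream into the tracks `(1, 2, 3)` and `(10, 11)`. Without pruning the pool doubles with most new events, which is the growth the slide warns about.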


Models and Hypothesis Growth

“Weak” model: an FSM with “emission” vectors.

Emission for state i = 0/1 vector of sensor reports, e.g. obs(i) = ( 0 , 1 , 1 , 0 , 0 , 1 , 1 )

Observation vector at time t collected by sensors, e.g. sensors(t) = ( 0 , 1 , 1 , 1 , 1 , 1 , 0 )

Possible states at time t are determined by:

P = { i | Hamming_distance( obs(i) , sensors(t) ) <= HD }

R = { i | j is possible at time t - 1 and i is reachable from j }

P ∩ R is the set of possible states at time t.

The number of hypotheses at time t is computed recursively as above.

Theorem: For a fixed value of HD, the worst-case number of hypotheses at time t is either polynomial or exponential in t. (Crespi, Cybenko, Jiang 2004)
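The recursion above can be sketched in a few lines. This is an illustrative sketch only: it treats a hypothesis as a single state path consistent with the sensor readings, and the three-state model (`OBS`, `REACH`), the readings and all function names are assumptions made up for the example.

```python
from typing import Dict, Iterable, List, Set, Tuple

Vector = Tuple[int, ...]


def hamming_distance(u: Vector, v: Vector) -> int:
    """Number of positions where two 0/1 vectors differ."""
    return sum(a != b for a, b in zip(u, v))


def possible_states(
    prev: Iterable[int],
    obs: Dict[int, Vector],
    reachable: Dict[int, Set[int]],
    sensors_t: Vector,
    hd: int,
) -> Set[int]:
    """States consistent with the reading (P) and reachable from prev (R)."""
    p = {i for i, vec in obs.items() if hamming_distance(vec, sensors_t) <= hd}
    r = {i for j in prev for i in reachable[j]}
    return p & r


def count_hypotheses(
    start: Iterable[int],
    obs: Dict[int, Vector],
    reachable: Dict[int, Set[int]],
    readings: List[Vector],
    hd: int,
) -> int:
    """Count state paths that explain every reading within Hamming distance hd."""
    paths = {s: 1 for s in start}          # paths ending in each state
    for sensors_t in readings:
        p = {i for i, vec in obs.items()
             if hamming_distance(vec, sensors_t) <= hd}
        nxt: Dict[int, int] = {}
        for j, n in paths.items():
            for i in reachable[j]:
                if i in p:
                    nxt[i] = nxt.get(i, 0) + n
        paths = nxt
    return sum(paths.values())


# Hypothetical model: emission vectors and one-step reachability.
OBS = {0: (0, 0, 0), 1: (1, 1, 0), 2: (1, 1, 1)}
REACH = {0: {1, 2}, 1: {1, 2}, 2: {1, 2}}
READINGS = [(1, 1, 0), (1, 1, 1), (1, 1, 0), (1, 1, 1)]
```

In this toy model a threshold of HD = 0 leaves a single consistent path, while HD = 1 makes states 1 and 2 indistinguishable and the count doubles at every step.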


[Figure: two plots with axes “Longer tracking time” and “More noise (worse model)”. Annotations: “Nice Demo!!” and “Ouch!!!”]


[Figure: plot with axes “Longer tracking time” and “More noise (worse model)”, divided into three regions: “Excellent Models and Sensor Coverage”, “Acceptable Models and Sensor Coverage”, “Poor Models and Sensor Coverage”.]


Basic Idea Behind the Proof

[Figure: trellis of N states unrolled over time t, t+1, t+2, ..., k.]

If there are never two distinct paths from any node to itself over any period of observation, there is a simple injective mapping (i.e., a unique labeling) of the paths into

$$\underbrace{\{0, 1, \ldots, k\} \times \{0, 1, \ldots, k\} \times \cdots \times \{0, 1, \ldots, k\}}_{2N \text{ times}}$$

so the number of paths is at most $(k+1)^{2N}$. The label for each path records, for each of the N states, the time the path first occupies that state and the time it last occupies that state.


Basic Idea Behind the Proof

[Figure: trellis of N states unrolled over time t, t+1, t+2, ..., k.]

Process dynamics (i.e., what is reachable from each state in a time step), observations, and the noise threshold together determine a “trellis”. If there are two distinct paths from one node to itself over some period of time, the number of distinct paths grows exponentially by repeating the construct.
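The dichotomy in the proof idea can be checked numerically on small graphs. The sketch below is an illustration under assumptions: the two adjacency matrices are made-up examples and `count_paths` is a hypothetical helper, with the trellis taken to repeat the same one-step reachability at every time step.

```python
from typing import List

Matrix = List[List[int]]


def count_paths(adj: Matrix, steps: int) -> int:
    """Count all paths with `steps` transitions, starting from any node.

    adj[j][i] == 1 means node i is reachable from node j in one time step.
    """
    n = len(adj)
    ending_at = [1] * n                     # zero-step paths: one per node
    for _ in range(steps):
        ending_at = [
            sum(ending_at[j] for j in range(n) if adj[j][i])
            for i in range(n)
        ]
    return sum(ending_at)


# Each node has at most one way back to itself (a self-loop), so no node
# has two distinct paths to itself over any period: polynomial growth.
SINGLE_CYCLES: Matrix = [
    [1, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
]

# Node 0 returns to itself in two steps either by 0 -> 0 -> 0 or by
# 0 -> 1 -> 0: two distinct paths, so the construct can be repeated.
TWO_CYCLES: Matrix = [
    [1, 1],
    [1, 0],
]
```

For `SINGLE_CYCLES` the count after k steps is 3 + 2k + k(k-1)/2, a quadratic in k, while for `TWO_CYCLES` the counts follow the Fibonacci sequence and grow exponentially.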


Relationship to Spectral Radius

Classical spectral radius: $\rho(A) = |\lambda_{\max}|$

Joint spectral radius of a set $\Sigma = \{A_1, \ldots, A_n\}$ of matrices:

$$\rho(\Sigma) = \lim_{t \to \infty} \; \max_{B_k \in \Sigma,\; 0 < k < t+1} \left\| B_1 B_2 \cdots B_t \right\|^{1/t}$$

Hypothesis growth is polynomial iff $\rho(\Sigma) \le 1$.

Deciding whether $\rho(\Sigma) \le 1$ for real or rational matrices is undecidable (Tsitsiklis and Blondel, 2000).

If $\Sigma$ consists of 0-1 matrices, the problem is decidable but NP-hard.
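Because the decision problem is hard, a brute-force computation only gives bounds. The sketch below evaluates the quantity inside the limit for a fixed product length t, using the maximum-row-sum norm; for any submultiplicative norm this value is an upper bound on the joint spectral radius. The function names and the example matrices are assumptions chosen for illustration, and the cost grows as (number of matrices)^t.

```python
from itertools import product
from typing import List, Sequence

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain matrix product of two square matrices."""
    n = len(x)
    return [
        [sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def norm_inf(x: Matrix) -> float:
    """Maximum absolute row sum (a submultiplicative matrix norm)."""
    return max(sum(abs(v) for v in row) for row in x)


def jsr_upper_bound(mats: Sequence[Matrix], t: int) -> float:
    """max over all length-t products B_1 ... B_t of ||B_1 ... B_t||^(1/t)."""
    best = 0.0
    for combo in product(mats, repeat=t):
        prod = combo[0]
        for m in combo[1:]:
            prod = matmul(prod, m)
        best = max(best, norm_inf(prod) ** (1.0 / t))
    return best


# Hypothetical 0-1 matrices used only to exercise the function.
A1: Matrix = [[1, 1], [0, 1]]
A2: Matrix = [[1, 0], [1, 1]]
```

For the single matrix `A1` the bound at t = 4 is 5^(1/4), which decreases toward 1 as t grows, consistent with polynomial growth. For the pair `{A1, A2}` the bound at t = 4 is 8^(1/4), attained by alternating products such as `A1 A2 A1 A2`.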


Distinguishability of models

Given two “models”, how distinguishable are they?

Example: How different are these two models?

[Figure: two three-state models, each with states 1, 2, 3.]

First model: p(a|1) = 0.8, p(c|1) = 0.2; p(b|2) = 1; p(a|3) = 0.8, p(c|3) = 0.2. Edge labels: 1, 0.8, 0.5, 0.2, 0.5.

Second model: p(a|1) = 0.9, p(d|1) = 0.1; p(b|2) = 1; p(a|3) = 0.8, p(c|3) = 0.2. Edge labels: 0.9, 0.8, 0.5, 0.2, 0.5, 0.1.


Distinguishability of models

The goal is to answer questions such as: “Do we need to build more refined models or do we need to add additional sensors/data sources or improve tracking/hypothesis management?”

[Figure annotation: distance between means]


Different degrees of distinguishability between models given their sensing capabilities: 1

Red: probability of deciding model 2 given model 1. Blue: probability of deciding model 1 given model 2.

The entropies of the two ergodic models are different.

The decision rule is based on maximum likelihood (ML) as determined by the Viterbi algorithm.

The Shannon-McMillan-Breiman ergodic theorem states that “most” observation sequences are “typical” and have probability related to the entropy.
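For reference, the theorem named above can be stated as follows for a stationary ergodic source with a finite alphabet. The symbols $X_i$, $p$ and $H$ (the entropy rate) are notation introduced here and do not appear on the slide.

```latex
\[
  -\frac{1}{n} \log p(X_1, X_2, \ldots, X_n) \;\longrightarrow\; H
  \qquad \text{almost surely as } n \to \infty .
\]
```

Informally, a typical observation sequence of length n has probability roughly $2^{-nH}$ when logarithms are taken base 2.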


Different degrees of distinguishability between models given their sensing capabilities: 2

However, nonmonotonic behaviors are possible (in general), and the error probabilities need not converge to zero (if the entropies are the same).


Different degrees of distinguishability between models given their sensing capabilities: 3

However, nonmonotonic behaviors are possible (in general), and the error probabilities need not converge to zero (if the entropies are the same).


Unifilar models

Definition: a model is unifilar if, for any pair of state s_i and input y_j, there is at most one successor state.

One state sequence gives one observation sequence.

One observation sequence corresponds to at most 1 state sequence.

If the observation sequence is acceptable, there is 1 state sequence.

If it is unacceptable, there are 0 state sequences.

A weak model (WM) can be reduced to a DFA.

Every DFA has a unique minimum-state unifilar WM:

WM -> DFA -> Minimization -> WM

For a unifilar WM, counting acceptable strings with length n, for n sufficiently large:

[Equation garbled in extraction; surviving symbols: L, M, A, n]

where λ1 is the maximum eigenvalue of A.

Y. Sheng thesis: efficient estimates of λ1.

[Figure: example with states T1{0}, T2{1}, T3{1} and matrices A and B. Extracted entries for A: 011, 100, 011. Extracted entries for B: 10, 10, 01. Other labels: 0110, 11101, 10.]
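The unifilar property and the eigenvalue growth rate can be checked on a small example. The sketch below is an assumption-laden illustration: the transition table `NEXT` is a made-up unifilar model that reuses the state names T1, T2, T3, and it is not claimed to match the transitions in the slide's figure. Because each (state, symbol) pair has at most one successor, counting acceptable strings from a fixed start state is the same as counting state paths.

```python
from itertools import product
from typing import Dict, Optional

# Hypothetical unifilar model: NEXT[state][symbol] is the unique successor.
NEXT: Dict[str, Dict[str, str]] = {
    "T1": {"1": "T2"},
    "T2": {"0": "T1", "1": "T3"},
    "T3": {"0": "T1", "1": "T3"},
}


def run(start: str, string: str) -> Optional[str]:
    """Follow the unique state sequence for `string`; None if unacceptable."""
    state: Optional[str] = start
    for symbol in string:
        state = NEXT[state].get(symbol)
        if state is None:
            return None
    return state


def count_by_enumeration(start: str, n: int) -> int:
    """Count acceptable strings of length n by trying all 2^n strings."""
    return sum(
        run(start, "".join(w)) is not None for w in product("01", repeat=n)
    )


def count_by_paths(start: str, n: int) -> int:
    """Count length-n state paths from `start` (repeated adjacency updates)."""
    counts = {s: (1 if s == start else 0) for s in NEXT}
    for _ in range(n):
        nxt = dict.fromkeys(NEXT, 0)
        for s, c in counts.items():
            for successor in NEXT[s].values():
                nxt[successor] += c
        counts = nxt
    return sum(counts.values())
```

For this model the adjacency matrix has maximum eigenvalue (1 + √5)/2, and the ratio `count_by_paths("T1", n + 1) / count_by_paths("T1", n)` approaches that value as n grows.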


Summary

Multiple process detection is a ubiquitous problem with many applications but it has not been systematically studied.

Existing approaches are either very ad hoc, very specialized or very unscalable.

There is a promising generic software system for solving multiple process detection.

The theory is rich and largely unexplored.


Questions

See www.pqsnet.net for papers.
