Hierarchical Adaptive Control of QoS for Intrusion Tolerance

James E. Just, James C. Reynolds, Karl Levitt

20 August 2002

HACQIT

Outline
• Overview
• Technical Approach
• How Does It Work
• What’s New
• Testing
• Transition
• Plans
• Karl Levitt, UC Davis -- Attack Diagnosis

The HACQIT Idea

• Deliver ongoing critical application services to selected users while under network-based cyber-attack

• Specific project goals

– Understand how to deliver, and demonstrate delivery of, critical user services for 4 hours while under active attack with no more than 25% degradation in user performance

– Handle unknown attacks that can exhaust spare resources

– Ignore denial-of-service attacks on bandwidth and users

– Force massive increase in adversary work factor for attacks

– Support broad classes of COTS HW & SW for near term military utility without footprint of Byzantine fault tolerance

– Provide extensible (architecture-based) intrusion tolerance framework for longer term utility

– Understand the design space of intrusion tolerant systems for real world use with COTS/GOTS hardware and software

Formal System Boundaries & Assumptions

[Figure key: critical service (server), non-critical service (server), critical user, non-critical user, attacker]

Key Assumptions:
• LAN is reliable, cannot be flooded
• No direct DoS attacks against critical users
• HACQIT cluster HW & SW are pristine at startup & attackers have no physical access
• Unknown vulnerabilities exist in cluster
• Critical users & administrators are trusted
• Users interact with services via LAN & hosts

Goal: Provide critical services to selected users while under attack with <25% degradation in performance

[Figure: the HACQIT system (HACQIT cluster running critical application services, LAN, users A, K, ... N) inside the system environment (other servers, other users, other clusters), with requirements imposed on a portion of the system environment]

Technical Approach

• Intrusion resistant architecture with strong separation boundaries
• Process pair (hot spare) redundancy and failover with diversity if possible
• Behavior specification approach to recognizing errors and intrusions – defend in depth
• Multiple response types
– Failover
– Randomly rejuvenate
– Block calls or attacker
– Filter out bad requests (e.g., known attacks)
– Perform failure diagnosis
• Identify, learn and block new attacks
• Generalize new attack blocking filters

Illustration of HACQIT Response to Attacks

Attacker compromises critical user host

Performance impact: less than 4% in simple tests

With Generalization enabled, Code Red 1, 2 and all variants are blocked after first Code Red 1 attack

HACQIT Capabilities

• Current - Protects web enabled message board (dynamic state)
– Leverages critical application diversity but does not require it
– Adapts cluster sensor settings and responses through dynamic policy changes
– Detects faults / intrusions by observed deviations from behavior specification
– Maintains full event logs and N-minute service request buffer
– Fails over as required in response to attacks or faults
– Restores state and resyncs a new process pair with spare
– Performs continuous recovery to clean up compromised machine
– Rejuvenates application software at random or fixed intervals as preventive action
– Learns/blocks unknown attacks using forensics analysis of captured traffic to reproduce symptoms in Sandbox
– Generalizes some rules to block simple variants of attack
– Outputs SNORT rule for broader detection of new attack
– Builds allowed process list in semi-automated mode

• New work under exercised option
– Diagnosis and recovery: Learning and generalizing additional attacks – enhanced forensics
– Survivable server demonstration

• Possible
– Change security posture as result of external alerts
– Engage in group learning (attack notification, filter settings, etc.) with other clusters, Cyber Panel, firewall, others

Recent Accomplishments

• Ability to handle non-diverse applications (time-diversity)
• Vulnerability simulator to enable more robust attacks and learning behavior
• Continuous recovery against unauthorized files and processes
• SNORT rules generated for previously unknown attack
• More informative GUI implemented
• Modified critical application – web enabled message board using SQL Server backend database with active server pages (more realistic dynamic data) on both IIS and Apache
• More robust implementations of cluster, host, and application monitors and controllers
• Improved performance – some rearchitecting
• Policy database for storage & incremental distribution
• Additional learning

New HACQIT GUI

Penetration Testing

• Background

– UC Davis Computer Security Class (under Prof. Matt Bishop) is learning Penetration Testing/Fault Hypothesis Methodology on HACQIT cluster (Jan-Mar 2002)

– Complete HACQIT documentation provided – including our source code, versions of all COTS OS & products

– Described known vulnerabilities

• Results

– No successful attacks, i.e., no attacks caused failover

– Great system test vehicle – found and fixed lots of bugs

– Some effort continuing over summer with student “winner”

Performance Test Results

• Simulated background workload and successful attacks
– Microsoft web stress tool used to generate workload by posting new messages to message board every 2-3 seconds
– Successful “attacks” every 5-7 minutes, i.e., caused failover
– Lots of statistics collected
– Selected highlights below

Item                                        Baseline (1)       Baseline (2)           Extensive Failover
Total Hits / Socket Errors / Correct Hits   1495 / 0 / 1495    1558 / 23 (C) / 1558   1472 / 6 (T) / 1084
Sent / Received (KB)                        704 / 1382         734 / 1440             696 / 1174
Response (TTFB) (Aver., Min., Max)          22 / 17 / 677      69 / 66 / 484          63 / 4 / 12021

Extensive Failover relative to baseline     vs. (1)            vs. (2)
Correct Hits                                72.5%              69.6%
Sent (KB)                                   98.9%              94.8%
Response (TTFB), average                    286%               91%
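The three percentage pairs that close the table are consistent with dividing the failover figure by each baseline figure for correct hits, KB sent, and average TTFB. The slide does not state this pairing; it is inferred from the arithmetic, and the dictionary keys below are illustrative names only. A minimal check:

```python
# Values copied from the table above. The pairing of each percentage row
# with a metric is an inference from the arithmetic, not stated on the slide.
baseline_1 = {"correct_hits": 1495, "kb_sent": 704, "avg_ttfb": 22}
baseline_2 = {"correct_hits": 1558, "kb_sent": 734, "avg_ttfb": 69}
failover = {"correct_hits": 1084, "kb_sent": 696, "avg_ttfb": 63}


def relative(metric: str, baseline: dict) -> float:
    """Failover value as a percentage of the given baseline value."""
    return 100.0 * failover[metric] / baseline[metric]


for metric in failover:
    print(metric,
          round(relative(metric, baseline_1), 1),
          round(relative(metric, baseline_2), 1))
```

Read this way, failover runs delivered roughly 70% of the baseline correct hits, while average time to first byte was higher than baseline (1) but close to baseline (2).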

Other Validation Efforts Ongoing

• Validation peer review
• Continued software analysis and penetration testing
• Internet exposure
• Own use of hacker tools, e.g., whisker
• Use of new vulnerability simulator and script injector -- ongoing
• Analysis -- ongoing
– New attack completion time, number of new attacks, recovery time from successful attack, time to learn and block a new attack, number of spares (& diversity), time to identify and stop attacker

Transition

• Patent application/investigation ongoing
• Communications with selected protection vendors
• Examining open source or tool kit distribution
• Other agencies/Services
– DISA JPO discussions
– CECOM Survivable Server

Plans

• Focus on more extensive learning and generalization capabilities for single and two stage attacks
– Extensive use of vulnerability simulator
– Vulnerability and attack categorization
• Validation
– Update validation framework
– Participate in validation peer review
– Continue open network access and Davis reviews
• Incorporation of UC Davis results
• Integration with other OASIS technologies
• More transition effort

Demo tonight – y’all come

HACQIT – A Beginning for “Systems That Know”

• Self awareness -- distinguish between self / non-self
– Behavior specification – defines allowable or normal behavior, not statistical
– Attack definition – repeatable failure caused by user / agent service request
– Host monitor -- hierarchy of reference monitors that evaluate input content, key dll/system calls, critical application “behavior”, behavior of other allowed processes, network traffic behavior, host behavior, etc. against specifications – defense in depth
– Cluster monitor -- failover and recovery decisions based on QoS monitoring, integrity and liveness tests, health status reports, random timing, etc.
• Recovery mechanisms (reflex and deliberative actions)
– Start/kill processes, restore/delete files, mediate application calls, isolate at propagation boundaries, restart OS
– Continuous recovery, random rejuvenation, checkpoint and restore, reconstitute process pair with new servers, etc.
• Adapt to unknown attacks (beginnings of reflective actions)
– Learn -- prevent attacker from exhausting backups
  • Black-box recorder for input service requests (implemented as N-minute circular buffer) and sensor reports
  • Sandbox (isolated duplicate of critical servers)
  • Diagnostic analyzer (i.e., was failure caused by repeatable attack and, if so, which service request(s) caused it)
  • Adaptive content filter
– Generalize -- refine blocking signature for some attack types -- shorten, identify initiating event, generalize, etc. -- can be done remotely
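The black-box recorder is described only as an N-minute circular buffer of input service requests. The sketch below shows one way such a time-windowed buffer could behave. The class and field names (`ServiceRequest`, `RequestRecorder`, `window_seconds`) are hypothetical, and it assumes requests arrive in non-decreasing timestamp order.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class ServiceRequest:
    timestamp: float  # seconds; any monotonically increasing clock
    source: str
    payload: bytes


class RequestRecorder:
    """Keeps only the requests seen within the last `window_seconds`."""

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        self._buffer = deque()

    def record(self, request: ServiceRequest) -> None:
        self._buffer.append(request)
        # Drop entries that have aged out of the window.
        while request.timestamp - self._buffer[0].timestamp > self.window_seconds:
            self._buffer.popleft()

    def snapshot(self) -> list:
        """Copy of the buffered requests, oldest first, for replay or analysis."""
        return list(self._buffer)
```

After a failover, a snapshot like this is what a diagnostic analyzer would replay against the Sandbox to find the request(s) that reproduce the failure.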

UC Davis Contribution to HACQIT Forensics

PI: Karl Levitt

Adults: Jeff Rowe

Jim Alves-Foss (visiting from Idaho)

Mark Heckman (now at Promia Corp)

Grad Students: Melissa Danforth, Nicole Carlson, Tye Stallard, Marcus Tylutki

Undergrad Student: Barry Allard

Reminder: Unique Features of HACQIT

• Attack Assumptions: unknown, strike again as variant

• Detection through violation of QoS specifications and specification-based intrusion detection
– Future: Specifications on confidentiality, functional behavior of applications

• Analysis: Logs and Sandbox provide data for decision tree

• Filters: Block packets and system calls in server
– Future: Block calls in client

• Rollover: hot process-pair, processor
– Future: Other resources -- processes, files, disk blocks, ports, IP addresses; randomization to achieve diversity

Attack Management in HACQIT

• When an attack occurs do begin
– Detect the event causing a trigger;
– Determine if event is really an attack;
– If attack, repeat until “successful blocking”:
  • Determine minimal switchover that effects recovery from attack;
  • Determine event sequences that could have caused the attack;
  • Identify most likely event sequence;
  • Determine signals to blocking effectors that cause minimal mission impact;
  • Generalize blocking signals to address generalizations of attack
end
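The control loop above can be sketched as a single function. This is not the HACQIT implementation: every callable passed in (`is_attack`, `candidate_sequences`, `likelihood`, `blocks_attack`, `generalize`) is a hypothetical stand-in for a component the slides only name, and the switchover and mission-impact steps are left out for brevity.

```python
from typing import Callable, Iterable, Optional, Sequence

Event = str
EventSequence = Sequence[Event]


def manage_attack(
    trigger: Event,
    is_attack: Callable[[Event], bool],
    candidate_sequences: Callable[[Event], Iterable[EventSequence]],
    likelihood: Callable[[EventSequence], float],
    blocks_attack: Callable[[EventSequence], bool],
    generalize: Callable[[EventSequence], str],
) -> Optional[str]:
    """Return a generalized blocking rule, or None if nothing is blocked."""
    if not is_attack(trigger):
        return None
    # Try the most likely candidate event sequences first.
    ranked = sorted(candidate_sequences(trigger), key=likelihood, reverse=True)
    for sequence in ranked:
        # "Successful blocking": blocking this sequence stops the attack,
        # for example when confirmed by replaying it in a sandbox.
        if blocks_attack(sequence):
            return generalize(sequence)
    return None
```

Ranking before testing keeps the number of sandbox replays small when the likelihood estimate is good, at the cost of depending on that estimate.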

Forensic Process Flow

[Figure: process flow connecting Trigger, Policy/spec violation, Data Collection, More logging, Analysis, Root cause forensics, Alternate Hypothesis, Mission Model, Signature generalization, and Response Options]

Outline

• Specification-Based Intrusion Detection: The best approach to detect unknown attacks or variants of known attacks.

• Forensics Principles: Based on how a human forensics expert analyzes logs to identify attacks.

• Forensics Analysis: Use decision tree to identify possible event sequences that could have caused triggered specification violation.

• Attack Blocking: Perform immediate and safe blocking, which is subsequently refined to be close to optimal.

• Forensic Analysis of Code Red(s):

• HMAP: A tool to remotely test web servers

• Mission Model: Necessary to determine optimal response to attacks

• Completeness Verification of Specification-Based Intrusion Detection: Verify specifications of Unix privileged programs with respect to an accepted protection model.

• Related Work (DARPA, other) in Forensics:

Outline

• Specification-Based Intrusion Detection: The best approach to detect unknown attacks or variants of known attacks.

• Suggests a fast and safe response based on those constraints that are violated

Approaches to Integrity Attack Detection

• Static: Detect an inconsistency in system state
– Tripwire: Inconsistency in a file
– Diagnosis: Test a component

• Dynamic:

1. Misuse: Detect known attacks through their signatures

2. Anomaly: Detect activity that does not match a profile; can profile users, processes, programs, systems, networks,…

3. Specification-Based: Detect activity that is inconsistent with a priori specification (aka constraints) for an object. Can write specifications for: programs; protocols; policies on users, …

4. Hybrid (2 and 3): a priori template specifications with parameters discovered by profiling

Only Static and Dynamic (2, 3, 4) can detect unknown attacks

Useful Types of Constraints

• Policy on Users
– Files a user can access
– Resources a user is allowed to possess

• Protocol Specifications -- operational view
– Defines allowable transitions
– Defines allowable time in a given state

• Protocol Specifications -- message content
– Mappings delivered by DNS should accurately represent view of authoritative router
– IP addresses are not spoofed

Useful Types of Constraints

• Protocols -- invariants and assumptions
– IP routers approximate Kirchhoff’s law
– Packets are not sniffed by third party
– Packet source must be a non-congested/non-DoSed host

• Programs -- valid access constraints
– Programs access only certain objects

• Programs -- interaction constraints
– Program interaction should not change the semantics

• Data Integrity
– e.g., passwords, other authentication information
– authorization information, process table
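The router invariant can be read as conservation of flow: what a router receives should roughly equal what it forwards plus what it reports as dropped. The sketch below assumes that reading. The function name, the counters, and the 5% tolerance are illustrative choices, and traffic originated by or addressed to the router itself is ignored.

```python
def conservation_violation(packets_in: int, packets_out: int,
                           packets_dropped: int, tolerance: float = 0.05) -> bool:
    """True when inbound packets are not accounted for within `tolerance`."""
    if packets_in == 0:
        # Forwarding packets that were never received is itself a violation.
        return packets_out > 0
    accounted = packets_out + packets_dropped
    return abs(packets_in - accounted) / packets_in > tolerance
```

For example, 1000 packets in with 990 forwarded and 5 dropped stays within tolerance, whereas 1000 in with only 700 accounted for would be flagged.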

Constraints

[Figure: taxonomy of constraints. Labels: user constraints, data constraints, application constraints, protocol constraints (operational constraints, message constraints, protocol invariants), program constraints (access constraints, interaction constraints)]

Access Constraints for Programs

• Can Detect

– remote users gain local accesses

– local users gain additional privileges

– Trojan Horses

• Work well for many programs, e.g., passwd, lpr, lprm, lpq, fingerd, at, atq, …

• Some programs can potentially access many files, e.g., httpd, ftpd

– Break the execution into pieces (or threadlets). Define the valid access for each sub-thread.

– Threadlet defined by transition operations
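A per-program access constraint can be as simple as a whitelist of path patterns and modes, with any access outside it reported as a violation. The table below is a made-up illustration, not a specification taken from the slides: the paths, the `ACCESS_SPEC` name, and the single-letter modes are all assumptions.

```python
from fnmatch import fnmatchcase

# Hypothetical specification: path patterns and the modes each program may use.
ACCESS_SPEC = {
    "passwd": [("/etc/passwd", "rw"), ("/etc/shadow", "rw")],
    "lpr": [("/var/spool/lpd/*", "rw"), ("/etc/printcap", "r")],
}


def violates_spec(program: str, path: str, mode: str) -> bool:
    """True if `program` accessing `path` with `mode` ('r' or 'w') is not allowed."""
    for pattern, allowed_modes in ACCESS_SPEC.get(program, []):
        if fnmatchcase(path, pattern) and mode in allowed_modes:
            return False
    # Anything not explicitly permitted, including unknown programs, is a violation.
    return True
```

Because the default is deny, a Trojaned `lpr` that opens `/etc/shadow` is flagged without any signature for the particular attack.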

An ARP specification

[Figure: ARP state machine with states i, reply_wait, and cached; transitions labeled ARP Request, ARP Response, and ARP cache timeout]

Monitoring for Intrusions

[Figure: the same state machine extended with an alarm state; transitions labeled Unsolicited ARP Response, Bogus ARP Response, and Malformed Request lead to the alarm]
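A specification of this kind can be monitored with a small transition table: any event that the specification does not allow in the current state raises an alarm. The slides give the state and event names but not the exact edges, so the table below, in particular the `cached` self-loop on a request, is an assumption made for illustration.

```python
# Allowed transitions; the edge set is assumed, only the names come from the slides.
TRANSITIONS = {
    ("i", "arp_request"): "reply_wait",
    ("reply_wait", "arp_response"): "cached",
    ("cached", "arp_cache_timeout"): "i",
    ("cached", "arp_request"): "cached",  # assumption: a cached entry answers requests
}


def monitor(events, state="i"):
    """Run events through the specification; stop in 'alarm' on the first violation."""
    for event in events:
        state = TRANSITIONS.get((state, event), "alarm")
        if state == "alarm":
            break
    return state
```

An ARP response seen while in state `i` has no matching request, so `monitor(["arp_response"])` ends in `alarm`, which corresponds to the unsolicited-response case in the figure.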

Other Protocols Specified for Intrusion Detection

• Domain Name System (DNS)

• Network File System (NFS)

• Dynamic Host Configuration Protocol (DHCP)

• TCP

• FTP

• RIP routing protocol

• OSPF routing protocol

Difficult Unknown Attacks

• Loss of Confidentiality: Need to detect “exfiltration”

• Causes change to application functionality: Need to write specification for application behavior

• Insider browsing in unexpected locations: Anomaly detection, or detect activity inconsistent with a policy -- “Demids”: An intrusion detection system for databases

Outline

• Forensics Principles: Based on how a human forensics expert analyzes logs to identify attacks.

Prevention Forensics

• Protecting against future attack instances requires determination of the cause.
• Forensics need only proceed until a response blocking future attacks is determined, not to the ultimate root cause.
• The HACQIT Forensic Agent automatically detects the suspicious events in the application and network traffic logs.
• Reports from the Forensic Agent are used by the Response Agent to block future instances.

Automated Forensic Methodology

Raw Data

dbtpto tty03 SVRC05 Thu Feb 21 12:48 - 12:52 (00:03)
tgtawb tty02 SVRC05 Thu Feb 21 12:44 still logged in
Last login: Wed Jul 28 19:59:56 1999 from beukel.porcupine

Jul 30 99 18:45:45  3743 .a. -rw-r--r-- root wheel /etc/make.conf
                    4347 .a. -r--r--r-- bin bin /usr/include/machine/ansi.h
                    3911 .a. -r--r--r-- bin bin /usr/include/machine/endian.h
                    2697 .a. -r--r--r-- bin bin /usr/include/machine/types.h
                    5903 .a. -r--r--r-- bin bin /usr/include/sys/types.h
                    3528 .a. -r--r--r-- bin bin /usr/share/mk/bsd.own.mk
                    3945 .a. -r--r--r-- bin bin /usr/share/mk/sys.mk
Jul 30 99 18:45:46  1949 .a. -r--r--r-- bin bin /usr/lib/crt0.o
                   22544 .a. -r--r--r-- bin bin /usr/lib/libgcc.a

May 20 01:04:42 tuegate: 14498 systatd: connect from litp.ibp.fr
May 20 01:10:19 tuegate: 14536 systatd: connect from monk.rutgers.edu
May 20 01:23:49 tuegate: 15040 systatd: connect from monk.rutgers.edu

Forensic Evidence Analysis

[Figure: observed Effects 1-4 linked to candidate Causes A-F]

Abstract Forensic Leads

1. Login Record 3:22 12/11/2001

2. Code Compiled 3:27 12/11/2001

3. Network Connection 4:02 12/11/2001

4. Inconsistent File System 4:11 12/11/2001

5. IDS Alarm 4:12 12/11/2001

6. Login Record 4:12 12/11/2001

7. Suspicious Transaction in Application Log 4:13 12/11/2001

Outline

• Forensics Analysis: Identify possible event sequences that could have caused the triggered specification violation.

• Determine blocking rules that are:
– Immediate
– Mission aware
– Reversible
– Subsequently optimized

• Related Work (DARPA, other) in Forensics:
– DERBI: SRI
– Maita: MIT

Forensic Analysis: Overview

• Create a tree whose
– Roots are initiating events for sequences of actions
– Leaves are the terminating events of sequences of actions, usually events that are triggered by specification violations

• Upon notice of a trigger
1. Identify trigger in the forensics tree
2. Identify predecessor actions of the trigger
3. Identify all predecessors that are matched with events in log (circular buffer)
4. For all predecessors identified in (3), repeat starting with (1)
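The four steps amount to a backward search from the trigger, following only those predecessors that actually appear in the log. The sketch below assumes the forensics tree is stored as a mapping from each event to its possible predecessors and that it contains no cycles. The event names and edges in `PREDECESSORS` are hypothetical examples, not the project's tree.

```python
# Hypothetical forensic tree: each event maps to the actions that can precede it.
PREDECESSORS = {
    "suspicious_rsh_login": ["spoofed_connection", "stolen_password", "trojaned_rsh_daemon"],
    "spoofed_connection": ["denial_of_service"],
    "stolen_password": ["password_cracking"],
    "trojaned_rsh_daemon": ["buffer_overflow"],
}


def explain(trigger, logged_events):
    """Return every chain of logged events that leads to `trigger`, root first."""
    # Steps 2 and 3: predecessors of the trigger that are matched in the log.
    matched = [p for p in PREDECESSORS.get(trigger, []) if p in logged_events]
    if not matched:
        return [[trigger]]
    chains = []
    for predecessor in matched:
        # Step 4: repeat the search from each matched predecessor.
        for chain in explain(predecessor, logged_events):
            chains.append(chain + [trigger])
    return chains
```

Each returned chain is one candidate event sequence; unmatched predecessors prune whole branches, which is what keeps the search bounded by the contents of the circular buffer.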

Connection Spoofing

[Figure: attack graph for connection spoofing involving Alice, Bob, Cathy, and Eve, with Host A, Host B, and a network intrusion detection system. Node labels: trigger (suspicious rsh login entry in log file); Trojaned rsh daemon; no password; “+ +” .rhosts; buffer overflow; stolen password via password cracking; spoofed connection between A & B; denial of service; active connection between A & B; unencrypted network connections; step 5, forge SYN-ACK packet to establish the connection as trusted client. Successive frames of the figure add STOP markers to branches of the graph.]

Forensics Consistency Checks

• utmpx, wtmpx, utmp, wtmp, lastlog

– Note logins without logouts (start-end mismatches)

– Note inconsistencies in tty usage

– Note currently unknown users

– Note remote logins from a new host for that user

– Note failed logins

– Verify time of log messages monotonic

– Search for gaps in log entries
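The last two checks are easy to automate once log timestamps have been parsed. The helper below is a minimal sketch: the function name and the `max_gap` threshold are assumptions, and parsing the log format into `datetime` values is left to the caller.

```python
from datetime import datetime, timedelta


def check_timestamps(timestamps, max_gap):
    """Return (non_monotonic, gaps): indexes of entries that go back in time,
    and indexes of entries preceded by a silence longer than `max_gap`."""
    non_monotonic, gaps = [], []
    for i in range(1, len(timestamps)):
        delta = timestamps[i] - timestamps[i - 1]
        if delta < timedelta(0):
            non_monotonic.append(i)
        elif delta > max_gap:
            gaps.append(i)
    return non_monotonic, gaps
```

An entry that goes backwards in time suggests an edited or spliced log, while a long silence on a normally busy host suggests deleted entries; both are leads rather than proof.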

Forensics Consistency Checks (cont.)

• Compare system output with trusted copy of ps, ls, du, ifconfig and netstat
• Compare MD5 checksums of suspicious copies with known values
• Confirm unauthorized system/root access
• Check suspicious files by name
• Check for hidden files and processes
• Check for replaced system commands
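The MD5 comparison can be sketched with Python's standard `hashlib`. The function names and the shape of the `known_md5` mapping (path to expected hex digest) are assumptions; where the known-good digests come from, for example a trusted offline copy, is outside this sketch.

```python
import hashlib


def md5_of(path, chunk_size=65536):
    """Hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def modified_files(known_md5):
    """Return paths whose digest differs from the known value or that cannot be read."""
    suspicious = []
    for path, expected in known_md5.items():
        try:
            actual = md5_of(path)
        except OSError:
            suspicious.append(path)  # missing or unreadable is also suspicious
            continue
        if actual != expected:
            suspicious.append(path)
    return suspicious
```

The check is only as trustworthy as the tool computing it, which is why the first bullet calls for trusted copies of the system utilities.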

Decision Tree

Outline (cont.)

• HMAP: A tool to remotely test web servers

“Features” of the HTTP Protocol

• The RFC specifies that HTTP responses shall include the type and version of the server software.
• The RFC specifies how servers respond to various exceptions.
• No commonly used web server implementation strictly adheres to the RFC.
• Large variations in web servers’ handling of errors allow them to be accurately identified even if they refuse to identify themselves in the HTTP header.

Warning in RFC 2068

“Note: Revealing the specific software version of the server might allow the server machine to become more vulnerable to attacks against software that is known to contain security holes. Server implementers are encouraged to make this field a configurable option”

Fingerprinting Web Servers

• Error messages from pathologically long URLs

Server                    URL Length   Response
Apache/1.3.12 (Win)       1-216        404 Not Found
                          217-8176     403 Forbidden
                          8177-up      414 Request-URI Too Large
Netscape-FastTrack/4.1    1-4089       404 Not found
                          4090-8123    500 Server Error
                          8124-8176    413 Request Entity Too Large
                          8177-up      400 Bad request

• Wide variation in responses to ad hoc requests
[GET / HTTP/]
[GET/HTTP/1.0]
[HEAD /.\ HTTP/1.0]
[HEAD /asdfasdfasdfasdfasdf/../ HTTP/1.0]
[HEAD /asdfasdfasdfasdfasdf/.. HTTP/1.0]
[HEAD /./././././././././././././././ HTTP/1.0]
[HEAD / HTTP/1.0]
[HEAD ///////////// HTTP/1.0]
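The URL-length data above is already enough to tell the two servers apart. The sketch below encodes those ranges and returns the servers consistent with a set of observed (URL length, status code) pairs. It is an illustration of the idea rather than HMAP itself; the function and variable names are assumptions, and sending the probe requests is not shown.

```python
# (low, high, status) ranges copied from the URL-length data above;
# high=None means "and up".
FINGERPRINTS = {
    "Apache/1.3.12 (Win)": [(1, 216, 404), (217, 8176, 403), (8177, None, 414)],
    "Netscape-FastTrack/4.1": [(1, 4089, 404), (4090, 8123, 500),
                               (8124, 8176, 413), (8177, None, 400)],
}


def expected_status(server, url_length):
    """Status code the named server is tabulated to return for this URL length."""
    for low, high, status in FINGERPRINTS[server]:
        if url_length >= low and (high is None or url_length <= high):
            return status
    raise ValueError("URL length outside the tabulated ranges")


def candidates(observations):
    """Servers whose tabulated behavior matches every (url_length, status) pair."""
    return [server for server in FINGERPRINTS
            if all(expected_status(server, length) == status
                   for length, status in observations)]
```

A short URL returning 404 matches both servers, so a useful probe set picks lengths where the tables disagree, such as 5000 or 9000 characters.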

Structuring Fingerprint Characteristics

• Lexical: The specific words, phrases and punctuation that are used in response

• Syntactic: The ordering and context of words, phrases, headers and other elements

• Semantic: The server’s specific interpretation of a request from among the possible interpretations

Specifying the HTTP Protocol

• In addition to low level networking protocols, we want a specification for high level application protocols.

• HTTP is in many ways simpler than DHCP or ARP.
– Simple request/response behavior
– Limited state saved between transactions

Backup

HACQIT Reference Architecture

[Figure: an enclave on a LAN behind a firewall (FW), connected over a WAN to other enclaves. The enclave contains the HACQIT cluster, servers, and users. Key: critical service, non-critical service, critical user, non-critical user, attacker, sensors, controller. Labels: out-of-band comms between HACQIT clusters & Cyber Panel; all critical user interaction with HACQIT protected critical applications is via VPN.]

HACQIT Cluster HW Design

[Figure: cluster hardware. Labels: GW, switch, FW, VPN, to critical users; Primary, Backup, Spare, and Sandbox servers; sensors and controls; Monitor & Adapter; out-of-band control pathways; to enclave firewall & sensors; communications with other Monitor/Adapters and Cyber Panel]

• Monitor-adapter uses out-of-band signaling for complete separation from network attacks on LAN and WAN
• Only the current Primary is accessible to users
• Eliminate many attacks and channel others to go through application
• Strong separation boundaries to limit attack propagation

Simplified Software Implementation

[Figure: software components. Labels: users via LAN & WAN; FW/GW; VPN; controller IP switch; Primary (connection manager, protection wrapper, IIS, application monitor, host monitor, file integrity, network monitor); Backup (connection manager, protection wrapper, Apache, application monitor, host monitor, file integrity, network monitor); out-of-band communication mediator; out-of-band controller (MAC) with policy editor, policy server, buffer & log, operator display, and diagnostics (analyzer, content filter, circular buffer, generalizer); Sandbox (primary duplicate, backup duplicate); offline backup (spare/fishbowl); other cluster controllers]

HACQIT Penetration Testing Cluster

[Figure: network diagram. Labels: Internet, DSL, UC Davis Computer Security Class, Linux login server, mini DMZ, project web server, enclave firewall, enclave sniffer, cluster firewall, IB switch, IB sniffer, OOB switch, OOB controller, MAC, DC, DNS, critical users (CU): two W2K CU hosts and a CU sniffer, critical services (CS): two W2K IIS CS, one W2K Apache CS, and the Sandbox, cluster message board]

Penetration Testing (cont.)

• Precautions:

– Isolated HACQIT.net – only Davis inbound, no outbound

– Students use Davis CS Dept. computers to connect via SSH to login server at Teknowledge

– Class ground rules with stiff penalties – no problems

• Weakened protections over time to help attackers

– Probes/attacks launched from login server as trusted user

– Weakened firewall rules, unpatched Win2K OS and IIS web server

– Turned off attacker blocking

• Fully instrumented cluster to capture attacks

– Extensive logging at multiple levels

– Sniffers

– Daily dumps
