Post on 05-Jan-2016
Automatic Trust Management for Adaptive Survivable Systems (ATM for ASS's)
Howard Shrobe, MIT AI Lab
Jon Doyle, MIT Lab for Computer Science
The Core Thesis
Survivable systems make careful judgments about the trustworthiness of their computational environment, and they make rational resource allocation decisions based on their assessment of trustworthiness.
The Thesis In Detail: Trust Model
• It is crucial to estimate to what degree and for what purposes a computational resource may be trusted.
• This influences decisions about:
  – What tasks should be assigned to which resources.
  – What contingencies should be provided for.
  – How much effort to spend watching over the resources.
• The trust estimate depends on having a model of the possible ways in which a computational resource may be compromised.
The Thesis in Detail: Adaptive Survivable Systems
• The application itself must be capable of self-monitoring and diagnosis:
  – It must know the purposes of its components.
  – It must check that these are achieved.
  – If these purposes are not achieved, it must localize and characterize the failure.
• The application itself must be capable of adaptation so that it can best achieve its purposes within the available infrastructure:
  – It must have more than one way to effect each critical computation.
  – It should choose an alternative approach if the first one failed.
  – It should make its initial choices in light of the trust model.
The Thesis in Detail: Rational Resource Allocation
• This depends on the ability of the application, monitoring, and control systems to engage in rational decision making about what resources they should use to achieve the best balance of expected benefit to risk.
• The amount of resources dedicated to monitoring should vary with the threat level
• The methods used to achieve computational goals and the location of the computations should vary with the threat
• Somewhat compromised systems will sometimes have to be used to achieve a goal
• Sometimes doing nothing will be the best choice
The Active Trust Management Architecture
[Architecture diagram: Self-Adaptive Survivable Systems and Perpetual Analytical Monitoring feed a Trust Model (trustworthiness, compromises, attacks), which drives Rational Decision Making and Rational Resource Allocation. Other information sources: Intrusion Detectors, Trend Templates, System Models & Domain Architecture.]
Tiers of a Trust Model
• Attack Level: history of "bad" behaviors
  – penetration, denial of service, unusual access, flooding
• Compromise Level: state of mechanisms that provide:
  – Privacy: stolen passwords, stolen data, packet snooping
  – Integrity: parasitized, changed data, changed code
  – Authentication: changed keys, stolen keys
  – Non-repudiation: compromised keys, compromised algorithms
  – QoS: slow execution
  – Command and Control Properties: compromises to the monitoring infrastructure
• Trust Level: degree of confidence in key properties
  – Compromise states
  – Intent of attackers
  – Political situation
Adaptive Survivable Systems
[Diagram: a layered computation (Layer 1, Layer 2, Layer 3) built from super routines with components A, B, C, and Foo. Annotations record why conditions hold, e.g. "PreReq 1 of B because Post Cond 1 of A" and "Post Condition 1 of Foo because Post Cond 2 of B and Post Cond 1 of C". Sentinels synthesized from these dependencies watch the conditions at runtime.]
[Diagram, spanning the Development Environment and the Runtime Environment: plan structures and a Component Asset Base describe each task (e.g. "To: Execute Foo") together with the methods (1, 2, 3) that can achieve it and their components (A, B, Foo), with Method 3 marked most attractive. At runtime, Self-Monitoring raises alerts to a Diagnostic Service, which feeds a Repair Plan Selector, Rollback Designer, and Resource Allocator, leading to Enactment. Rational selection chooses among methods; diagnosis & recovery close the loop.]
Context for the Project: The Intelligent Room (E21)
• The Intelligent Room is an integrated environment for multi-modal HCI. It has eyes and ears.
  – The room provides speech input.
  – The room has deep understanding of natural language utterances.
  – The room has a variety of machine vision systems that enable it to:
    • Track motion and maintain the position of people
    • Recognize gestures
    • Recognize body postures
    • Identify faces (eventually)
    • Track pointing devices (e.g. laser pointer)
    • Select the optimal camera for remote viewers
    • Steer cameras to track the focus of attention
• MetaGlue is a lightweight, distributed agent infrastructure for integrating and dynamically (un)connecting new HCI components. MetaGlue is the brains of the room.
The E21 Maps Abstract Services into Plans
• Users request abstract services from the E21.
  – "I want to get a textual message to a system wizard"
• The E21 has many plans for how to render each abstract service:
  – "Locate a wizard, project on a wall near her"
  – "Locate a wizard, use a voice synthesizer and a speaker near her"
  – "Print the message and page the wizards to go to the printer"
• Each plan requires certain resources (and other abstract services):
  – Some resources are more valuable than others (higher cost).
  – Some resources are more useful for this plan than others (higher benefit).
  – The resources may be otherwise committed.
    • They may be preempted (but at a high cost).
• The resource manager picks a set of resources which is (nearly) optimal.
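The selection rule above can be sketched in a few lines. This is a minimal illustration, not the E21's actual resource manager: the method names, utilities, and costs are invented, and the utility and cost functions are reduced to simple lookups.

```python
from typing import Callable

def net_benefit(method: dict, utility: Callable, cost: Callable) -> float:
    """Net benefit = the user's utility of the parameter binding minus the
    summed cost of the resources the method would consume."""
    return utility(method["params"]) - sum(cost(r) for r in method["resources"])

def select_method(methods: list, utility: Callable, cost: Callable) -> dict:
    """Pick the method that maximizes net benefit."""
    return max(methods, key=lambda m: net_benefit(m, utility, cost))

# Hypothetical plans for "get a textual message to a system wizard":
methods = [
    {"name": "project-on-wall",  "params": {"modality": "visual"},
     "resources": ["projector", "camera"]},
    {"name": "speech-synthesis", "params": {"modality": "audio"},
     "resources": ["speaker"]},
    {"name": "print-and-page",   "params": {"modality": "paper"},
     "resources": ["printer", "pager"]},
]
utility_of = {"visual": 10, "audio": 7, "paper": 4}
cost_of = {"projector": 5, "camera": 2, "speaker": 1, "printer": 1, "pager": 1}

best = select_method(methods,
                     lambda p: utility_of[p["modality"]],
                     lambda r: cost_of[r])
print(best["name"])  # speech-synthesis (net benefit 6 beats 3 and 2)
```

With these illustrative numbers, the highest-utility method loses because its resources cost too much; which is exactly the benefit/cost trade the slide describes.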
AbstractService
ServiceControl Parameters
User’sUtility
Function
The binding of parameters has a value to the user
Resource1,1
Resource1,2
Resource1,j
Each Method Requires Different Resources
The System Selects the Method Which Maximizes Net Benefit
User Requests A
Service with certain parameters
ResourceCost
Function
The ResourcesUsed by the MethodHave a cost
Net Benefit
Each Method Binds the Settings of The Control Parameters in a Different Way
Method1
Method2
Methodn
Each Service can beProvided by Several
Methods
Recovering From Failures
• The E21 renders services by translating them into plans involving physical resources.
  – Physical resources have known failure modes.
• Each plan step accomplishes sub-goal conditions needed by succeeding steps.
  – Each condition has some way of monitoring whether it has been accomplished.
  – These monitoring steps are also inserted into the plan.
• If a sub-goal fails to be accomplished, model-based diagnosis isolates and characterizes the failure.
• A recovery is chosen based on the diagnosis:
  – It might be as simple as "try it again" (we had a network glitch).
  – It might be "try it again, but with a different selection of resources".
  – It might be as complex as "clean up and try a different plan".
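The recovery ladder above can be sketched as a simple control loop. This is an illustrative skeleton only; the step names, resource names, and the two-attempt retry policy are assumptions, and a real system would consult the diagnosis before choosing among the three recovery options.

```python
# Minimal sketch: run each monitored plan step; on failure, first retry
# with a different resource selection, then fall back to another plan.

def execute_with_recovery(plans, run_step, pick_resources):
    """plans: list of plans (each a list of steps), in order of preference.
    run_step(step, resources) -> bool reports whether the step's monitored
    sub-goal condition was achieved."""
    for plan in plans:                                  # "try a different plan"
        resources = pick_resources(plan, exclude=None)
        for _ in range(2):                              # "try it again"
            if all(run_step(step, resources) for step in plan):
                return plan, resources                  # every condition held
            # "try again, but with a different selection of resources"
            resources = pick_resources(plan, exclude=resources)
    raise RuntimeError("no plan achieved its sub-goals")

# Illustrative failure: the "project" step only succeeds on projector-2.
def run_step(step, resources):
    return not (step == "project" and "projector-1" in resources)

def pick_resources(plan, exclude=None):
    for candidate in ({"projector-1"}, {"projector-2"}):
        if candidate != exclude:
            return candidate

plan, used = execute_with_recovery([["locate-wizard", "project"]],
                                   run_step, pick_resources)
print(used)  # {'projector-2'}
```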
Access Policies Naturally Fit Within the Model

[Diagram: the net-benefit selection picture repeated, with Access Policies added as a further input to method selection alongside the user's utility function and the resource cost function.]
Model Based Troubleshooting for Trust Model Updating
Model Based Diagnosis for Survivable Systems
• Extension of previous work on model-based diagnosis:
  – Shrobe & Davis, Williams & de Kleer
  – Previous work dealt with hardware failures.
  – Previous work ignored common-mode failures.
• Focus is on diagnosing failure of computations in order to assess the health of the underlying resources.
• Given:
  – Plan structure of the computation describing expected behavior, including QoS.
  – Observation of actual behavior that deviates from expectations.
• Produce:
  – Localization: which component(s) failed.
  – Characterization: what did they do wrong.
  – Inferences about the compromise state of the computational resources involved.
  – Inferences about what attacks enabled the compromise to occur.
Ontology of the Diagnostic Task
• Computations utilize a set of resources (e.g. the computation uses hosts, binary executable files, databases, etc.).
• Individual resources have vulnerabilities.
• Vulnerabilities enable attacks.
• An attack on an instance of a particular type of resource can cause that resource to enter a compromised state.
• A computation that utilizes a compromised resource may exhibit a misbehavior, i.e. it may behave in a manner other than would be predicted by its design.
• Misbehaviors are the symptoms which initiate diagnostic activity, leading to updated assessments of:
  – The compromised states of the resources used in the computation.
  – The likelihood of attacks having succeeded.
  – The likelihood that other resources have been compromised.
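The causal chain in this ontology (resources carry vulnerabilities, vulnerabilities enable attacks, attacks compromise resources, and computations using compromised resources may misbehave) can be written down as a small data model. The class and vulnerability names below are illustrative, not taken from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class Resource:
    name: str
    vulnerabilities: set = field(default_factory=set)
    compromised: bool = False

@dataclass
class Attack:
    name: str
    exploits: str        # the vulnerability this attack requires

def launch(attack: Attack, resource: Resource) -> bool:
    """An attack succeeds only against a resource with the matching
    vulnerability, driving it into a compromised state."""
    if attack.exploits in resource.vulnerabilities:
        resource.compromised = True
        return True
    return False

def may_misbehave(resources_used) -> bool:
    """A computation that utilizes any compromised resource may behave
    in a manner other than predicted by its design."""
    return any(r.compromised for r in resources_used)

host = Resource("Host1", vulnerabilities={"buffer-overflow"})
launch(Attack("overflow-attack", exploits="buffer-overflow"), host)
print(may_misbehave([host]))  # True
```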
The Space of Intrusion Detection
[Diagram: intrusion-detection approaches laid out along two axes (statistical profile vs. structural model/pattern; match to bad vs. discrepancy from good). Matching against known-bad behavior yields a "suspicious" or "violation" signal (supervised learning from attack runs, hand-coded structural models of attacks); discrepancy from known-good behavior yields an "anomaly" or "symptom" (unsupervised learning from normal runs, models of expected behavior). A symptom may indicate an attack or a compromise.]
Model Based Troubleshooting: Constraint Suspension

[Diagram: an adder/multiplier network with inputs 3 and 5 feeding three Times components (products 15, 15, 25) and two Plus components; one output is observed as 35 where 40 was predicted. Suspending each component's constraints in turn shows which components can account for the discrepancy:
  – Consistent diagnosis: the broken component takes inputs 25 and 15 and produces output 35.
  – Consistent diagnosis: the broken component takes inputs 5 and 3 and produces output 10.
  – No consistent diagnosis: conflict between 25 and 20.]
Multiple Faults and theGeneral Diagnostic Engine (GDE)
• Each component is modeled by multi-directional constraints representing its normal behavior.
• As a value is propagated through a component model, it is labeled with the assumption that this component works.
  – The propagated label is the set union of the labels of the inputs to the model, plus a token for the current model.
• A conflict is detected at any place to which inconsistent values are propagated.
  – It's inconsistent to believe two inconsistent values at once.
  – The union of the labels of these values implies that you should believe both.
  – At least one element in this union must be false.
  – A nogood is the set union of the labels of the conflicting values.
• A diagnosis is a set of assumptions which forms a covering set of all nogoods (i.e. includes at least one assumption in each nogood).
• The goal is to find all minimal diagnoses.
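The covering-set step can be sketched directly: a diagnosis must include at least one assumption from every nogood, and only the minimal such sets are kept. Generating candidates smallest-first makes superset filtering cheap. The component names and nogoods below are illustrative, not the figure's exact conflict sets.

```python
from itertools import combinations

def minimal_diagnoses(nogoods, components):
    """Enumerate candidate fault sets by size, smallest first, keeping
    those that intersect every nogood and contain no smaller diagnosis."""
    found = []
    for size in range(1, len(components) + 1):
        for cand in combinations(sorted(components), size):
            s = set(cand)
            if all(s & ng for ng in nogoods) and not any(d < s for d in found):
                found.append(s)
    return found

# Two illustrative nogoods over multipliers M1-M3 and adders A1-A2:
nogoods = [{"M1", "A1"}, {"M1", "M2", "A1", "A2"}]
diagnoses = minimal_diagnoses(nogoods, {"M1", "M2", "M3", "A1", "A2"})
print(diagnoses)  # [{'A1'}, {'M1'}]
```

Every two-component candidate that hits both nogoods necessarily contains A1 or M1, so only the two singletons survive as minimal diagnoses.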
Model Based Troubleshooting: GDE

[Diagram: the same adder/multiplier network (inputs 3 and 5; predicted intermediates 15, 15, 25, with 20 propagated backward; predicted output 40 with 35 observed). The conflicts yield the diagnoses:
  – Blue or Violet broken
  – Green broken, Red with compensating fault
  – Green broken, Yellow with masking fault]
Applying MBT to QoS Issues

[Diagram: a five-component dataflow starting at Time 0, with per-component delay intervals (1,3; 2,4; 3,4; 1,2; 5,10) propagating predicted time intervals such as 1,3; 3,7; 5,9; and 9,17. Observed times are 6 and 27; 27 lies outside its predicted interval of 9,17.
Conflicts and diagnoses: Blue broken; Violet broken; Red broken, Yellow broken; Red broken, Green broken; Green broken, Yellow broken. But broken how?
Failure models for Component2: Normal (delay 2,4) probability 90%; Delayed (delay 4,+inf) probability 9%; Accelerated (delay -inf,4) probability 1%.]
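The interval bookkeeping behind the figure can be sketched directly: each component adds its [lo, hi] delay to the incoming time interval, and an observation is anomalous when it falls outside the predicted interval. The chain below is a simplification of the figure's five-component graph, using three of its delay intervals.

```python
def propagate(start, delays):
    """Sum interval delays along a chain of components."""
    lo, hi = start
    for d_lo, d_hi in delays:
        lo, hi = lo + d_lo, hi + d_hi
    return lo, hi

def consistent(observed, predicted):
    lo, hi = predicted
    return lo <= observed <= hi

# Chain with delays [1,3], [2,4], [3,4] starting at time 0:
predicted = propagate((0, 0), [(1, 3), (2, 4), (3, 4)])
print(predicted)                    # (6, 11)
print(consistent(27, predicted))    # False: some component is off-model
```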
Adding Failure Models

• In addition to modeling the normal behavior of each component, we can provide models of known abnormal behavior.
• Each model can have an associated probability.
• A "leak model" covering unknown failures/compromises covers the residual probability.
• The diagnostic task becomes finding the most likely set(s) of models (one model for each component) consistent with the observations.
• The search process is best-first search with joint probability as the metric.
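A compact version of this search: enumerate assignments of one behavioral model per component, rank them by joint prior, and return the most likely assignment consistent with the observations. A real engine would expand candidates best-first from a priority queue rather than sorting them all; the models, priors, and consistency test here are illustrative.

```python
from itertools import product

def most_likely_diagnosis(models, consistent):
    """models: {component: [(mode, prior), ...]}.
    Returns the highest-joint-probability consistent assignment."""
    comps = sorted(models)
    candidates = []
    for choice in product(*(models[c] for c in comps)):
        joint = 1.0
        for _, prior in choice:
            joint *= prior
        candidates.append((joint, {c: mode for c, (mode, _) in zip(comps, choice)}))
    candidates.sort(key=lambda cp: -cp[0])          # most likely first
    for joint, assignment in candidates:
        if consistent(assignment):
            return assignment, joint
    return None

models = {
    "A": [("normal", 0.90), ("delayed", 0.09), ("leak", 0.01)],
    "B": [("normal", 0.90), ("delayed", 0.09), ("leak", 0.01)],
}
# Suppose the observations can only be explained if B is delayed:
best, p = most_likely_diagnosis(models, lambda a: a["B"] == "delayed")
print(best, round(p, 4))  # {'A': 'normal', 'B': 'delayed'} 0.081
```

The low-prior "leak" mode plays the role of the leak model above: it is never preferred unless every named failure mode is ruled out.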
Consistent Diagnoses

  A       B       C       MID Low  MID High  Prob    Explanation
  Normal  Normal  Slow        3        3     .04410  C is delayed
  Slow    Fast    Normal      7       12     .00640  A slow, B masks (runs negative!)
  Fast    Normal  Slow        1        2     .00630  A fast, C slower
  Normal  Fast    Slow        4        6     .00196  B not too fast, C slow
  Fast    Slow    Slow      -30        0     .00042  A fast, B masks, C slow
  Slow    Fast    Fast       13       30     .00024  A slow, B masks, C not masking fast

[Diagram: A maps IN (at 0) to MID, B maps MID to OUT2, and C maps MID to OUT1, with per-mode delay intervals and priors:
  A: Normal 3,6 (.7); Fast -30,2 (.1); Slow 7,30 (.2)
  B: Normal 5,10 (.8); Fast -30,4 (.03); Slow 11,30 (.07)
  C: Normal 2,4 (.9); Fast -30,1 (.04); Slow 5,30 (.06)
Predictions: MID Low 3, High 6; OUT1 observed 5 (predicted 5 to 10); OUT2 observed 17 (predicted Low 8, High 16).]
Applying Failure Models

[Diagram: Component 1 has behavioral models (Normal: delay 2,4; Delayed: delay 4,+inf; Accelerated: delay -inf,2) and is Located On Node17, which has models Normal (probability 90%), Parasite (probability 9%), and Other (probability 1%).]
Modeling Underlying Resources

• The model can be augmented with another level of detail showing the dependence of computations on resources.
• Each resource has models of its state of compromise.
  – They can be abstract:
    • node has cycle stealing
    • network segment is being overloaded
• The modes of the resource models imply the modes of the computational models.
  – E.g. if a computation resides on a node which is losing cycles, then the computation model must be the retarded model.
Moving to a Bayesian Framework
• The model has levels of detail specifying computations, the underlying resources, and the mapping of computations to resources.
• Each resource has models of its state of compromise.
• The modes of the resource models are linked to the modes of the computational models by conditional probabilities.
• The model forms a Bayesian network.
[Diagram: as before, Component 1 (models Normal: delay 2,4; Delayed: delay 4,+inf; Accelerated: delay -inf,2) is Located On Node17 (models Normal 90%, Parasite 9%, Other 1%); the component's modes are now linked to the node's modes by conditional probabilities (e.g. .2, .3, .4).]
Computational Models are Coupled through Resource Models
[Diagram: the five-component QoS network again, with components assigned to nodes (Node1, Node2). Because components sharing a node share its fate, some diagnoses are precluded: physicality requires red, green, and yellow to all be delayed or all be accelerated.
Conflicts and diagnoses: Blue delayed; Violet delayed; Red delayed, Yellow negative time; Red delayed, Green negative time; Green delayed, Yellow negative time.]
An Example System Description

[Diagram: computations A through E running on Host1 through Host4. Each host has a prior over Normal/Hacked (Host1: .9/.1; Host2: .85/.15; Host3: .8/.2; Host4: .7/.3), and each computation has a conditional mode table given its host's state (columns N, H):
  A: Normal .6/.15, Peak .1/.80, Off Peak .3/.05
  B: Normal .8/.3, Slow .2/.7
  C: Normal .60/.05, Slow .25/.45, Slower .15/.50
  D, E: Normal .50/.05, Fast .25/.45, Slow .25/.50]
Bayesian Networks

• Bayesian networks are a technique for representing complex problems involving evidential reasoning.
• They reduce the need to state an exponential number of conditional probabilities.
• The model involves nodes and links:
  – Nodes represent statistical variables.
  – Links represent conditional dependence between variables (i.e. causation).
  – Links not present represent independence.
• Bayesian solvers compute the joint probability of some nodes given the probability (or observation) of others.

[Diagram: the standard alarm example. Earthquake and Burglar are parents of Alarm, with conditional probability table:
  Quake  Burglar |  Alarm  No Alarm
    T      T     |   .97     .03
    T      F     |   .65     .35
    F      T     |   .55     .45
    F      F     |   .03     .97]
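Such a solver can be illustrated by brute-force enumeration on the alarm example. The Alarm-given-(Quake, Burglar) table is the one from the slide; the priors P(Quake) = 0.01 and P(Burglar) = 0.05 are assumed here for illustration, since the slide does not give them.

```python
from itertools import product

P_QUAKE, P_BURGLAR = 0.01, 0.05          # assumed priors (illustrative)
P_ALARM = {(True, True): 0.97, (True, False): 0.65,
           (False, True): 0.55, (False, False): 0.03}

def joint(quake, burglar, alarm):
    """Joint probability of one full assignment of the three variables."""
    p = (P_QUAKE if quake else 1 - P_QUAKE)
    p *= (P_BURGLAR if burglar else 1 - P_BURGLAR)
    p_a = P_ALARM[(quake, burglar)]
    return p * (p_a if alarm else 1 - p_a)

def p_burglar_given_alarm():
    """P(Burglar | Alarm) by summing the joint over the hidden variable."""
    num = sum(joint(q, True, True) for q in (True, False))
    den = sum(joint(q, b, True) for q, b in product((True, False), repeat=2))
    return num / den

print(round(p_burglar_given_alarm(), 3))  # 0.446
```

Enumeration is exponential in the number of variables; real solvers exploit the missing links (independence) to avoid this, which is the point of the bullet above.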
System Description as a Bayesian Network

• The model can be viewed as a two-tiered Bayesian network:
  – Resources with modes
  – Computations with modes
  – Conditional probabilities linking the modes

[Diagram: the A through E computations and Host1 through Host4 resources from the example system, drawn as a Bayesian network with the same priors and conditional mode tables.]
System Description as an MBT Model

• The model can also be viewed as an MBT model with multiple models per device:
  – Each model has a behavioral description.
• Except that the models have conditional probabilities.

[Diagram: the same A through E network, with each computation's mode table read as a set of behavioral models.]
Integrating MBT and Bayesian Reasoning

• Start with each behavioral model in the "normal" state.
• Repeat: check the current set of models for consistency.
• If inconsistent:
  – Add a new node to the Bayesian network.
    • This node represents the logical-and of the nodes in the conflict.
    • Its truth-value is pinned at FALSE.
  – Prune out all possible solutions which are a superset of the conflict set.
  – Pick another set of models from the remaining solutions.
• If consistent, add to the set of possible diagnoses.
• Continue until all inconsistent sets of models are found.
• Solve the Bayesian network.
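A compressed sketch of this loop: walk the model assignments, record each conflict as a set of (component, mode) pairs "pinned false", prune every assignment containing a known conflict, and renormalize the priors of the survivors as a crude stand-in for solving the full Bayesian network. The consistency check is abstracted to a callback, and the two-component example data is illustrative.

```python
from itertools import product

def diagnose(models, check):
    """models: {component: {mode: prior}}.
    check(assignment) returns None when the assignment is consistent with
    the observations, else the conflicting subset of (component, mode) pairs."""
    comps = sorted(models)
    conflicts, survivors = [], []
    for choice in product(*(models[c].items() for c in comps)):
        assignment = {c: mode for c, (mode, _) in zip(comps, choice)}
        pairs = set(assignment.items())
        if any(conflict <= pairs for conflict in conflicts):
            continue                              # superset of a known conflict
        conflict = check(assignment)
        if conflict is None:
            joint = 1.0
            for _, (_, prior) in zip(comps, choice):
                joint *= prior
            survivors.append((assignment, joint))
        else:
            conflicts.append(set(conflict))       # logical-and node pinned FALSE
    total = sum(p for _, p in survivors)
    return conflicts, [(a, p / total) for a, p in survivors]

models = {"A": {"normal": 0.9, "slow": 0.1},
          "B": {"normal": 0.8, "slow": 0.2}}

def check(assignment):   # observed end-to-end delay rules out "both normal"
    if assignment["A"] == "normal" and assignment["B"] == "normal":
        return {("A", "normal"), ("B", "normal")}
    return None

conflicts, diagnoses = diagnose(models, check)
print(len(conflicts), len(diagnoses))  # 1 3
```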
[Diagram: a discrepancy observed at an output of the A through E network yields the conflict A = NORMAL, B = NORMAL, C = NORMAL. C is the least likely member of the conflict; its most likely alternative is SLOW.]
Adding the Conflict to the Bayesian Network

[Diagram: a NoGood1 node representing the conflict (A = NORMAL, B = NORMAL, C = NORMAL) is added to the network with its truth value pinned at False. Its conditional probability table is the logical-and of its parents:
  A=N  B=N  C=N |  T  F
   T    T    T  |  1  0
   T    T    F  |  0  1
   T    F    T  |  0  1
   T    F    F  |  0  1
   F    T    T  |  0  1
   F    T    F  |  0  1
   F    F    T  |  0  1
   F    F    F  |  0  1]
Integrating MBT and Bayesian Reasoning (2)
• Repeat finding all conflicts and adding them to the Bayesian net.
• Solve the network again.
  – The posterior probabilities of the underlying resource models tell you how likely each model is.
  – These probabilities should inform the trust model, lead to updated priors, and guide resource selection.
  – The posterior probabilities of the computational models tell you how likely each model is. This should guide recovery.
• All remaining non-conflicting combinations of models are possible diagnoses.
  – Create a conjunction node for each possible diagnosis and add the new node to the Bayesian network (call this a diagnosis node).
• Finding the most likely diagnoses:
  – Bias selection of the next component model by current model probabilities.
The Final Bayesian Network
[Diagram: the full network with two pinned-false conflict nodes (NoGood1: A = NORMAL, B = NORMAL, C = NORMAL; NoGood2: A = NORMAL, B = NORMAL, C = SLOW) and diagnosis nodes Diagnosis-1 (A = SLOW, B = SLOW, C = NORMAL, D = NORMAL, E = PEAK) through Diagnosis-50. Solving yields host posteriors (Host1 Hacked .324, Host2 Hacked .207, Host3 Hacked .450, Host4 Hacked .267) and the computation mode posteriors tabulated on the next slide.]
Final Model Probabilities

  Resource  Hacked Posterior  Hacked Prior  Normal Posterior  Normal Prior
  Host1          .324             .300           .676             .700
  Host2          .207             .200           .793             .800
  Host3          .450             .150           .550             .850
  Host4          .267             .100           .733             .900

  Computation  Mode      Probability
  A            Off-Peak     .028
               Peak         .541
               Normal       .432
  B            Slow         .738
               Normal       .262
  C            Slower       .516
               Slow         .339
               Normal       .145
  D            Slow         .590
               Fast         .000
               Normal       .410
  E            Slow         .612
               Fast         .065
               Normal       .323
Adding Attack Models
• An attack model specifies the set of attacks that are believed to be possible in the environment.
  – Each resource has a set of vulnerabilities.
  – Vulnerabilities enable attacks on that resource.
  – We map attacks x resource-type to behavioral modes of the resource.
  – This is given as a set of conditional probabilities:
    • If this attack succeeded on a resource of this type, then the likelihood that the resource is in mode-x is P.
  – This now forms a three-tier Bayesian network.
[Three-tiered model diagram: Host1 has-vulnerability Buffer-Overflow and has resource-type Unix-Family; the Buffer-Overflow vulnerability enables the Overflow-Attack, which causes the host's behavioral modes (Normal, Slow) with conditional probabilities (.5, .7).]
Example Final Data: Effect of the Attack Model

  Hypothesis       A Priori  No Attack  Buffer Overflow  Packet Flood  Both
  Host1               .1       .291         .491             .668      .741
  Host2               .15      .397         .543             .680      .770
  Host3               .2       .206         .202             .574      .476
  Host4               .3       .298         .296             .576      .480
  Buffer Overflow     .4                    .754                       .567
  Packet Flood        .5                                     .832      .693
Summary
• The diagnostic process goes from observations of computational behavior to underlying trust model assessments.
• Three-tiered model:
  – Vulnerabilities and attacks
  – Compromised states of resources
  – Non-standard behavior of computations
• A new synthesis of Bayesian and model-based reasoning.
• Next steps:
  – Realistic ontology of attacks, compromise states, etc.
  – Resource selection in light of diagnosis
• Challenges:
  – Realistic attack models may swamp the Bayesian net computation
  – How to handle unknown attacks