Fault Tolerance and Security
3-5th April 2005 Security and Protection of Information 2005
Outline
  Introduction
  Background
    Security
    Fault Tolerance
  Major Contributions
  A Personal Perspective
  Future Challenges
  Conclusions
Introduction
Computer Security and Fault Tolerance share a subset of goals
  The ability to tolerate or mitigate failure in a computer system
The assumptions that underpin traditional solutions make their merger non-trivial
  Security: Remove any replication and tighten control
  Fault Tolerance: Replicate and compare results
Introduction – II
Recent cross-over research began with Reiter’s work on Rampart (mid 90s)
This spawned new interest in applying fault tolerant mechanisms to security:
  Tacoma: Provision of replication for mobile agents
  MAFTIA: A large-scale project to study survivability in Internet applications
We concentrate on two avenues of research:
  Development of the fault model
  Progression of the replication mechanisms
Background – Security
Why the relatively late interaction?
In our opinion, it has much to do with the history of computer security:
  Trusted Computing Base
  Research was weighted towards confidentiality and integrity – not availability
Others had noted this gap in the computer security literature [Needham, ’94]
Background – Security – II
Very little in the open literature dealt with Denial of Service (the absence of availability)
A notable exception [Gligor, ’86]:
  An increase in Maximum Waiting Time (MWT)
  Legitimate and other forms of denial of service – system returns before MWT
An interesting exception [Turn and Habibi, ’86]:
  A security function is fault tolerant if, in the presence of a fault, the system’s security policy remains intact
Background – Fault Tolerance
Fault Modelling:
  Fault → Error → Failure
    Fault: Adjudged or hypothesized cause of an error
    Error: The part of the system state that may lead to failure
    Failure: Service deviates from specification
Four techniques within the dependability paradigm:
  Fault prevention, fault tolerance, fault removal, fault forecasting
Background – Fault Tolerance – II
Replication Mechanisms:
  Underlying group communication mechanisms
  Early work conducted at Cornell University:
    Isis toolkit: CBCAST (Causal broadcast), ABCAST (Atomic broadcast)
Group Structures:
  State Machine Approach: Active replication, which masks the failure of a proportion of the servers
  Primary Backup Approach: Passive replication; if the primary fails, then a backup takes over
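The two group structures above can be contrasted in a minimal sketch. It assumes deterministic replicas modelled as plain Python functions, a strict-majority vote for the active case, and a crash signalled by `ConnectionError` for the passive case; the replica functions and names are hypothetical and not taken from Isis.

```python
from collections import Counter

def active_replication(replicas, request):
    """State machine approach: every replica executes the request and the
    caller accepts the answer returned by a strict majority."""
    results = [replica(request) for replica in replicas]
    value, votes = Counter(results).most_common(1)[0]
    if votes * 2 <= len(replicas):
        raise RuntimeError("no majority among replica results")
    return value

def primary_backup(replicas, request):
    """Primary backup approach: only the primary executes; a backup takes
    over when the primary fails (modelled here as raising ConnectionError)."""
    for replica in replicas:  # replicas[0] plays the primary
        try:
            return replica(request)
        except ConnectionError:
            continue  # crash detected, fail over to the next backup
    raise RuntimeError("all replicas failed")

# Hypothetical replicas: one correct, one returning a wrong value, one crashed.
def correct(x):
    return x * 2

def faulty(x):
    return -1

def crashed(x):
    raise ConnectionError("replica down")

masked = active_replication([correct, faulty, correct], 21)
failed_over = primary_backup([crashed, correct], 21)
```

Note that the passive variant only handles a replica that stops; a primary that returns a wrong value would go unnoticed, which is why the active variant compares results.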
Major Contributions
  Rampart
  Castro and Liskov
  Quorum Systems
  MAFTIA
  Tacoma
  Other Projects
Rampart
Group communication implemented by Reiter [Reiter, ’94 & ’96]
First system to implement replicated service based on Byzantine agreement protocols
Main communication structure derived from the earlier work on Isis at Cornell
Extends the Isis work by tolerating the malicious failure of a proportion of the servers within the group
Rampart – II
Choices over communication primitives within Rampart:
  State machine approach to replication
  Digital signatures to provide message authentication in the group communication primitives
Lack of efficiency and scalability
Although it has its drawbacks, it inspired the majority of the remaining work
The main research agenda as a result was the search for more efficient protocols
Castro & Liskov
A new replication mechanism to overcome efficiency concerns [Castro & Liskov, ’99]
Two main differences to Rampart:
  Primary backup model
  Pair-wise symmetric key Message Authentication Codes
A test implementation over NFS was only 3% slower than Digital Unix NFS
Efficiency gains are due to optimistic protocols under normal operation
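To illustrate the pair-wise MAC idea, the sketch below has a sender attach one MAC per receiver, each computed under the key shared with that receiver, so that each receiver checks only its own entry. The slide does not say how the MACs are arranged; the vector layout, the replica names, and the choice of HMAC-SHA256 are assumptions made for illustration.

```python
import hashlib
import hmac
import secrets

REPLICAS = ["r0", "r1", "r2", "r3"]  # hypothetical replica identifiers

# One shared secret per (sender, receiver) pair; these are the keys a single
# sender holds, one for each replica.
pairwise_keys = {r: secrets.token_bytes(32) for r in REPLICAS}

def make_authenticator(message: bytes, keys: dict) -> dict:
    """Compute one MAC per receiver, each under that receiver's pairwise key."""
    return {r: hmac.new(k, message, hashlib.sha256).digest()
            for r, k in keys.items()}

def verify(message: bytes, authenticator: dict, receiver: str, key: bytes) -> bool:
    """A receiver checks only its own entry, using the key it shares with the sender."""
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(authenticator.get(receiver, b""), expected)

msg = b"write(x, 7)"
auth = make_authenticator(msg, pairwise_keys)
ok = verify(msg, auth, "r2", pairwise_keys["r2"])
tampered = verify(b"write(x, 8)", auth, "r2", pairwise_keys["r2"])
```

The trade-off against digital signatures is that a MAC is cheap to compute but convinces only the receiver holding the matching key; it cannot be shown to a third party as proof of origin.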
Quorum Systems
Data replication in a group of servers [Malkhi & Reiter, ’97]
Move away from the state machine approach
Increase scalability by removing the server-to-server communication for a read operation
However, their work does require server-to-server communication for state update, and hence a write operation
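A read that needs no server-to-server communication can be sketched as follows: the client collects (timestamp, value) replies from a quorum and keeps the highest-timestamped pair that at least f+1 servers report identically. This is a simplified illustration in the spirit of Byzantine quorum reads; quorum sizing and the write path are omitted, and the function name and reply format are assumptions.

```python
from collections import Counter

def quorum_read(replies, f):
    """Return the (timestamp, value) pair with the highest timestamp that at
    least f + 1 servers report identically, so that up to f faulty servers
    cannot forge it on their own. Returns None if no pair has that support."""
    counts = Counter(replies)
    vouched = [pair for pair, n in counts.items() if n >= f + 1]
    if not vouched:
        return None
    return max(vouched, key=lambda pair: pair[0])

# Hypothetical quorum of four replies with f = 1: two servers hold the latest
# write, one is stale, and one is faulty and reports a bogus high timestamp.
replies = [(5, "new"), (5, "new"), (4, "old"), (99, "forged")]
result = quorum_read(replies, f=1)
```

All the work happens at the client, which is what lets reads from different clients proceed in parallel without the servers coordinating.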
MAFTIA
Malicious and Accidental Fault Tolerance for Internet Applications
Large EU funded project: 6 partners
  Expertise in fault tolerance, distributed computing, cryptography, formal verification and intrusion detection
3 main areas of work:
  Conceptual framework and architecture
  Mechanisms and protocols
  Formal verification and assessment
MAFTIA – Conceptual Model
Extension of the Fault → Error → Failure model
Re-defining a Fault as an Intrusion:
  Intrusion: A malicious, externally-induced fault resulting from an attack that has been successful in exploiting a vulnerability
  Attack: A malicious interaction fault, through which an attacker aims to deliberately violate one or more security properties
  Vulnerability: A fault created during development of the system, or during operation, that could be exploited to create an intrusion
MAFTIA – Conceptual Model – II
In breaking down an Intrusion, they highlight the possibility of targeting the removal or prevention of both Attacks and Vulnerabilities
Although MAFTIA’s main focus was Intrusion Tolerance, they classify a whole range of security mechanisms according to the fault prevention, tolerance, removal and forecasting paradigms mentioned earlier
MAFTIA – Hybrid Failure Model
Composite fault model with a hybrid failure assumption
  The presence and severity of vulnerabilities, attacks and intrusions vary from component to component
Assumptions present in their architectural design:
  Built on top of trustworthy components:
    Java Card
    Trusted Timely Computing Base (TTCB)
    Trusted Middleware component
MAFTIA – Hybrid Failure Model – II
The key element of the MAFTIA architecture is the TTCB:
  Provision of time based services through the use of a Control Channel
  Dedicated and heavily protected security kernel – fail silent rather than arbitrary failure
Implementation of a reliable broadcast protocol that can tolerate up to f failures among f+2 processes [Correia et al., ’02]
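To put the f+2 figure in context, the sketch below compares it with the classical 3f+1 bound for asynchronous Byzantine protocols that have no trusted component. The 3f+1 bound is a standard result brought in for comparison and is not stated on the slides; the function names are hypothetical.

```python
def min_processes_hybrid(f: int) -> int:
    """Group size quoted above for the TTCB-based reliable broadcast: f + 2."""
    return f + 2

def min_processes_byzantine(f: int) -> int:
    """Classical bound without trusted components (not from the slides): 3f + 1."""
    return 3 * f + 1

# (f, hybrid group size, fully Byzantine group size) for f = 1, 2, 3
sizes = [(f, min_processes_hybrid(f), min_processes_byzantine(f))
         for f in range(1, 4)]
```

The gap widens linearly with f, which is one way to read the efficiency gain that the trusted, fail-silent kernel buys.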
Tacoma
Tromso And COrnell Moving Agents project
  Security and fault tolerance were two key elements
Resilience for the agent on a potentially malicious host:
  Replicated agents, with voting mechanisms
Fault tolerance for mobile agents:
  Extension of the primary backup approach
  “… preserving the necessary consistency between replicas can be done efficiently only within a local-area network”
Other Projects
COCA:
  Replication of a CA to provide availability
  Byzantine quorum systems
  Proactive recovery
OASIS (Organically Assured and Survivable Information Systems):
  Umbrella project which sponsors separate work items in the field of resilient security
A Personal Perspective
Control of Execution:
  Adapting fault tolerant principles for a secure environment can come down to a principle of control
  In the Fault → Error → Failure model, breaking the chain requires retaining control
Whose security policy are we protecting?
  Proposed mechanisms for allowing a client to share that control [Price, ’99]
A Personal Perspective – II
Use of Other Mechanisms:
  Some of our previous work identified the possibility of using timing checks [Price, ’01]
    Remove the attacker’s ability to delay or replay messages with impunity
    Some variants of replay attacks rely on this
With hindsight, there is an interesting comparison with MAFTIA’s use of a Control Channel
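A timing check of the kind mentioned above might look like the following sketch. It is not the mechanism from [Price, ’01]; it assumes each message carries a sender identifier, a nonce, and a send time that are already authenticated, and that clocks are loosely synchronised. The class name and the window parameter are hypothetical.

```python
import time

class FreshnessChecker:
    """Reject messages that arrive outside a time window or that repeat a
    (sender, nonce) pair already accepted."""

    def __init__(self, max_age_seconds: float):
        self.max_age = max_age_seconds
        self.seen = set()  # (sender, nonce) pairs already accepted

    def accept(self, sender, nonce, sent_at, now=None):
        now = time.time() if now is None else now
        if abs(now - sent_at) > self.max_age:
            return False  # delayed message, or clocks too far apart
        if (sender, nonce) in self.seen:
            return False  # replay inside the window
        self.seen.add((sender, nonce))
        return True

checker = FreshnessChecker(max_age_seconds=2.0)
first = checker.accept("alice", 1, sent_at=100.0, now=100.5)
replayed = checker.accept("alice", 1, sent_at=100.0, now=101.0)
delayed = checker.accept("alice", 2, sent_at=100.0, now=110.0)
```

A real deployment would also expire old entries from `seen` once they fall outside the window; the sketch keeps them indefinitely for brevity.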
Future Challenges
Relaxation of assumptions:
  Fully Byzantine failure models are difficult to protect against – and hence solutions are inefficient
  Most of the work since Rampart has concentrated on feasible means of relaxing these failure assumptions: can we do better?
Further use of hardware:
  MAFTIA’s use of trusted hardware allows for more efficient protocols – can the principle be generalised?
  Mixed failure environments [Siu et al., ’98]
  Trusted Computing Group
Future Challenges – II
Other dependability models:
  Fault tolerance is only part of a very mature dependability literature
  Disjoint v Inclusive error recovery?
  MAFTIA defined a whole classification within their model
Security service classification:
  Quorum based systems use the parallelism of a read operation to increase efficiency
  Can we class different services according to their communication requirements?
Conclusions
Until 10 years ago, the work in this field was sparse and sporadic
Now there is a large body of work in this area
Practical efficiency is still a key research topic
Broaden our search for other applicable mechanisms
Availability and survivability on the Internet are only going to become more important