Intrusion Tolerant Distributed Systems – Algorithms and Architectures
Angelo Corsaro & Venkita Subramonian
DOC Group, Washington University
Software Systems Research Seminar
March 21, 2003
Security: State of the Art
Most secure systems today are built by trying to prevent attacks
Several techniques and tools have been developed to make systems more secure, detect system weaknesses, and protect systems:
    New programming languages
    Software tools such as code analyzers, system profilers, etc.
    New hardware/software components
    etc.
Yet, systems’ security keeps being compromised!!!
Nowadays pervasive interconnectivity introduces more challenges for security
The lesson learned in securing systems is that this brute force approach does not work.
Experience has led to the key observation that it isn’t practical/feasible to build 100% secure systems
Classical Secure Distributed Systems
Classical secure distributed systems are based on the assumption that some part of the system is trusted
The basic and recurrent idea is that of connecting distributed components together so as to form a global secure infrastructure
This approach requires large trusted parts on all computers on the network
Kerberos
One of the most used and deployed distributed security systems is Kerberos
It was designed and implemented at MIT as part of the Athena project
The core assumptions at the base of Kerberos’s design are the following:
    Client workstations are totally under the control of the user, i.e., they can’t be trusted
    Remote services can be accessed only via an authentication service
    Servers are trusted and are physically protected
    The servers are under the complete control and responsibility of the administrator
The master server is replicated on passive slaves, which can replace the server when it fails
Kerberos
1. Request for a TGS ticket
2. Ticket for TGS
3. Request for Server Ticket
4. Server Ticket
5. Request for Service
[Figure: Client, Kerberos, Ticket Granting Service, and Server, with arrows numbered 1 to 5]
Kerberos’s Security Problems
The security administrator can misuse his privileges to perform unauthorized actions
Replicas (Kerberos uses passive replication) can also provide information to intruders if not well protected
If the Kerberos server fails, the most recent database changes are lost
Nothing is done to prevent “covert” channels
There is a single point of failure!!!
Security: New Trends
Eliminating the flaws that make systems insecure is not feasible (especially for legacy systems)
Currently adopted solutions for distributed systems’ security have quite a few problems
How about building systems that can continue critical operations in the face of attacks?
Can we build systems that tolerate attacks instead of trying to prevent them?
Intrusion Tolerance: The Idea
Intrusion Tolerant Systems are designed in such a way that they can tolerate a bounded number of misuses
If one or more intruders bypass the protection mechanisms, and the number of misuses they commit is below a given threshold, the security properties of the system are still ensured:
    Confidentiality
    Integrity
    Availability
The key observation at the basis of intrusion tolerant systems is that an intrusion can be thought of as a Byzantine fault
Types of Intrusion Tolerance
Confidentiality: Read access to a subset of confidential data gives no information about the data
Integrity: The change of a subset of data does not change the data perceived by legitimate users
Availability: The change or deletion of a subset of data or of a server does not produce a denial of service to legitimate users
For each property P, a threshold T_P is defined
An intrusion that reads, modifies, or destroys a part X of the data or server D is tolerated as long as |X| < T_P
Data Intrusion Tolerance
Data intrusion-tolerance techniques have existed for a long time
Confidentiality can be ensured by cryptographic tools like the threshold scheme
The data is shared in shadows, each shadow being stored on one security site
To rebuild the data, a number of shadows equal to the threshold is sufficient
This scheme ensures availability and integrity
To prevent denial of service, the servers are replicated
Different sites cannot make decisions independently; they must agree by communicating data and local decisions
This last point requires replication and agreement
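The threshold scheme described above can be illustrated with a minimal sketch of Shamir-style secret sharing, one well-known threshold scheme. The slides do not say which scheme is meant, so that choice, the prime modulus `PRIME`, and the function names `make_shadows` and `rebuild` are all assumptions made for illustration.

```python
import secrets

PRIME = 2**127 - 1  # assumed field modulus (a Mersenne prime); illustrative only


def make_shadows(secret, threshold, n_sites):
    """Split `secret` into n_sites shadows; any `threshold` of them rebuild it."""
    if not (0 <= secret < PRIME and 1 <= threshold <= n_sites):
        raise ValueError("bad parameters")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shadows = []
    for x in range(1, n_sites + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation modulo PRIME
            y = (y * x + c) % PRIME
        shadows.append((x, y))  # one shadow per security site
    return shadows


def rebuild(shadows):
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shadows):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shadows):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


shadows = make_shadows(123456789, threshold=3, n_sites=5)
recovered = rebuild(shadows[:3])  # any three of the five shadows suffice
```

With a threshold of 3 out of 5 sites, two sites can be lost without losing the data, and an intruder holding fewer than three shadows learns nothing about the secret.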
[Figure: File A, File B, and File C stored across Site X, Site Y, and Site Z]
Intrusion Tolerant Security Server
The goal of an Intrusion Tolerant Distributed Security server is that of providing a trusted service out of a set of potentially untrusted computers
This way, the intrusion of one or some of the computers won’t compromise the security of the global system
All the sites that are part of the security service, called security sites, have to provide a series of services:
    Registration
    Authentication
    Sensitive Data Management
    Audit and Recovery Service
Registration Service
The registration permits a user to be registered by the system for future access to secured services
This operation must be carried out independently on each security site to prevent a single site from using information to impersonate the user
The operation is done under control of the security administrator of each site
Authentication Service
The role of this service is to verify the claimed identity of a subject
In a distributed system with several authentication servers, each server must independently authenticate the subject
Notice that the security sites are untrusted and one site could fake the authentication information
An agreement protocol is used to make sure that the user is authenticated only if a majority of the servers succeed in authenticating the subject
Upon authentication the server sends the user some session information, such as session id, key etc.
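The majority rule above can be sketched as a simple tally over the independent verdicts of the security sites. This is only an illustration of the decision rule, not of the agreement protocol itself; the function name `authenticate` and the site names are hypothetical.

```python
def authenticate(site_verdicts):
    """Accept the subject only if a strict majority of security sites accepted.

    site_verdicts maps a site name to that site's independent local verdict
    (True if the site authenticated the subject, False otherwise).
    """
    accepted = sum(1 for ok in site_verdicts.values() if ok)
    return 2 * accepted > len(site_verdicts)


# One compromised site (Z) reporting a false verdict cannot change the outcome.
honest_user = authenticate({"X": True, "Y": True, "Z": False})
impostor = authenticate({"X": False, "Y": False, "Z": True})
```

A tie (for example 2 out of 4) is not a majority, so the subject is rejected in that case.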
Authorization Service
The role of the authorization service is to check that access to a secured service by a subject is authorized according to the subject’s access rights
Access rights could be implemented in a UNIX-like manner
The authorization service is made intrusion tolerant by implementing it on the security servers
Authorization phases are:
    The client asks the security servers for permission to access a secured service
    The access rights stored on the security sites determine whether the client has the proper rights
    The security sites vote to decide whether the access is authorized
    If the sites agree to permit access, they send one ticket to the client and another to the server
    Using the ticket, the client can now open a session with the server
Sensitive Data Management Service
The role of this service is to store, manage and retrieve the sensitive information on the security servers
The data management service must enforce the three main security properties:
    Confidentiality
    Integrity
    Availability
Integrity is provided by a modification-detection mechanism such as cryptographic signatures
Replication can be used to ensure availability, while threshold techniques could be used for confidentiality and availability
Sensitive Data Management Service
If data is replicated on N sites, then:
    With respect to availability, up to N-1 replicas can be lost
    With respect to confidentiality, one replica is sufficient to observe the data
If one data item is shared on N security sites using a threshold of T, then:
    With respect to availability, N-T shadows can be lost
    With respect to confidentiality, T shadows are necessary and sufficient to observe the data
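The trade-off between the two approaches can be made concrete with a small calculation. The function names and dictionary keys below are hypothetical; the numbers follow directly from the rules stated above.

```python
def replication_tolerance(n):
    """Plain replication on n sites."""
    return {"losses_tolerated": n - 1, "sites_needed_to_read": 1}


def threshold_tolerance(n, t):
    """Threshold sharing on n sites with threshold t."""
    if not 1 <= t <= n:
        raise ValueError("threshold must satisfy 1 <= t <= n")
    return {"losses_tolerated": n - t, "sites_needed_to_read": t}


replicated = replication_tolerance(5)   # survives 4 losses, but 1 intrusion reveals the data
shared = threshold_tolerance(5, 3)      # survives 2 losses, needs 3 intrusions to reveal the data
```

Raising T improves confidentiality at the cost of availability, and T = 1 gives the same figures as plain replication.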
The Audit and Recovery Service
The role of this service is to audit the security information sent by the services
There are two kinds of information:
    Authorized operations
    Attempted or successful intrusion or misuse
Note that it is not the role of this service to determine what constitutes an intrusion or a misuse
Analysis of the audit is done offline by security administrators
The recovery service acts as an error recovery mechanism to correct certain modified data
FT Node architecture
[Figure: Cluster1, Cluster2, and Cluster3, each containing processors P1, P2, P3 and a Bus Controller; local broadcast medium]
Distributed Voting
Two phases:
    Local computation: compute results locally and broadcast results
    Majority reconciliation: determine if a majority exists; initiate fault diagnostics if necessary
Distributed algorithm for both phases
Coordinator commits the majority vote
Phase 2 (1/2)
Distributed algorithm that runs on every voter
Receive result from all voters
If my result same as all other results
    we have a unanimous vote
    commit vote
Else if we have more than 50% of the results the same
    we have a majority
    if I am the coordinator and my result NOT same as majority result
        select a new coordinator from among the “majority processors”
    commit vote
    if I am the coordinator
        initiate fault recovery in minority nodes
(continued…)
Phase 2 (2/2)
Else
    we do not have a majority
    start local diagnostics
    if my status = “okay”
        select new coordinator from among “okay” processors
        repeat voting process
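The reconciliation logic of Phase 2 can be sketched as a pure function that every voter evaluates over the same set of received results. The function name, the return format, and the use of the largest id to pick the replacement coordinator are assumptions for illustration; commit, diagnostics, and recovery are left out.

```python
from collections import Counter


def phase2_decision(coordinator_id, results):
    """Classify the vote after every voter's result has been received.

    results maps voter id -> result (results must be hashable).
    Returns (outcome, value, coordinator) where outcome is one of
    "unanimous", "majority", or "no_majority".
    """
    value, votes = Counter(results.values()).most_common(1)[0]
    if votes == len(results):
        return ("unanimous", value, coordinator_id)
    if 2 * votes > len(results):
        coordinator = coordinator_id
        if results[coordinator_id] != value:
            # The coordinator is in the minority: replace it with a member
            # of the majority (here, assumed: the one with the largest id).
            coordinator = max(v for v, r in results.items() if r == value)
        return ("majority", value, coordinator)
    # No majority: the caller would start local diagnostics and re-vote.
    return ("no_majority", None, None)


decision = phase2_decision(1, {1: "b", 2: "a", 3: "a"})
```

In the example, coordinator 1 disagrees with the majority value "a", so a majority member takes over before the vote is committed.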
Choosing a new coordinator
New coordinator chosen from a processor set
The candidate processor set is either all processors (when there is no majority) or the set of processors belonging to the majority
Check local node status
If status = “okay”
    broadcast status to other processors
    wait until broadcasts from other processors arrive
    if my node has the largest node id among “okay” processors
        I declare myself new coordinator
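Seen from a single node, the election rule above might look like the following sketch. The function name and the representation of the received broadcasts as a dictionary are hypothetical.

```python
def declares_itself_coordinator(my_id, my_status, received_statuses):
    """Return True if this node should declare itself the new coordinator.

    received_statuses maps the id of every other candidate processor to the
    status it broadcast ("okay" or anything else).
    """
    if my_status != "okay":
        return False  # a node that fails its local check does not stand
    okay_ids = [n for n, s in received_statuses.items() if s == "okay"]
    return all(my_id > n for n in okay_ids)


# Three candidates; node 3 has failed its local check.
node2 = declares_itself_coordinator(2, "okay", {1: "okay", 3: "faulty"})
node1 = declares_itself_coordinator(1, "okay", {2: "okay", 3: "faulty"})
```

Because node ids are unique and every node sees the same broadcasts, exactly one "okay" node declares itself coordinator.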
Committing a Vote
Coordinator responsible for committing majority vote
If I am the coordinator
    broadcast result to majority
    wait for ack from all processors in majority
Else
    wait for result from coordinator
    send ack to coordinator
Problems with 2 Phase protocol
What if the coordinator fails right before committing the majority vote?
    The user (client) will receive a bad result
    The probability is very low, within acceptable risk parameters
But transient faults could have an adverse effect on security
    An attacker could control what result a user sees
    The majority does not matter any more
Security and transient faults
Transient faults could hamper security
    Illuminating a single transistor in an IC using a laser
    Serious threat to smartcard technology
    Attack invented and perfected by Sergei Skorobogatov, Cambridge University
“Sergei's work will trigger a generation change in smartcard technology. The immediate effect of his work is that many attacks on computer systems that were developed as theoretical possibilities by the research communities in the 1990s have suddenly become practical”
– EE Times, May 2002
A Solution
[Figure: a Client and three voters, with messages numbered 1 to 3]
Algorithm by Castro and Liskov
Pros: Commit is done by all voters as opposed to just one coordinator, hence more secure than the 2-Phase algorithm
Cons: Does not scale well, since the client has to wait for f+1 matching replies (f being the number of faulty voters tolerated)
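The client-side rule can be sketched as follows: the client accepts a result once f+1 distinct voters report the same value, because at least one of those f+1 must then be non-faulty. The function name and the reply format are assumptions for illustration, and message authentication is omitted.

```python
from collections import Counter


def accept_reply(replies, f):
    """Return the result backed by at least f+1 distinct voters, else None.

    replies is a list of (voter_id, result) pairs in arrival order.
    """
    first_reply = {}
    for voter_id, result in replies:
        first_reply.setdefault(voter_id, result)  # count each voter at most once
    for result, votes in Counter(first_reply.values()).items():
        if votes >= f + 1:
            return result
    return None


# With f = 1, a single faulty voter cannot get its value accepted.
accepted = accept_reply([(0, "ok"), (1, "bad"), (2, "ok")], f=1)
```

The client must keep waiting while this returns None, which is the scaling cost noted above.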
Other algorithms
More algorithms in literature
Reiter, M., “The Rampart Toolkit for Building High-Integrity Services,” Theory and Practice in Distributed Systems, Lecture Notes in Computer Science 938, pp. 99-110.
Malkhi, D., Reiter, M., “Byzantine Quorum Systems,” Proceedings of the 29th ACM Symposium on Theory of Computing, May 1997.
Kihlstrom, K., et al., “The SecureRing Protocols for Securing Group Communication,” Proceedings of the 31st Hawaii International Conference on System Sciences, Vol. 3, pp. 317-326, Jan 1998.
Deswarte, Y., et al. “Intrusion Tolerance in Distributed Computing Systems,” Proceedings of the 1991 IEEE Symposium on Research in Security and Privacy, pp. 110-121, May 1991.
Inexact voting
Drawbacks of the previous algorithms:
    They assume state machine replication in all voters
    Two different non-faulty voters are assumed to produce the same result
Some use cases where this assumption does not hold:
    E.g., sensor values
Inexact voting:
    Values that fall within a range of tolerance are considered equal
    Equivalence classes
Algorithms can be modified to handle inexact voting
BUT, the performance overhead is large for the multiple inexact comparisons needed to determine a majority
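A minimal sketch of inexact majority voting over numeric readings is shown below. The function name, the pairwise-distance test, and the choice of the mean as the representative value are assumptions for illustration. Note that "within tolerance" is not transitive, so this is a simplification of true equivalence classes.

```python
def inexact_majority(values, tolerance):
    """Return a representative value backed by a strict majority, or None.

    Two values are treated as equal when they differ by at most `tolerance`.
    Each value is tried as a candidate, so the cost is O(n^2) comparisons.
    """
    n = len(values)
    for candidate in values:
        supporters = [v for v in values if abs(v - candidate) <= tolerance]
        if 2 * len(supporters) > n:
            return sum(supporters) / len(supporters)
    return None


# Two sensors agree within 0.5 degrees; the third reading is an outlier.
reading = inexact_majority([20.1, 20.2, 35.0], tolerance=0.5)
```

The nested comparison is the overhead referred to above: every voter's value is compared against every other value before a majority can be declared.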
Proposed Algorithm Assumptions
Network with atomic broadcast capability
    Bounded message delay
    Fair sharing of the broadcast medium
No voter will commit an answer until all voters are ready
    Enforced using application-dependent thresholds
    Any commits before this threshold are considered invalid
A majority of voters are fault-free, for reliable working of the system
Each voter can vote only once
    Enforced by the User Interface module
Proposed Algorithm (1/2)
[Figure: three voters, the Interface Module, and the Client, with messages numbered 1 to 3]
1. Commit, if not committed already
2. Compare with committed result
3. Timer expires, send result to client
Proposed Algorithm (2/2)
[Figure: three voters, the Interface Module, and the Client, with messages numbered 1 to 5]
1. Commit, if not committed already
2. Compare with committed result
3. Dissent, if no match
4. Commit new vote
5. Reset timer expiry
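The commit, dissent, and timer steps can be sketched from the Interface Module's point of view. The slides do not spell out when a dissent replaces the committed value, so the rule used here (a new commit is accepted only after a strict majority of voters has dissented) is an assumption, as are the class and method names and the use of a logical clock `now` in place of a real timer.

```python
class InterfaceModule:
    def __init__(self, n_voters, timeout):
        self.n_voters = n_voters
        self.timeout = timeout
        self.committed = None           # (voter_id, value) currently buffered
        self.already_committed = set()  # voters that have used their one commit
        self.dissenters = set()
        self.deadline = None

    def _majority_dissents(self):
        return 2 * len(self.dissenters) > self.n_voters

    def commit(self, voter_id, value, now):
        if voter_id in self.already_committed:
            return False  # each voter can vote only once
        if self.committed is not None and not self._majority_dissents():
            return False  # a result is already committed and not invalidated
        self.already_committed.add(voter_id)
        self.committed = (voter_id, value)
        self.dissenters = set()
        self.deadline = now + self.timeout  # reset timer expiry
        return True

    def dissent(self, voter_id):
        if self.committed is not None and voter_id != self.committed[0]:
            self.dissenters.add(voter_id)

    def result_for_client(self, now):
        """Release the committed value only after the timer has expired."""
        if self.committed is None or now < self.deadline:
            return None
        if self._majority_dissents():
            return None
        return self.committed[1]


module = InterfaceModule(n_voters=3, timeout=5)
module.commit(1, "bad", now=0)   # a compromised voter commits first
module.dissent(2)
module.dissent(3)                # the two correct voters disagree
module.commit(2, "good", now=1)  # the bad commit is replaced, timer restarts
module.commit(1, "bad", now=2)   # ignored: voter 1 has already committed
final = module.result_for_client(now=6)
```

The module only stores one value, counts dissents, and tracks a deadline, which matches the slides' point that it can stay very simple.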
Uniqueness of this algorithm
Security increased
    No specific coordinator node, hence reduced vulnerability
    Even if the first commit to the User Interface module is compromised, it gets invalidated by dissenting voters
    “Denial of Service” (vote-rigging) eliminated, since a vote from an already committed voter is ignored
Fault-tolerance properties maintained as before
    Result still based on majority
Concerns about the User Interface module
    Single point of failure
    BUT, this module is very simple, with very little computation
    User Interface module can be isolated from the voter complex
Less intensive computation on the client
    Does not have to reconcile all results from voters
Authentication
Voters must be authenticated by User Interface module before accepting commits
This should not increase the complexity of the module
Strong authentication with minimal interaction between voters and the interface module preferred
Example mechanism: use S/KEY authentication
S/KEY authentication scheme
[Figure: messages sent from the Voter to the Interface Module; labels R and R’]
vote, f^n(R)
vote, f^(n-1)(R)
…
vote, f(R)
f is a one-way function
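The hash-chain idea behind S/KEY can be sketched as follows. SHA-256 stands in for the one-way function f, and the setup in which the Interface Module is initialised with the end of the chain, f^n(R), is an assumption; the slides do not show how the chain is installed. The names `chain` and `Verifier` are hypothetical.

```python
import hashlib


def f(value):
    """One-way function (SHA-256 used here as a stand-in)."""
    return hashlib.sha256(value).digest()


def chain(seed, n):
    """Return [R, f(R), f^2(R), ..., f^n(R)] for the secret seed R."""
    links = [seed]
    for _ in range(n):
        links.append(f(links[-1]))
    return links


class Verifier:
    """Interface Module side: stores only the last accepted chain value."""

    def __init__(self, anchor):
        self.last = anchor

    def accept(self, token):
        if f(token) != self.last:
            return False  # not the preimage of the stored value
        self.last = token  # move one step down the chain
        return True


links = chain(b"voter secret R", 3)
module = Verifier(links[3])          # assumed setup: module holds f^3(R)
first_ok = module.accept(links[2])   # voter reveals f^2(R) with its vote
replay_ok = module.accept(links[2])  # replaying the same token fails
```

Each token is used once, the module needs one hash per check, and an eavesdropper who sees a token cannot derive the next one without inverting f.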
Distributed voting in WAN
Centralized voting not appropriate in a WAN setting
Multiple hops for vote to reach from voter to coordinator
Link failures could partition the network
Network congestion in the vicinity of the coordinator
Inexact voting could be computationally very intensive
    Sensor data from a vast coverage area
A single coordinator is a target for malicious attack
Assumptions
Reliable transport
Messages are digitally signed and subject to verification before delivery to upper layer
Unverifiable messages are discarded
Presence of Public-Key infrastructure
Every voter knows the public key of every other voter
Secure voting
[Figure: three voters exchanging messages numbered 1 to 4]
1. Send signed vote to other voters, hash the result and save it
2. Verify signature and compare with own result
3. Hash sender’s result, sign it and send endorsement back
4. Verify the endorsement and compare it with the value saved in step 1
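The four steps can be sketched for one sender and one receiver. Python's standard library has no public-key signatures, so HMAC with per-voter keys stands in for signing here; that substitution, the voter names, the keys, and the choice to return no endorsement when results differ are all assumptions. A real deployment would use the public-key signatures the slides assume.

```python
import hashlib
import hmac

# Stand-in for digital signatures (assumption): HMAC with a per-voter key.
KEYS = {"v1": b"key-1", "v2": b"key-2", "v3": b"key-3"}  # hypothetical


def sign(voter, payload):
    return hmac.new(KEYS[voter], payload, hashlib.sha256).digest()


def verify(voter, payload, signature):
    return hmac.compare_digest(sign(voter, payload), signature)


def digest(result):
    return hashlib.sha256(result).digest()


def send_vote(sender, result):
    """Step 1: sign the vote; the sender keeps hash(result) for step 4."""
    vote = {"from": sender, "result": result, "sig": sign(sender, result)}
    return vote, digest(result)


def endorse(receiver, own_result, vote):
    """Steps 2-3: verify, compare with own result, return a signed hash."""
    if not verify(vote["from"], vote["result"], vote["sig"]):
        return None  # unverifiable messages are discarded
    if vote["result"] != own_result:
        return None  # results differ: no endorsement (assumed behaviour)
    h = digest(vote["result"])
    return {"from": receiver, "hash": h, "sig": sign(receiver, h)}


def check_endorsement(saved_hash, endorsement):
    """Step 4: verify the endorsement and compare with the saved hash."""
    return (endorsement is not None
            and verify(endorsement["from"], endorsement["hash"], endorsement["sig"])
            and hmac.compare_digest(endorsement["hash"], saved_hash))


vote, saved = send_vote("v1", b"42")
endorsed = check_endorsement(saved, endorse("v2", b"42", vote))
```

A vote altered in transit fails the check in step 2, and an endorsement of a different value fails the hash comparison in step 4.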
Performance
Time complexity:
    Each voter signs its result and broadcasts it: O(1)
    Each voter waits to receive one signed vote from every other voter: O(n)
    Each voter does vote comparison: O(1)
    Each voter receives an endorsement from every other voter: O(n)
    Complexity is O(n)
Number of messages:
    Voter sends vote to every other voter: n(n-1)
    Voter sends endorsement to every other voter: n(n-1)
    O(n^2)
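The message count can be checked by enumerating sender and receiver pairs, which gives n(n-1) votes plus n(n-1) endorsements, or 2n(n-1) in total. The function name is hypothetical.

```python
def count_messages(n):
    """Count votes and endorsements exchanged among n voters."""
    pairs = [(s, r) for s in range(n) for r in range(n) if s != r]
    votes = len(pairs)          # every voter sends its vote to every other voter
    endorsements = len(pairs)   # every voter endorses back to every other voter
    return votes + endorsements


totals = {n: count_messages(n) for n in (3, 4, 10)}
```

Doubling the number of voters roughly quadruples the traffic, which is the O(n^2) growth stated above.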
Concluding Remarks
The intrusion tolerance mechanisms described provide a much more robust way of enforcing security than traditional techniques
The intrusion tolerance mechanism based on fragmentation-scattering ensures confidentiality and integrity of data and availability of services
Efficient and secure voting algorithms are an essential part of intrusion tolerant systems
More research needed to make intrusion tolerance a “real” technology
Scope for further research overlapping security and fault-tolerance
Fault tolerance vs Security
Fault-tolerant Design | Secure Design
Guard against faulty system components or random faults | Guard against malicious outside attacks
Optimistic | Pessimistic
Probabilistic phenomena | Directed intelligent attack
Redundancy as a solution | Redundancy as an adversity
Redundancy – a boon or a bane?
[Figure: effect of redundancy (vertical axis) against degree of redundancy (horizontal axis), with curves labelled Fault tolerance, Security, and Desired security behavior]