autonomic distributed systems


Upload: carlo

Posted on 06-Jan-2016


Page 1: Autonomic distributed systems

Autonomic distributed systems

Page 2: Autonomic distributed systems


Think about this

[Chart: human population and computer population, in billions (×10^9), 1980 to 2010]

Page 3: Autonomic distributed systems


Think about this

Machines will fail from time to time, regardless of how carefully they are designed. But who will manage these systems? Even if everyone joins IT, it will not be enough! Isn't this a crisis?

Systems have to take care of themselves.

Self-help is the best help.

Page 4: Autonomic distributed systems


What does it mean?

There are many such desirable self-* properties that can be added to the wish list. Collectively called self-* properties, they characterize an autonomic system.

Self-help

Self-healing

Self-organizing

Self-optimizing

Self-protecting

Self-managing

Self-stabilizing

Page 5: Autonomic distributed systems


Self-healing

The Spirit Mars rover has a radiation-hardened RAD6000 CPU from Lockheed Martin Federal Systems. One day, while performing a crucial task, the Spirit rover fell silent, alone in the emptiness of Mars.

What next?


Courtesy: Jet Propulsion Lab

Page 6: Autonomic distributed systems


Self-healing

Ground control eventually diagnosed the problem remotely.

The operating system had tried to allocate more files than the RAM-based directory structure could accommodate. This raised an exception that suspended the task attempting the allocation. NASA ground control deleted some files and reformatted the entire flash memory system. On February 6, 2004, the rover was restored to its original working condition, and science activities resumed.

It would have been better if the rover could have detected and repaired the problem by itself.


Courtesy: Jet Propulsion Lab

Page 7: Autonomic distributed systems

Self-stabilization

• A technique for the spontaneous restoration of a system predicate.

• Forward error recovery (memoryless): it does not bother about the impact of the failure, as long as recovery is guaranteed.

• Guarantees eventual safety following failures.

Feasibility demonstrated by Dijkstra (CACM 1974)

Page 8: Autonomic distributed systems

Self-stabilizing systems

Starting from any initial configuration, the system is guaranteed to recover to a legitimate configuration (one where the predicate L is true) in a bounded number of steps, as long as the program code itself is not corrupted.

Page 9: Autonomic distributed systems

Self-stabilizing systems

Transient failures perturb the global state. The ability to spontaneously recover from any initial state implies that no initialization is ever required.

[Figure: the state space, with the legal configurations forming a subset of it]

Page 10: Autonomic distributed systems

Self-stabilizing systems

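Self-stabilizing systems exhibit non-masking fault tolerance. They satisfy the following two criteria:

1. Convergence: starting from any configuration where L is false (for example, one produced by a fault), the system reaches a configuration where L holds in a bounded number of steps.

2. Closure: once L holds, it continues to hold in every subsequent configuration, as long as no new fault occurs.

[Figure: a fault moves the system from L to ¬L; convergence brings it back to L; closure keeps it in L]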
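Both criteria can be checked mechanically on a small finite model. The sketch below is a hypothetical helper (the names `check_closure` and `check_convergence` are illustrative, not from the slides). It assumes the system is given as a finite set of states, a `successors` function listing every move any scheduler might take, and the predicate L, and that successors never leave the state set.

```python
from typing import Callable, Hashable, Iterable

State = Hashable
Succ = Callable[[State], Iterable[State]]
Pred = Callable[[State], bool]


def check_closure(states: Iterable[State], successors: Succ, legal: Pred) -> bool:
    """Closure: every move from a legal state leads to a legal state."""
    return all(legal(t) for s in states if legal(s) for t in successors(s))


def check_convergence(states: Iterable[State], successors: Succ, legal: Pred) -> bool:
    """Convergence: every execution that starts outside L reaches L in finitely
    many steps, whichever enabled move is chosen. For a finite state space this
    holds iff no illegal state is a deadlock and the illegal states form no cycle."""
    illegal = [s for s in states if not legal(s)]
    colour = {s: "white" for s in illegal}

    def on_cycle(s: State) -> bool:
        # Depth-first search restricted to illegal states; meeting a grey state
        # again means some scheduler could keep the system outside L forever.
        colour[s] = "grey"
        for t in successors(s):
            if legal(t):
                continue
            if colour[t] == "grey" or (colour[t] == "white" and on_cycle(t)):
                return True
        colour[s] = "black"
        return False

    for s in illegal:
        if not list(successors(s)):
            return False  # deadlocked outside L
        if colour[s] == "white" and on_cycle(s):
            return False
    return True


# Toy system (an assumption for illustration): a counter that counts down to 0 and stays there.
states = range(6)
countdown = lambda s: [s - 1] if s > 0 else [0]
is_zero = lambda s: s == 0
print(check_closure(states, countdown, is_zero), check_convergence(states, countdown, is_zero))  # True True
```

A finite model cannot replace a proof for arbitrary system sizes, but it is a quick way to catch a rule set that livelocks or escapes L.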

Page 11: Autonomic distributed systems

Adaptive Distributed Systems

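System behavior spontaneously changes when the environment changes.

Example: a traffic control system that switches between AM and PM behavior.

In the AM, $L_{AM}$ holds; in the PM, $L_{PM}$ holds.

$L = (AM \Rightarrow L_{AM}) \wedge (PM \Rightarrow L_{PM})$

defines the system invariant.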
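The invariant can be read as an executable predicate plus a guarded repair action. In this minimal sketch, the slides do not define $L_{AM}$ and $L_{PM}$, so it *assumes* (hypothetically) that $L_{AM}$ means "the AM signal plan is active" and $L_{PM}$ means "the PM signal plan is active"; `Controller`, `invariant` and `adapt` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Controller:
    plan: str  # hypothetical: signal-timing plan in force, "AM" or "PM"


def invariant(is_am: bool, c: Controller) -> bool:
    """L = (AM => L_AM) and (PM => L_PM), with the assumed meanings
    L_AM = "AM plan active" and L_PM = "PM plan active"."""
    is_pm = not is_am
    l_am, l_pm = c.plan == "AM", c.plan == "PM"
    return (not is_am or l_am) and (not is_pm or l_pm)


def adapt(is_am: bool, c: Controller) -> None:
    """Guarded action: when the environment (time of day) changes so that L
    no longer holds, switch to the plan that re-establishes L."""
    if not invariant(is_am, c):
        c.plan = "AM" if is_am else "PM"


c = Controller("AM")
print(invariant(True, c), invariant(False, c))  # True False
adapt(False, c)  # the clock moved to PM, so the controller adapts
print(c.plan, invariant(False, c))  # PM True
```

The repair action also handles a corrupted `plan` value, which is what makes the adaptation behave like stabilization.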

Page 12: Autonomic distributed systems

Example 1: Stabilizing mutual exclusion

[Figure: a unidirectional ring of processes 0, 1, 2, …, N-1]

Consider a unidirectional ring of processes. In a legal configuration, exactly one token circulates in the network.

Page 13: Autonomic distributed systems

A solution

{Process 0} repeat x[0] = x[N-1] → x[0] := x[0] + 1 mod K forever

{Process j > 0} repeat x[j] ≠ x[j-1] → x[j] := x[j-1] forever

The state of process j is $x[j] \in \{0, 1, 2, \ldots, K-1\}$, and $K > N$.

Each statement has the form guard (condition) → action. A process holds the TOKEN exactly when its guard is enabled.
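The guarded commands above can be simulated directly. This sketch assumes a central daemon that, at each step, picks one enabled process at random (the slides do not fix a scheduler), and uses K = N + 1 to satisfy K > N. The function names are illustrative.

```python
import random


def tokens(x: list[int]) -> list[int]:
    """Indices of processes whose guard is enabled, i.e. that hold a token."""
    n = len(x)
    held = [0] if x[0] == x[n - 1] else []
    held += [j for j in range(1, n) if x[j] != x[j - 1]]
    return held


def move(x: list[int], j: int, k: int) -> None:
    """Execute the action of (enabled) process j."""
    if j == 0:
        x[0] = (x[0] + 1) % k
    else:
        x[j] = x[j - 1]


def run(x: list[int], k: int, rng: random.Random, max_steps: int = 10_000) -> int:
    """Central daemon: let one randomly chosen token holder move per step.
    Returns the number of moves made before exactly one token remains."""
    for step in range(max_steps):
        held = tokens(x)  # never empty: at least one guard is always enabled
        if len(held) == 1:
            return step
        move(x, rng.choice(held), k)
    raise RuntimeError("no single-token configuration reached")


rng = random.Random(1)
N, K = 5, 6
x = [rng.randrange(K) for _ in range(N)]  # arbitrary, possibly corrupted, start
steps = run(x, K, rng)
print(steps, x, tokens(x))
```

Once a single token remains, every further move passes it along the ring, so the count stays at one (closure).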

Page 14: Autonomic distributed systems

Does it work?

First, be convinced that it works.

Then think about why it will work.
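One way to become convinced for small rings is to enumerate every configuration. This sketch (hypothetical names `enabled`, `step`, `check_ring`) checks, under a central daemon that may pick *any* enabled process, that some process is always enabled, that single-token configurations stay single-token (closure), and that the multi-token configurations contain no cycle (convergence). It covers only the sizes you pass in; it is evidence, not the proof the slide asks you to think about.

```python
from itertools import product


def enabled(x):
    """Processes whose guard holds in configuration x (a tuple)."""
    held = [0] if x[0] == x[-1] else []
    return held + [j for j in range(1, len(x)) if x[j] != x[j - 1]]


def step(x, j, k):
    """Configuration reached when process j executes its action."""
    y = list(x)
    y[j] = (y[0] + 1) % k if j == 0 else y[j - 1]
    return tuple(y)


def legal(x):
    return len(enabled(x)) == 1  # exactly one token


def check_ring(n, k):
    """Exhaustively check the n-process ring with states 0..k-1."""
    states = list(product(range(k), repeat=n))
    if any(not enabled(s) for s in states):
        return False  # deadlock
    # Closure: every move from a legal configuration stays legal.
    if any(not legal(step(s, j, k)) for s in states if legal(s) for j in enabled(s)):
        return False
    # Convergence: depth-first search for a cycle among illegal configurations.
    colour = {}

    def cyclic(s):
        colour[s] = "grey"
        for j in enabled(s):
            t = step(s, j, k)
            if legal(t):
                continue
            if colour.get(t) == "grey" or (t not in colour and cyclic(t)):
                return True
        colour[s] = "black"
        return False

    return not any(s not in colour and not legal(s) and cyclic(s) for s in states)


print(check_ring(3, 4), check_ring(4, 5))
```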

Page 15: Autonomic distributed systems

Example 2: Stabilizing spanning tree

• Given a connected graph G = (V, E) and a root r, design an algorithm for maintaining a spanning tree in the presence of transient failures that may corrupt the local states of processes.

• Let n = |V|.

Page 16: Autonomic distributed systems

A solution

Each process i has two variables L(i) and P(i):

L(i) = distance from the root via tree edges

P(i) = parent of process i

By definition L(r) = 0, and P(r) is undefined. In a legal state,

$\forall i \in V,\ i \neq r:\ L(i) \neq n \wedge L(i) = L(P(i)) + 1$.
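The legal-state predicate translates directly into code. This sketch assumes L and P are dictionaries keyed by node and that every P(i) is a neighbor of i. Because L drops by exactly one along each parent pointer, following P from any node must end at the root, so a legal state really does encode a spanning tree. The 6-node configuration below is hypothetical, not the graph in the slide's figure.

```python
from typing import Dict, Hashable, Optional


def is_legal(root: Hashable, n: int, L: Dict[Hashable, int],
             P: Dict[Hashable, Optional[Hashable]]) -> bool:
    """L(r) = 0 and, for every i != r, L(i) != n and L(i) = L(P(i)) + 1."""
    return L[root] == 0 and all(
        L[i] != n and L[i] == L[P[i]] + 1 for i in L if i != root
    )


# Hypothetical 6-node configuration (n = 6) rooted at node 0.
L = {0: 0, 1: 1, 5: 1, 2: 2, 4: 2, 3: 3}
P = {0: None, 1: 0, 5: 0, 2: 1, 4: 5, 3: 2}
print(is_legal(0, 6, L, P))  # True
P[2] = 4  # a transient fault corrupts P(2)
print(is_legal(0, 6, L, P))  # False
```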

Page 17: Autonomic distributed systems

Sample case

[Figure: a sample graph with nodes 0 to 5, shown before and after a transient failure; P(2) is corrupted]

Page 18: Autonomic distributed systems

The algorithm

The algorithm has three rules, R0, R1 and R2, executed by every process i ≠ r:

(R0) (L(i) ≠ n) ∧ (L(i) ≠ L(P(i)) + 1) ∧ (L(P(i)) ≠ n) → L(i) := L(P(i)) + 1

(R1) (L(i) ≠ n) ∧ (L(P(i)) = n) → L(i) := n

(R2) (L(i) = n) ∧ (∃k ∈ Neighbors(i): L(k) < n-1) → L(i) := L(k) + 1; P(i) := k
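The three rules can be run as a simulation. This sketch makes several assumptions not fixed by the slides: a central daemon fires one randomly chosen enabled rule per step; corrupted values stay in range (L(i) ∈ {0, …, n}, P(i) a neighbor of i); R2 picks a random qualifying neighbor k; and the 6-node graph `NBRS` is hypothetical.

```python
import random
from typing import Optional


def enabled_rule(i, n, L, P, nbrs) -> Optional[str]:
    """Return the rule ("R0", "R1" or "R2") whose guard holds at process i != r."""
    if L[i] != n:
        if L[P[i]] == n:
            return "R1"  # parent is detached: detach i as well
        if L[i] != L[P[i]] + 1:
            return "R0"  # repair the distance relative to the parent
        return None
    if any(L[k] < n - 1 for k in nbrs[i]):
        return "R2"  # detached node: rejoin through a suitable neighbour
    return None


def execute(i, rule, n, L, P, nbrs, rng) -> None:
    if rule == "R0":
        L[i] = L[P[i]] + 1
    elif rule == "R1":
        L[i] = n
    else:  # R2: any k with L(k) < n-1 qualifies; this sketch picks one at random
        k = rng.choice([k for k in nbrs[i] if L[k] < n - 1])
        L[i], P[i] = L[k] + 1, k


def stabilize(root, nbrs, L, P, rng, max_steps=100_000) -> int:
    """Fire one random enabled rule per step until none is enabled; return the step count."""
    n = len(nbrs)
    for step in range(max_steps):
        moves = [(i, r) for i in nbrs if i != root
                 for r in [enabled_rule(i, n, L, P, nbrs)] if r]
        if not moves:
            return step
        i, rule = rng.choice(moves)
        execute(i, rule, n, L, P, nbrs, rng)
    raise RuntimeError("still not stable after max_steps")


def is_legal(root, n, L, P) -> bool:
    return L[root] == 0 and all(
        L[i] != n and L[i] == L[P[i]] + 1 for i in L if i != root)


# Hypothetical graph and legal tree; then a fault sets P(2) := 4.
NBRS = {0: [1, 5], 1: [0, 2], 2: [1, 3, 4], 3: [2, 4], 4: [2, 3, 5], 5: [0, 4]}
L = {0: 0, 1: 1, 5: 1, 2: 2, 4: 2, 3: 3}
P = {0: None, 1: 0, 5: 0, 2: 1, 4: 5, 3: 2}
P[2] = 4
steps = stabilize(0, NBRS, L, P, random.Random(0))
print(steps, P, is_legal(0, len(NBRS), L, P))
```

In this run only R0 fires: node 2 and then node 3 recompute their distances, and the tree is repaired around the new parent pointer rather than restored to the original one.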

Page 19: Autonomic distributed systems

Proof of stabilization

Define the edge from i to P(i) to be well-formed when L(i) ≠ n, L(P(i)) ≠ n, and L(i) = L(P(i)) + 1.

In any configuration, the well-formed edges form a spanning forest: L decreases by exactly 1 along each well-formed edge, so these edges cannot form a cycle. Delete all edges that are not well-formed. Designate each tree T(k) in the forest by the lowest value of L in it.
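The forest construction can be sketched as follows, assuming the same dictionary representation of L and P and a hypothetical 6-node configuration in which P(2) has been corrupted. Each node climbs well-formed edges to the top of its tree; since L only grows downward in a tree, the top node carries the tree's lowest L value k. Trees are returned as a list because several trees may share the same k.

```python
def well_formed(i, n, L, P):
    """Edge i -> P(i) is well-formed iff L(i) != n, L(P(i)) != n and L(i) = L(P(i)) + 1."""
    return L[i] != n and L[P[i]] != n and L[i] == L[P[i]] + 1


def forest(root, n, L, P):
    """Delete every edge that is not well-formed and return the remaining trees
    as (k, members) pairs, where the tree is T(k)."""
    def top(i):
        # L drops by 1 at each step up, so the climb always terminates.
        while i != root and well_formed(i, n, L, P):
            i = P[i]
        return i

    members = {}
    for i in L:
        members.setdefault(top(i), set()).add(i)
    return sorted(((L[t], m) for t, m in members.items()), key=lambda pair: pair[0])


# Hypothetical configuration (n = 6) after a fault set P(2) := 4.
L = {0: 0, 1: 1, 5: 1, 2: 2, 4: 2, 3: 3}
P = {0: None, 1: 0, 5: 0, 2: 4, 4: 5, 3: 2}
print(forest(0, 6, L, P))
```

For this hypothetical configuration the forest splits into T(0) = {0, 1, 4, 5} and T(2) = {2, 3}.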

Page 20: Autonomic distributed systems

Example

In the sample graph shown earlier, T(0) = {0, 1} and T(2) = {2, 3, 4, 5}.

Let F(k) denote the number of T(k)'s in the forest.

Define a tuple F = (F(0), F(1), F(2), …, F(n)).

For the sample graph, F = (1, 0, 1, 0, 0, 0) after node 2 had the transient failure that changed P(2) from 2 to 4.
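Computing F does not require building the trees explicitly: a node is the top of its tree exactly when it is the root r or its parent edge is not well-formed, and its L value is then the tree's label k. This sketch follows the definition F = (F(0), …, F(n)), so it returns n + 1 entries. The configuration used is hypothetical, not the slide's sample graph.

```python
def well_formed(i, n, L, P):
    """Edge i -> P(i) is well-formed iff L(i) != n, L(P(i)) != n and L(i) = L(P(i)) + 1."""
    return L[i] != n and L[P[i]] != n and L[i] == L[P[i]] + 1


def f_tuple(root, n, L, P):
    """F = (F(0), ..., F(n)), where F(k) is the number of trees T(k) in the forest."""
    F = [0] * (n + 1)
    for i in L:
        # i tops its tree iff it is r or its parent edge is not well-formed;
        # L grows by 1 down each tree, so L(i) is the tree's lowest L value.
        if i == root or not well_formed(i, n, L, P):
            F[L[i]] += 1
    return tuple(F)


# Hypothetical configuration (n = 6): legal tree, then a fault sets P(2) := 4.
L = {0: 0, 1: 1, 5: 1, 2: 2, 4: 2, 3: 3}
P = {0: None, 1: 0, 5: 0, 2: 1, 4: 5, 3: 2}
print(f_tuple(0, 6, L, P))  # (1, 0, 0, 0, 0, 0, 0)
P[2] = 4
print(f_tuple(0, 6, L, P))  # (1, 0, 1, 0, 0, 0, 0)
```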

Page 21: Autonomic distributed systems

Skeleton of the proof

Minimum F = (1,0,0,0,0,0) {legal configuration}

Maximum F = (1, n-1, 0, 0, 0, 0).

With each action, F decreases lexicographically.

Verify the claim!

Because F can take only finitely many values, it cannot decrease forever. This proves that eventually F becomes (1,0,0,0,0,0) and the spanning tree stabilizes.
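A way to test the claim "with each action, F decreases lexicographically" is to run many executions from random corrupted states and compare F before and after every action. Python compares tuples lexicographically, so `new_F < F` is exactly the required check. This sketch assumes a central daemon (one action at a time), in-range corrupted values, a random choice of k in R2, and hypothetical test graphs. A random search can expose a counterexample but cannot replace the proof.

```python
import random


def well_formed(i, n, L, P):
    return L[i] != n and L[P[i]] != n and L[i] == L[P[i]] + 1


def f_tuple(root, n, L, P):
    F = [0] * (n + 1)
    for i in L:
        if i == root or not well_formed(i, n, L, P):
            F[L[i]] += 1  # i tops a tree T(L(i))
    return tuple(F)


def enabled_rule(i, n, L, P, nbrs):
    if L[i] != n:
        if L[P[i]] == n:
            return "R1"
        return "R0" if L[i] != L[P[i]] + 1 else None
    return "R2" if any(L[k] < n - 1 for k in nbrs[i]) else None


def execute(i, rule, n, L, P, nbrs, rng):
    if rule == "R0":
        L[i] = L[P[i]] + 1
    elif rule == "R1":
        L[i] = n
    else:
        k = rng.choice([k for k in nbrs[i] if L[k] < n - 1])
        L[i], P[i] = L[k] + 1, k


def check_run(root, nbrs, rng):
    """Corrupt L and P at random, run to completion, and report whether F
    strictly decreased at every action and ended at (1, 0, ..., 0)."""
    n = len(nbrs)
    L = {i: 0 if i == root else rng.randint(0, n) for i in nbrs}
    P = {i: None if i == root else rng.choice(nbrs[i]) for i in nbrs}
    F = f_tuple(root, n, L, P)
    while True:
        moves = [(i, r) for i in nbrs if i != root
                 for r in [enabled_rule(i, n, L, P, nbrs)] if r]
        if not moves:
            return F == (1,) + (0,) * n
        i, rule = rng.choice(moves)
        execute(i, rule, n, L, P, nbrs, rng)
        new_F = f_tuple(root, n, L, P)
        if not new_F < F:  # lexicographic tuple comparison
            return False
        F = new_F


NBRS = {0: [1, 5], 1: [0, 2], 2: [1, 3, 4], 3: [2, 4], 4: [2, 3, 5], 5: [0, 4]}
print(all(check_run(0, NBRS, random.Random(seed)) for seed in range(300)))
```

Because every action strictly lowers F, the loop in `check_run` always terminates, mirroring the argument on the slide.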