Distributed Systems Principles and Paradigms Chapter 05 Synchronization

Page 1: Distributed Systems Principles and Paradigms Chapter 05 Synchronization

Distributed Systems Principles and Paradigms

Chapter 05 Synchronization

Page 2: Distributed Systems Principles and Paradigms Chapter 05 Synchronization

Communication & Synchronization

• Why do processes communicate in DS?
  – To exchange messages
  – To synchronize processes

• Why do processes synchronize in DS?
  – To coordinate access to shared resources
  – To order events

Page 3: Distributed Systems Principles and Paradigms Chapter 05 Synchronization

Time, Clocks and Clock Synchronization

• Time
  – Why is time important in DS?
  – E.g. UNIX make utility (see Fig. 5-1)

• Clocks (Timer)
  – Physical clocks
  – Logical clocks (introduced by Leslie Lamport)
  – Vector clocks (introduced by Colin Fidge)

• Clock Synchronization
  – How do we synchronize clocks with real-world time?
  – How do we synchronize clocks with each other?

05 – 1 Distributed Algorithms/5.1 Clock Synchronization

Page 4: Distributed Systems Principles and Paradigms Chapter 05 Synchronization

Physical Clocks (1/3)

Problem: Clock Skew – clocks gradually drift out of sync and report different values

Solution: Universal Coordinated Time (UTC):
• Formerly called GMT (Greenwich Mean Time)
• Based on the number of transitions per second of the cesium-133 atom (very accurate)
• At present, real time is taken as the average of some 50 cesium clocks around the world: International Atomic Time (TAI)
• Introduces a leap second from time to time to compensate for the fact that days are getting longer

UTC is broadcast via shortwave radio (with an accuracy of +/- 1 msec) and via satellite (Geostationary Environment Operational Satellite, GEOS, with an accuracy of +/- 0.5 msec).

Question: Does this solve all our problems? Don’t we now have some global timing mechanism?

05 – 2 Distributed Algorithms/5.1 Clock Synchronization

Page 5: Distributed Systems Principles and Paradigms Chapter 05 Synchronization

Physical Clocks (2/3)

Problem: Even if we have a distributed system with a UTC receiver somewhere in it, we still have to distribute its time to each machine.

Basic principle:

• Every machine has a timer that generates an interrupt H (typically 60) times per second.

• There is a clock in machine p that ticks on each timer interrupt. Denote the value of that clock by Cp(t), where t is UTC time.

• Ideally, we have that for each machine p, Cp(t) = t, or, in other words, dC/dt = 1

• Theoretically, a timer with H=60 should generate 216,000 ticks per hour

• In practice, the relative error of modern timer chips is about 10^-5 (i.e., between 215,998 and 216,002 ticks per hour)

05 – 3 Distributed Algorithms/5.1 Clock Synchronization

Page 6: Distributed Systems Principles and Paradigms Chapter 05 Synchronization

Physical Clocks (3/3)

Goal: Never let two clocks in any system differ by more than δ time units => synchronize at least every δ/(2ρ) seconds, where ρ is the maximum drift rate (see the example below).

05 – 4 Distributed Algorithms/5.1 Clock Synchronization
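A small worked example of this bound, as a Python sketch. The value of δ is an arbitrary assumption for illustration; ρ is taken from the timer-chip error figure on the previous slide:

```python
# How often must we resynchronize so that two clocks, each drifting from
# real time by at most rho, never differ by more than delta?
rho = 1e-5        # maximum drift rate (the ~10^-5 relative error of modern timer chips)
delta = 1e-3      # assumed maximum tolerated difference between clocks, in seconds

# Two clocks drifting in opposite directions move apart at rate 2*rho,
# so they must be resynchronized at least every delta / (2*rho) seconds.
interval = delta / (2 * rho)
print(f"resynchronize at least every {interval:.0f} s")   # -> 50 s
```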

Page 7: Distributed Systems Principles and Paradigms Chapter 05 Synchronization

Clock Synchronization Principles

• Principle I: Every machine asks a time server for the accurate time at least once every δ/(2ρ) seconds (see Fig. 5-5 and the sketch at the end of this slide).

But you need an accurate measure of the round-trip delay, including interrupt handling and the time spent processing incoming messages.

• Principle II: Let the time server scan all machines periodically, calculate an average, and inform each machine how it should adjust its time relative to its present time.

Ok, you’ll probably get every machine in sync. Note: you don’t even need to propagate UTC time (why not?)

05 – 5 Distributed Algorithms/5.1 Clock Synchronization
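A minimal sketch of Principle I, assuming a hypothetical UDP time server ("timeserver.example") that replies with an 8-byte UTC timestamp; the server address and wire format are assumptions made for illustration:

```python
# Sketch of Principle I: ask a time server and correct for the round-trip delay.
import socket
import struct
import time

def ask_time_server(server=("timeserver.example", 5555), timeout=1.0):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    t0 = time.monotonic()
    sock.sendto(b"TIME?", server)
    reply, _ = sock.recvfrom(64)           # assumed: 8-byte big-endian double, UTC seconds
    t1 = time.monotonic()
    server_time = struct.unpack("!d", reply)[0]
    # Estimate the one-way delay as half the round trip; this ignores
    # interrupt handling and message-processing time, as the slide warns.
    return server_time + (t1 - t0) / 2
```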

Page 8: Distributed Systems Principles and Paradigms Chapter 05 Synchronization

Clock Synchronization Algorithms

• The Berkeley Algorithm

The time server periodically polls every machine for its time

The received times are averaged, and each machine is notified of the amount by which it should adjust its time

Centralized algorithm; see Fig. 5-6 and the sketch at the end of this slide

• Decentralized Algorithm

Every machine periodically broadcasts its time, once per fixed-length resynchronization interval

Each machine averages the values received from all the others (or averages after discarding the highest and lowest values)

• Network Time Protocol (NTP)

The most popular protocol, used by machines on the Internet

Uses an algorithm that combines centralized and distributed approaches

05 – 6 Distributed Algorithms/5.2 Logical Clocks
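A compact sketch of the Berkeley averaging step. The machine names and offsets below are invented for illustration; offsets are measured relative to the time daemon's own clock:

```python
# Sketch of the Berkeley algorithm's averaging step.
# `reported` maps machine name -> clock offset from the coordinator, in seconds.

def berkeley_adjustments(reported):
    """Return per-machine corrections so everyone converges on the average."""
    average = sum(reported.values()) / len(reported)
    # Each machine is told how much to adjust, not the absolute time,
    # so UTC never has to be propagated.
    return {name: average - offset for name, offset in reported.items()}

print(berkeley_adjustments({"coordinator": 0.0, "m1": +2.5, "m2": -1.0}))
# -> {'coordinator': 0.5, 'm1': -2.0, 'm2': 1.5}
```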

Page 9: Distributed Systems Principles and Paradigms Chapter 05 Synchronization

Network Time Protocol (NTP)

• A protocol for synchronizing the clocks of computers over packet-switched, variable-latency data networks (e.g., the Internet)

• NTP uses UDP on port 123 as its transport. It is designed particularly to resist the effects of variable latency

• NTPv4 can usually maintain time to within 10 milliseconds (1/100 s) over the public Internet, and can achieve accuracies of 200 microseconds (1/5000 s) or better in local area networks under ideal conditions

• visit the following URL to understand NTP in more detail

http://en.wikipedia.org/wiki/Network_Time_Protocol
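The core of NTP's offset and delay estimate can be sketched from its four timestamps; real NTP layers clock filtering, peer selection, and clock discipline on top of this. The numbers in the usage line are made up:

```python
# Sketch of NTP's core offset/delay estimate from its four timestamps
# (t1: client send, t2: server receive, t3: server send, t4: client receive).

def ntp_offset_delay(t1, t2, t3, t4):
    offset = ((t2 - t1) + (t3 - t4)) / 2   # estimated clock offset of server vs. client
    delay = (t4 - t1) - (t3 - t2)          # round-trip network delay
    return offset, delay

print(ntp_offset_delay(t1=0.000, t2=0.105, t3=0.106, t4=0.011))
# -> offset ~= 0.100 s, delay ~= 0.010 s
```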

Page 10: Distributed Systems Principles and Paradigms Chapter 05 Synchronization

The Happened-Before Relationship

Problem: We first need to introduce a notion of ordering before we can order anything.

The happened-before relation on the set of events in a distributed system is the smallest relation satisfying:

• If a and b are two events in the same process, and a comes before b, then a → b (a happened before b).

• If a is the sending of a message, and b is the receipt of that message, then a → b.

• If a → b and b → c, then a → c (transitivity).

Note: if two events, x and y, happen in different processes that do not exchange messages, then they are said to be concurrent.

Note: this introduces a partial ordering of events in a system with concurrently operating processes.

05 – 6 Distributed Algorithms/5.2 Logical Clocks

Page 11: Distributed Systems Principles and Paradigms Chapter 05 Synchronization

Logical Clocks (1/2)

Problem: How do we maintain a global view on the system’s behavior that is consistent with the happened-before relation?

Solution: attach a timestamp C(e) to each event e, satisfying the following properties:

P1: If a and b are two events in the same process, and a → b, then we demand that C(a) < C(b)

P2: If a corresponds to sending a message m, and b to the receipt of that message, then also C(a) < C(b)

Problem: How do we attach a timestamp to an event when there's no global clock? Solution: maintain a consistent set of logical clocks, one per process.

05 – 7 Distributed Algorithms/5.2 Logical Clocks

Page 12: Distributed Systems Principles and Paradigms Chapter 05 Synchronization

Logical Clocks (2/2)

Each process Pi maintains a local counter Ci and adjusts this counter according to the following rules:

(1) For any two successive events that take place within Pi, Ci is incremented by 1.

(2) Each time a message m is sent by process Pi, the message receives a timestamp Tm = Ci.

(3) Whenever a message m is received by a process Pj, Pj adjusts its local counter Cj to max(Cj, Tm) and then increments it (as in rule 1) before timestamping the receive event.

Property P1 is satisfied by (1); Property P2 by (2) and (3).

This is called Lamport's algorithm (a code sketch follows below).

05 – 8 Distributed Algorithms/5.2 Logical Clocks
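A minimal sketch of the three rules above (the class and method names are illustrative, not from the book):

```python
# Sketch of Lamport's algorithm: one logical clock per process.

class LamportClock:
    def __init__(self):
        self.c = 0

    def local_event(self):
        self.c += 1                      # rule (1): tick between successive events
        return self.c

    def send(self):
        self.c += 1                      # the send is itself an event (rule 1) ...
        return self.c                    # ... and the message carries Tm = Ci (rule 2)

    def receive(self, tm):
        self.c = max(self.c, tm) + 1     # rule (3): Cj := max(Cj, Tm), then tick
        return self.c

p, q = LamportClock(), LamportClock()
tm = p.send()                            # p's send event, timestamp Tm
print(q.receive(tm) > tm)                # True: receive is ordered after send (property P2)
```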

Page 13: Distributed Systems Principles and Paradigms Chapter 05 Synchronization

Logical Clocks – Example

05 – 9 Distributed Algorithms/5.2 Logical Clocks

Fig 5-7. (a) Three processes, each with its own clock. The clocks run at different rates. (b) Lamport’s algorithm corrects the clocks

Page 14: Distributed Systems Principles and Paradigms Chapter 05 Synchronization

Election Algorithms

Principle: Many distributed algorithms require that some process acts as a coordinator. The question is how to select this special process dynamically.

Note: In many systems the coordinator is chosen by hand (e.g., file servers, DNS servers). This leads to centralized solutions => single point of failure.

Question: If a coordinator is chosen dynamically, to what extent can we speak about a centralized or distributed solution?

Question: Is a fully distributed solution, i.e., one without a coordinator, always more robust than any centralized/coordinated solution?

05 – 18 Distributed Algorithms/5.4 Election Algorithms

Page 15: Distributed Systems Principles and Paradigms Chapter 05 Synchronization

Election by Bullying (1/2)

Principle: Each process has an associated priority (weight). The process with the highest priority should always be elected as the coordinator.

Issue: How do we find the heaviest process?

• Any process can just start an election by sending an election message to all other processes (assuming you don’t know the weights of the others).

• If a process Pheavy receives an election message from a lighter process Plight, it sends a take-over message to Plight. Plight is out of the race.

• If a process doesn’t get a take-over message back, it wins, and sends a victory message to all other processes (see the sketch at the end of this slide).

05 – 19 Distributed Algorithms/5.4 Election Algorithms
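A toy, single-machine sketch of the idea. Process numbers stand in for weights, and the `alive` set is an assumption of the demo, standing in for "responds to messages":

```python
# Toy sketch of the bully algorithm: each process knows the numbers of all
# others; the heaviest live process ends up as coordinator.

def run_election(initiator, processes, alive):
    """Return the elected coordinator when `initiator` starts an election."""
    heavier = [p for p in processes if p > initiator and p in alive]
    if not heavier:
        return initiator                  # nobody bullies us: we send victory and win
    # Every heavier live process sends a take-over message and starts its own
    # election; the recursion bottoms out at the heaviest live process.
    return run_election(max(heavier), processes, alive)

processes = [1, 2, 3, 4, 5, 6, 7]
print(run_election(4, processes, alive={1, 2, 4, 5, 6}))   # -> 6 wins (7 has crashed)
```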

Page 16: Distributed Systems Principles and Paradigms Chapter 05 Synchronization

Election by Bullying (2/2)

Question: We’re assuming something very important here – what?

05 – 20 Distributed Algorithms/5.4 Election Algorithms

Assumption: Each process knows the process numbers of all other processes

Page 17: Distributed Systems Principles and Paradigms Chapter 05 Synchronization

Mutual Exclusion

Problem: A number of processes in a distributed system want exclusive access to some resource.

Basic solutions:

• Via a centralized server.

• Completely distributed, with no topology imposed.

• Completely distributed, making use of a (logical) ring.

Centralized: Really simple; see the sketch at the end of this slide.

05 – 22 Distributed Algorithms/5.5 Mutual Exclusion
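A minimal sketch of the centralized scheme, assuming a single coordinator process that grants the resource to one requester at a time and queues the rest (the class and message names are illustrative):

```python
# Sketch of centralized mutual exclusion with one coordinator.
from collections import deque

class Coordinator:
    def __init__(self):
        self.holder = None
        self.queue = deque()

    def request(self, pid):
        """Return True (grant sent) or False (no reply yet; requester blocks)."""
        if self.holder is None:
            self.holder = pid
            return True
        self.queue.append(pid)
        return False

    def release(self, pid):
        """Holder gives the resource back; grant it to the next process in line."""
        assert pid == self.holder
        self.holder = self.queue.popleft() if self.queue else None
        return self.holder               # the process that now receives the deferred grant

c = Coordinator()
print(c.request("P1"))   # True: granted immediately
print(c.request("P2"))   # False: queued, P2 blocks
print(c.release("P1"))   # 'P2': the deferred grant is finally sent
```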

Page 18: Distributed Systems Principles and Paradigms Chapter 05 Synchronization

Mutual Exclusion: Ricart & Agrawala

Principle: The same as Lamport except that acknowledgments aren’t sent. Instead, replies (i.e., grants) are sent only when:

• The receiving process has no interest in the shared resource; or

• The receiving process is waiting for the resource, but has lower priority (known through comparison of timestamps).

In all other cases, the reply is deferred (see the algorithm on pg. 267 and the sketch at the end of this slide)

05 – 23 Distributed Algorithms/5.5 Mutual Exclusion
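A small sketch of just the reply/defer decision from the receiving process's side, assuming each request is tagged with a (Lamport timestamp, process id) pair so that ties are broken deterministically; the state names are illustrative:

```python
# Sketch of the Ricart & Agrawala reply rule for one receiving process.

def on_request(state, their_request):
    """Decide whether to reply (grant) now or defer the reply.

    `their_request` and `state['my_request']` are (lamport_time, pid) pairs,
    compared lexicographically so ties are broken by process id.
    """
    if state["using"]:
        return "defer"                   # we currently hold the resource
    if not state["interested"]:
        return "reply"                   # we have no interest in the resource
    # Both want it: the lower (timestamp, pid) pair has priority.
    return "reply" if their_request < state["my_request"] else "defer"

state = {"using": False, "interested": True, "my_request": (8, 2)}
print(on_request(state, (6, 1)))   # 'reply'  -- their request is older, they have priority
print(on_request(state, (9, 3)))   # 'defer'  -- our request has priority
```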

Page 19: Distributed Systems Principles and Paradigms Chapter 05 Synchronization

The most straightforward way to achieve mutual exclusion in a distributed system is to simulate how it is done in a one-processor system: one process is elected as the coordinator.

Page 20: Distributed Systems Principles and Paradigms Chapter 05 Synchronization

READING:
• Read Chapter 5