dcs 6. basic distributed algorithms fundamentals wei yuan november,21,2013

SDP-MARCH-Talk

DCS 6. Basic Distributed Algorithms Fundamentals

Wei Yuan

November 21, 2013

Outline

• Physical Clocks
• Logical Clocks
  – Lamport’s Logical Clock
  – Vector Clock
• Global Snapshots

Physical Clocks

• Most computers today keep track of the passage of time with a battery-backed-up CMOS clock circuit, driven by a quartz oscillator.
  – Battery backup lets the clock continue measuring time when power is off

• Two registers with quartz: counter, holding register

• A Programmable Interval Timer, to generate an interrupt (clock tick) periodically

• The interrupt service procedure simply adds one to a counter in memory.


Problem

• Getting two systems to agree on time
  – Two clocks hardly ever agree
  – Quartz oscillators oscillate at slightly different frequencies
• Clocks tick at different rates
  – This creates an ever-widening gap in perceived time
  – Clock drift
• Difference between two clocks at one point in time
  – Clock skew
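To make drift and skew concrete, here is a minimal sketch. It assumes two hypothetical clocks whose oscillators run fast or slow by a fixed number of parts per million; the function names and the drift rates (+20 ppm and -15 ppm) are illustrative and do not come from the slides.

```python
def clock_reading(true_seconds: float, drift_ppm: float) -> float:
    """Time shown by a clock whose oscillator is off by drift_ppm parts per million."""
    return true_seconds * (1.0 + drift_ppm / 1_000_000)


def skew(true_seconds: float, drift_a_ppm: float, drift_b_ppm: float) -> float:
    """Clock skew: difference between the two clock readings at one instant."""
    return clock_reading(true_seconds, drift_a_ppm) - clock_reading(true_seconds, drift_b_ppm)


ONE_DAY = 24 * 60 * 60  # seconds

# Both clocks start in agreement; the gap widens as real time passes.
for days in (1, 7, 30):
    gap = skew(days * ONE_DAY, drift_a_ppm=20.0, drift_b_ppm=-15.0)
    print(f"after {days} day(s): skew = {gap:.3f} s")
```

With a relative drift of 35 ppm, the skew grows by roughly 3 seconds per day, which is why clocks need periodic synchronization.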

Solution

• International Atomic Time (TAI)
• Coordinated Universal Time (UTC)
• ……
• Time synchronization algorithms


Lamport’s Logical Clock

• A distributed system consists of a collection of distinct processes which are spatially separated, and which communicate with one another by exchanging messages.
  – A network of interconnected computers, such as the ARPA net
  – A single computer: the central control unit, the memory units, and the input-output channels are separate processes

• Lamport L. Time, clocks, and the ordering of events in a distributed system[J]. Communications of the ACM, 1978, 21(7): 558-565.


Lamport’s happened before (→) relation

• Define the "happened before" relation without using physical clocks (a partial ordering)

• Assumptions
  – The system is composed of a collection of processes
  – Each process consists of a sequence of events
  – Depending on the application, an event could be the execution of a subprogram on a computer
  – or the execution of a single machine instruction

• We are assuming that the events of a process form a sequence, where a occurs before b in this sequence if a happens before b.


Lamport’s happened before (→) relation

(1) If a and b are events in the same process, and a comes before b, then a → b.

(2) If a is the sending of a message by one process and b is the receipt of the same message by another process, then a → b.

(3) If a → b and b → c, then a → c.

• Two distinct events a and b are said to be concurrent if a ↛ b and b ↛ a.

• Assume that a ↛ a for any event a. (→ is an irreflexive partial ordering)
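The three rules can be sketched as a small computation over a finite run. This is only an illustration: the function names, the event labels, and the two-process run are hypothetical, and the relation is built by brute-force transitive closure, which is fine for a handful of events but not for large traces.

```python
from itertools import product


def happened_before(process_events, messages):
    """Return the set of pairs (a, b) such that a -> b.

    process_events: dict mapping a process name to its events in local order
    messages: list of (send_event, receive_event) pairs
    """
    relation = set()
    # Rule (1): events of one process are ordered by their position in it.
    for events in process_events.values():
        for i, a in enumerate(events):
            for b in events[i + 1:]:
                relation.add((a, b))
    # Rule (2): a send happens before the matching receive.
    relation.update(messages)
    # Rule (3): transitivity, repeated until no new pair appears.
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(relation), repeat=2):
            if b == c and (a, d) not in relation:
                relation.add((a, d))
                changed = True
    return relation


def concurrent(a, b, relation):
    """Two distinct events are concurrent if neither happened before the other."""
    return a != b and (a, b) not in relation and (b, a) not in relation


# Hypothetical run: process P sends a message at p1 that Q receives at q2.
hb = happened_before({"P": ["p1", "p2"], "Q": ["q1", "q2", "q3"]}, [("p1", "q2")])
print(("p1", "q3") in hb)          # follows from rules (2), (1), and (3)
print(concurrent("p2", "q3", hb))  # no chain of messages links these two
```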

Space-time diagram

• Horizontal: space
• Vertical: time
• Dots: events
• Vertical lines: processes
• Wavy lines: messages

• A clock is just a way of assigning a number to an event (an abstract notion, not tied to physical time)
  – Clock C_i for each process P_i
    • C_i assigns a number C_i(a) to any event a in the process
  – Clock C for the entire system
    • C(b) = C_j(b) if b is an event in process P_j

• Clock Condition
  – For any events a, b: if a → b then C(a) < C(b).
  – We cannot expect the converse condition to hold, since that would imply that any two concurrent events must occur at the same time (e.g., p2 and p3 are both concurrent with q3, so both would have to occur at the same time as q3, contradicting p2 → p3).

• A process’s clock “ticks”
  – (1), for two events in the same process, means that there must be a tick line between any two events on a process line
  – (2), for the sending and receipt of a message, means that every message line must cross a tick line

Event counting example


Lamport’s logical timestamps

• Process P_i’s clock is represented by a register C_i, so C_i(a) is the value contained by C_i during the event a.

• All processes use a local counter (logical clock) with initial value of zero

• Just before each event, the local counter is incremented by 1 and assigned to the event as its timestamp

• A send (message) event carries its timestamp

• For a receive (message) event, the counter is updated to max(receiver’s local counter, message timestamp) + 1
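The rules above can be sketched as a small class. This is a minimal illustration, not a reference implementation; the class and method names are hypothetical, and message delivery is modelled by simply passing the timestamp from one object to the other.

```python
class LamportClock:
    """Logical clock for a single process, starting at zero."""

    def __init__(self):
        self.counter = 0

    def local_event(self) -> int:
        # Increment just before the event and use the new value as its timestamp.
        self.counter += 1
        return self.counter

    def send(self) -> int:
        # A send is an event; the returned timestamp travels with the message.
        return self.local_event()

    def receive(self, message_timestamp: int) -> int:
        # Jump past both the local counter and the sender's timestamp.
        self.counter = max(self.counter, message_timestamp) + 1
        return self.counter


p, q = LamportClock(), LamportClock()
p.local_event()            # first event on p
sent = p.send()            # second event on p: the send
q.local_event()            # first event on q
received = q.receive(sent)
print(sent, received)
```

The receive rule guarantees that the receipt is stamped later than the send, which is exactly what the Clock Condition requires for message events.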

Event counting example

Applying Lamport’s algorithm


Problem: Identical timestamps

• Concurrent events (e.g., b & g; i & k) may have the same timestamp … or not

• Total ordering: every event is assigned a unique timestamp (number).

Unique timestamps (total ordering)

We can force each timestamp to be unique

• Define global logical timestamp (T_i, i)
  – T_i represents the local Lamport timestamp
  – i represents the process number (globally unique)
    • e.g., (host address, process ID)

• Compare timestamps: (T_i, i) < (T_j, j) if and only if
  – T_i < T_j, or
  – T_i = T_j and i < j

• Does not necessarily relate to actual event ordering
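The comparison rule is ordinary lexicographic ordering on pairs, so a tuple type gives it for free. In this sketch the type name, the event labels, and the integer process numbers are all hypothetical; a real system might use a (host address, process ID) pair as the second component.

```python
from typing import NamedTuple


class GlobalTimestamp(NamedTuple):
    lamport: int     # T_i, the local Lamport timestamp
    process_id: int  # i, the globally unique process number


# Hypothetical events; e1 and e2 share the same Lamport timestamp.
events = {
    "e1": GlobalTimestamp(lamport=2, process_id=1),
    "e2": GlobalTimestamp(lamport=2, process_id=3),
    "e3": GlobalTimestamp(lamport=1, process_id=2),
}

# Tuples compare field by field: Lamport time first, process number as tie-breaker.
ordered = sorted(events, key=events.get)
print(ordered)
```

The tie-breaker makes the order total, but it is arbitrary: e1 sorting before e2 says nothing about which of the two actually occurred first.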

• Unique (totally ordered) timestamps


Problem: Detecting causal relations

• If C(a) < C(b)
  – We cannot conclude that a → b.

• By looking at Lamport timestamps
  – We cannot conclude which events are causally related

• Solution: use a vector clock


Vector clocks

Rules:
1. Vector initialized to 0 at each process: V_i[j] = 0 for i, j = 1, …, N
2. Process increments its element of the vector in local vector before timestamping event: V_i[i] = V_i[i] + 1
3. Message is sent from process P_i with V_i attached to it
4. When P_j receives message, compares vectors element by element and sets local vector to higher of two values: V_j[k] = max(V_i[k], V_j[k]) for k = 1, …, N

• For example, received: [0, 5, 12, 1], have: [2, 8, 10, 1], new timestamp: [2, 8, 12, 1]
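The four rules can be sketched as follows. The class and method names are hypothetical. One assumption is made explicit in the code: `receive` treats the receipt as an event of its own, so it merges (rule 4) and then increments the receiver's element (rule 2), whereas `merge` alone reproduces the element-wise maximum shown in the example.

```python
from typing import List


class VectorClock:
    """Vector clock for one process in a system of num_processes processes."""

    def __init__(self, process_index: int, num_processes: int):
        self.i = process_index
        self.v = [0] * num_processes          # rule 1: start at all zeros

    def tick(self) -> List[int]:
        self.v[self.i] += 1                   # rule 2: bump own element
        return list(self.v)

    def send(self) -> List[int]:
        return self.tick()                    # rule 3: a copy travels with the message

    def merge(self, received: List[int]) -> List[int]:
        # Rule 4: element-wise maximum of the local and received vectors.
        self.v = [max(a, b) for a, b in zip(self.v, received)]
        return list(self.v)

    def receive(self, received: List[int]) -> List[int]:
        # Assumption: the receipt is itself an event, so merge and then tick.
        self.merge(received)
        return self.tick()


# The merge example from the rules.
r = VectorClock(process_index=1, num_processes=4)
r.v = [2, 8, 10, 1]
merged = r.merge([0, 5, 12, 1])
print(merged)

# A three-process run: two events on P0, the second one a send to P1.
p0, p1 = VectorClock(0, 3), VectorClock(1, 3)
first = p0.tick()
message = p0.send()
after_receive = p1.receive(message)
print(first, message, after_receive)
```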

Comparing vector timestamps

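• Define
  – V = V’ iff V[i] = V’[i] for i = 1, …, N
  – V ≤ V’ iff V[i] ≤ V’[i] for i = 1, …, N
  – V < V’ iff V ≤ V’ and V ≠ V’

• For any two events e, e’
  – if e → e’ then V(e) < V(e’) … just like Lamport’s algorithm
  – if V(e) < V(e’) then e → e’

• Two events are concurrent if neither V(e) ≤ V(e’) nor V(e’) ≤ V(e)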
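These comparisons translate directly into code. The function names below are hypothetical, and vectors are plain lists of equal length.

```python
def leq(v, w):
    """V <= W: every element of V is at most the matching element of W."""
    return all(a <= b for a, b in zip(v, w))


def less(v, w):
    """V < W: V <= W and the two vectors differ in at least one element."""
    return leq(v, w) and v != w


def are_concurrent(v, w):
    """Neither V <= W nor W <= V, so neither event causally precedes the other."""
    return not leq(v, w) and not leq(w, v)


print(less([1, 0, 0], [2, 1, 0]))            # causally ordered
print(are_concurrent([2, 0, 0], [0, 0, 1]))  # no causal link either way
```

Unlike Lamport timestamps, the comparison works in both directions: an ordered pair of vectors implies a causal relation, and an unordered pair implies concurrency.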

Vector timestamps

[Figure sequence: step-by-step vector timestamp example on three processes. Every process starts at (0,0,0); the timestamps recoverable from the slides are (1,0,0), (2,0,0), (2,1,0), and (2,2,0).]

Two events are concurrent if neither V(e) ≤ V(e’) nor V(e’) ≤ V(e)


“Distributed snapshots: determining global states of distributed systems”, K. Mani Chandy and Leslie Lamport, ACM TOCS 1985


Model of a Distributed System

• Finite set of processes as nodes.
• Finite set of channels as edges.
• Channels have infinite buffers, are error-free and FIFO.
• The delay experienced by a message is arbitrary but finite.

[Figure: processes p, q, and r connected by channels c1, c2, c3, and c4.]

A banking example to illustrate recording of consistent states

Global State of a Distributed System

Global State:
• Union of the local states of the individual processes and the states of the channels.
• The state of a channel is the sequence of messages in transit: messages sent along the channel but not yet received.
• Initial global state of the system:
  – each process is in its initial state
  – the state of each channel is the empty sequence

Every component of a distributed system has a local state. Process state: described by local memory and the history of activity. Channel state: described by the sequence of messages sent along the channel minus the sequence of messages received along the channel.

Global State Detection

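• Many problems in distributed systems can be solved by detecting a global state of the system.

• Stable property detection
  – A stable property is one that, once it becomes true, remains true forever.
  – E.g., termination, deadlock, token loss, etc.

• Checkpointing in distributed systems
  – E.g., debugging, failure recovery, etc.

A distributed system has no shared memory and no global clock. Because each process has only its local clock and local memory, recording the global state of the system efficiently is difficult.

Detecting stable properties such as deadlock and termination requires examining the global state of the system. For failure recovery, the global state of the distributed system must be saved periodically (as checkpoints), so that after a process failure the system can be restored to the most recently saved global state and resume from there.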

Distributed Computation
• A distributed computation is a sequence of events.
• There are three kinds of events: local, send, receive.
• An event is an atomic action that may change the state of the process p and the state of at most one channel that is incident on p.

Definition of Event e
• An event is a five-tuple e = <p, s, s', M, c>, where
  – p is the process in which the event occurs,
  – s is the state of p immediately before the event,
  – s' is the state of p immediately after the event,
  – M is the message sent or received along the channel c.

Consistent Global State

• Consistency: every message that is recorded as received has also been recorded as sent.

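• Consistent global states determined by a snapshot are states that may have occurred during the computation.

A consistent global state satisfies both of the following conditions:
C1. Message conservation. A message mij recorded as sent in the local state of process pi must appear either in the state of channel Cij or in the local state of the receiving process pj.
C2. In the resulting global state, for every effect, the cause of that effect must also be present.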
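The consistency condition can be checked mechanically on a recorded state. The sketch below is an illustration under simplifying assumptions: messages are identified by hypothetical ids, each channel's recorded sends and receipts are kept as sets (so FIFO order is ignored), and the function names are not from the slides.

```python
def is_consistent(recorded_sent, recorded_received):
    """True if every message recorded as received was also recorded as sent.

    Both arguments map a channel name to a set of message ids.
    """
    for channel, received in recorded_received.items():
        if not received <= recorded_sent.get(channel, set()):
            return False
    return True


def channel_states(recorded_sent, recorded_received):
    """Messages in transit per channel: recorded as sent but not as received."""
    return {
        channel: sent - recorded_received.get(channel, set())
        for channel, sent in recorded_sent.items()
    }


# Consistent: m2 was sent on c1 and is still in transit.
good_sent = {"c1": {"m1", "m2"}}
good_received = {"c1": {"m1"}}
print(is_consistent(good_sent, good_received), channel_states(good_sent, good_received))

# Inconsistent: m1 is recorded as received, but its send is missing.
bad_sent = {"c1": set()}
bad_received = {"c1": {"m1"}}
print(is_consistent(bad_sent, bad_received))
```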

Chandy–Lamport Algorithm

• Each process in the system records its local state and the state of its incoming channels.
• The recorded states form a consistent global state.
• The snapshot algorithm runs concurrently with the computation but does not alter the underlying computation.
• The snapshot algorithm uses a marker as a recording signal.
• Any process can initiate the snapshot by recording its own state and sending a marker on all outgoing channels.
• On receiving a marker for the first time, a process records its own local state and starts recording the states of its incoming channels.

Chandy–Lamport Algorithm contd.

Marker-Sending Rule for Process pi

(1) Process pi records its state.

(2) For each outgoing channel C on which a marker has not been sent, pi sends a marker along C before pi sends further messages along C.

Chandy–Lamport Algorithm contd.

Marker-Receiving Rule for Process pj

On receiving a marker along channel C:
  if pj has not recorded its state then
    Record the state of C as the empty set
    Execute the “marker sending rule”
  else
    Record the state of C as the set of messages received along C after pj’s state was recorded and before pj received the marker along C
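The two marker rules can be exercised in a small single-threaded simulation. Everything here is a sketch under stated assumptions: the `Process` class, its method names, the two-account scenario, and the hand-picked delivery order are hypothetical; channels are modelled as FIFO queues, and the process state is just an account balance.

```python
from collections import deque

MARKER = "MARKER"


class Process:
    def __init__(self, name, balance):
        self.name = name
        self.balance = balance
        self.out = {}               # destination name -> outgoing FIFO channel
        self.inc = {}               # source name -> incoming FIFO channel
        self.recorded_state = None  # local state captured by the snapshot
        self.channel_record = {}    # source name -> messages recorded in transit
        self.recording = set()      # incoming channels still being recorded

    def send_money(self, dest, amount):
        self.balance -= amount
        self.out[dest].append(amount)

    def record_and_send_markers(self):
        # Marker-sending rule: record own state, then put a marker on every
        # outgoing channel ahead of any further message on that channel.
        self.recorded_state = self.balance
        self.recording = set(self.inc)
        self.channel_record = {src: [] for src in self.inc}
        for channel in self.out.values():
            channel.append(MARKER)

    def deliver(self, src):
        msg = self.inc[src].popleft()
        if msg == MARKER:
            # Marker-receiving rule.
            if self.recorded_state is None:
                self.record_and_send_markers()  # channel src stays recorded as empty
            self.recording.discard(src)         # channel src is now fully recorded
        else:
            self.balance += msg
            if src in self.recording:
                self.channel_record[src].append(msg)


def connect(a, b):
    channel = deque()
    a.out[b.name] = channel
    b.inc[a.name] = channel


p, q = Process("p", 100), Process("q", 50)
connect(p, q)
connect(q, p)

q.send_money("p", 20)        # 20 is in transit on the channel q -> p
p.record_and_send_markers()  # p initiates the snapshot
p.send_money("q", 10)        # sent after the marker, so outside the snapshot
p.deliver("q")               # p receives 20 while recording channel q -> p
q.deliver("p")               # q receives the marker and records its state
q.deliver("p")               # q receives 10 after recording
p.deliver("q")               # p receives q's marker

total = (p.recorded_state + q.recorded_state
         + sum(p.channel_record["q"]) + sum(q.channel_record["p"]))
print(p.recorded_state, q.recorded_state, p.channel_record, q.channel_record, total)
```

The recorded balances plus the money recorded in transit add up to the initial total of 150, even though the two local states were captured at different moments.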

Thanks!

Q&A

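Appendix

• A relation on a set is called a partial order relation, or partial order, if and only if it is reflexive, antisymmetric, and transitive.

• Partial order: let A be a nonempty set and P a relation on A. If P satisfies the following conditions:
  1. for every a ∈ A, (a, a) ∈ P (reflexivity);
  2. if (a, b) ∈ P and (b, a) ∈ P, then a = b (antisymmetry);
  3. if (a, b) ∈ P and (b, c) ∈ P, then (a, c) ∈ P (transitivity);
  then P is called a partial order on A. If P is a partial order on A, we write a ≤ b for (a, b) ∈ P.

• Let ≤ be a partial order on A. If for every a, b ∈ A either a ≤ b or b ≤ a, then ≤ is called a total order, or linear order, on A.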
