Gossip Algorithms and
Implementing a Cluster/Grid Information service
MsSys Course
Amar Lior and Barak Amnon
Agenda
• A short introduction to gossip algorithms
• Cluster/Grid Information services requirements
– How good is old information?
• The distributed bulletin board model
• Implementation
A Problem
• In an n-node system, assume that every pair of nodes can communicate directly
• Node i wishes to send a message (rumor, color) to all other nodes
• Possible deterministic solutions
– BROADCAST (only in a broadcast medium)
– Defining a static tree between the nodes and sending the message along the edges of this tree
A Gossip Style solution
• Starting with the round in which the rumor is generated
– each node that holds the rumor selects another node independently and uniformly at random
– and sends the rumor to that node
• The distribution of the rumor terminates after a fixed number of rounds, O(ln n)
• At this point all nodes are informed with high probability
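The push scheme above can be sketched as a small simulation. This is only an illustrative sketch: the function name `push_gossip`, the synchronous round structure, and the constant 4 in the round budget are assumptions, not part of the slides.

```python
import math
import random

def push_gossip(n, rounds, source=0, rng=None):
    """Simulate synchronous push gossip; return the informed count after each round."""
    rng = rng or random.Random()
    informed = {source}
    history = [len(informed)]
    for _ in range(rounds):
        targets = set()
        for node in informed:
            # Pick one of the other n - 1 nodes uniformly at random.
            target = rng.randrange(n - 1)
            if target >= node:
                target += 1  # shift past ourselves so we never pick self
            targets.add(target)
        informed |= targets
        history.append(len(informed))
    return history

n = 1000
rounds = 4 * math.ceil(math.log(n))  # fixed O(ln n) budget; the constant 4 is an assumption
history = push_gossip(n, rounds, rng=random.Random(1))
print(history)
```

The printed list shows the informed count at most doubling per round early on and then levelling off as it approaches n.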
Uniform Gossip Example
[Figure sequence: five slides showing the rumor spreading over time steps t1 to t5]
Gossip benefits
• Robustness to node failures
– Messages continue to propagate because destinations are selected at random
– F node failures result in only O(F) uninformed nodes
• Simplicity
– All nodes run the same algorithm
• Scalability
– The number of messages each node sends (and possibly receives) in each round is fixed
Gossip taxonomy
• Other names are
– Epidemic algorithms (Demers et al.)
– Randomized communication (Karp et al.)
• Propagation can be done by
– Push: sending the information from the node to the selected node
– Pull: the other way around
– Push&Pull: both ways
• We distinguish between 2 conceptual layers
– A basic gossip algorithm
» by which nodes choose other nodes for communication
– A gossip-based protocol
» Built on top of a gossip algorithm
» Determines the content of the messages that are sent
» Determines how received messages cause nodes to update their internal state
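The three propagation modes can be compared in one round function. This is a minimal sketch under the assumption of synchronous rounds in which every node contacts one uniformly chosen peer; `gossip_round` and its `mode` strings are hypothetical names.

```python
import random

def gossip_round(informed, n, mode, rng):
    """One synchronous round; returns the new set of informed nodes."""
    new_informed = set(informed)
    for node in range(n):
        # Basic gossip algorithm: choose a peer other than ourselves.
        peer = rng.randrange(n - 1)
        if peer >= node:
            peer += 1
        # Gossip-based protocol: what is exchanged with that peer.
        if mode in ("push", "push-pull") and node in informed:
            new_informed.add(peer)   # push: the caller hands the rumor to the peer
        if mode in ("pull", "push-pull") and peer in informed:
            new_informed.add(node)   # pull: the caller fetches the rumor from the peer
    return new_informed
```

Peer selection and message handling sit in separate lines of the loop, mirroring the two conceptual layers.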
Rumor spreading bounds
From a single node to all
• Time complexity: $O(\ln n)$
• Message complexity (Karp et al.), lower bound on the number of messages: $\Omega(n \ln \ln n)$
Spatial Gossip (Kempe et al.)
• New information is most interesting to nodes that are nearby
• Combines the benefits of
– Uniform gossip
– Deterministic flooding
• The gossip algorithm at node x chooses node y with probability $p_{x,y} = c_x (d_{x,y} + 1)^{-D\rho}$
• New information is spread to nodes at distance d, with high probability, in $O(\log^{1+\varepsilon} d)$ time
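Distance-biased peer selection can be sketched as follows, assuming the selection probability decays as an inverse polynomial of distance, proportional to (d + 1)^(-D·rho). The exponent parameter `rho`, the one-dimensional positions, and the function name `spatial_choice` are assumptions made for illustration.

```python
import random

def spatial_choice(x, positions, D, rho, rng):
    """Pick a peer y != x with probability proportional to (d(x, y) + 1) ** (-D * rho)."""
    others = [y for y in range(len(positions)) if y != x]
    weights = [(abs(positions[x] - positions[y]) + 1) ** (-D * rho) for y in others]
    # random.choices normalises the weights, which plays the role of the constant c_x.
    return rng.choices(others, weights=weights, k=1)[0]

rng = random.Random(0)
positions = list(range(50))  # nodes placed on a line, so D = 1
draws = [spatial_choice(0, positions, D=1, rho=1.5, rng=rng) for _ in range(2000)]
```

Nearby nodes are chosen most often, yet far nodes keep a nonzero chance, which is what lets the scheme sit between flooding and uniform gossip.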
Aggregating values
• Gossip can also be used to aggregate a value over all nodes
• Average, maximum, minimum …
• In this case the question is how fast the local value in each node converges to the desired value
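As one possible illustration of aggregation, the sketch below computes an average by pairwise averaging with random peers. The slides do not say which aggregation protocol is meant, so this particular scheme and the name `averaging_round` are assumptions.

```python
import random

def averaging_round(values, rng):
    """Each node in turn averages its value with one random peer (in place)."""
    n = len(values)
    for i in range(n):
        j = rng.randrange(n - 1)
        if j >= i:
            j += 1  # peer is never the node itself
        mean = (values[i] + values[j]) / 2.0
        values[i] = values[j] = mean  # the pair's sum, hence the global sum, is preserved

rng = random.Random(0)
values = [float(i) for i in range(16)]
for _ in range(30):
    averaging_round(values, rng)
```

Because every exchange preserves the sum, all local values drift toward the global average; the spread between the largest and smallest value indicates how far convergence has progressed.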
Cluster/Grid Information services
• Basic properties of a Grid environment
– Information sources are distributed
– Individual sources are subject to failure
– Total number of information providers is large
– Both the types of information sources and the ways the information is used can vary
• We cannot in general provide users with accurate information: any information delivered to a user is “old”
– How useful is old information? (Mitzenmacher)
– How to build an information service with guaranteed age properties?
Distributed Bulletin board
• The system
– Consists of N nodes (or clusters)
– Distributed
– Nodes are subject to failure
• Each node maintains a data structure that holds an entry for selected (or all) nodes in the system
• We refer to this data structure as “the vector”
• Each vector entry holds:
– the state of the resources (static and dynamic) of the corresponding node
– the age of the information (tuned to the local clock)
• The vector is a distributed bulletin board that serves information requests locally
Algorithm 1 – Information dissemination
• Each time unit
– Update local information
– Find all vector entries which are up to age t
– Choose a random node
– Send the above entries to that node
• Upon receiving a message
– Compute the age of the received entries
– Update the entries for which the newly received information is fresher
A:1 B:12 C:2 D:4 E:11
A:1 C:2 D:4
A:4 B:12 C:2 D:4 E:11
B:1 C:3 E:3
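The steps of Algorithm 1 can be sketched in a few lines. This is a minimal sketch, not the authors' implementation: the class `Node`, its methods, the `delay` parameter, and the placeholder state `"up"` are all hypothetical names chosen for illustration.

```python
import random

class Node:
    def __init__(self, name, names):
        self.name = name
        # vector: node name -> (state, age in time units); infinite age = nothing known yet
        self.vector = {other: (None, float("inf")) for other in names}
        self.vector[name] = ("up", 0)

    def tick(self, local_state):
        """Start of a time unit: age every entry by one, then refresh our own entry."""
        self.vector = {k: (s, a + 1) for k, (s, a) in self.vector.items()}
        self.vector[self.name] = (local_state, 0)

    def window(self, t):
        """Entries that are at most t time units old."""
        return {k: v for k, v in self.vector.items() if v[1] <= t}

    def receive(self, entries, delay=0):
        """Keep a received entry only if it is fresher than the local one."""
        for k, (state, age) in entries.items():
            age += delay  # assumed correction for time spent in transit
            if age < self.vector.get(k, (None, float("inf")))[1]:
                self.vector[k] = (state, age)

def gossip_step(nodes, t, rng):
    """One time unit: every node updates itself, then sends its window to a random peer."""
    for node in nodes:
        node.tick("up")
    for node in nodes:
        peer = rng.choice([p for p in nodes if p is not node])
        peer.receive(node.window(t))
```

Keeping ages relative (rather than absolute timestamps) means nodes do not need synchronized clocks, only a way to age entries locally.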
Algorithm 1 : t=2
[Figure sequence: five slides illustrating Algorithm 1 with t=2 over time steps t1 to t5]
Bounds and Approximations
• We want to know “how old” the information in the vector is
• First we find $E[X_t]$ (for the asynchronous case)
– The expected number of nodes that have information about node i which is up to t time units old

$$E[X_t] \approx \frac{n\,e^{(1-\frac{1}{n})t}}{n - 1 + e^{(1-\frac{1}{n})t}}$$

• Early in the spread: $E[X_t] \approx e^t$
• Synchronous case: $E[X_t] \approx 2^t$
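The asynchronous expression can be evaluated numerically. The sketch below reads the slide's formula as the logistic curve E[X_t] ≈ n·e^{ct} / (n − 1 + e^{ct}) with c = 1 − 1/n; that reading is an assumption recovered from the slide layout, and `expected_informed` is a hypothetical name.

```python
import math

def expected_informed(n, t):
    """Logistic approximation of E[X_t], written in a form that avoids overflow for large t."""
    c = 1.0 - 1.0 / n
    # n*e^{ct} / (n - 1 + e^{ct})  ==  n / (1 + (n - 1) * e^{-ct})
    return n / (1.0 + (n - 1.0) * math.exp(-c * t))

for t in (0, 2, 5, 10, 20):
    print(t, expected_informed(1000, t))
```

At t = 0 the expression equals 1 (only the source holds the information); it grows roughly like e^t while few nodes are informed and saturates at n.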
Bounds and Approximations
• An approximation for the expected age of the vector
[Equation lost in extraction: it expresses the expected age of the vector, $A_v$, using $n$, $w$, and $E[X_t]$]
Real results
Approximating the age distribution
[Equation lost in extraction: it gives $E[A_k]$ in terms of $E[X_t]$, $n$, $q$, $w$, and $k$]
• $A_k$ is a random variable describing the number of nodes which are up to age k
Age distribution
Handling inactive nodes
• The presence of inactive nodes causes problems
– The age quality of the information deteriorates
– The number of ARP broadcasts increases linearly
• Using a fixed-size window improves the age quality, but the number of ARP broadcasts stays the same
Algorithm 2
• Algorithm 2 solves the above 2 issues
• Works basically the same as Algorithm 1, with the following difference when sending a message
– Calculate l, the number of active nodes (from the local vector)
– Generate a random number k in 0…l
– If k=0, send the window to a random node chosen from all nodes
– Else send the window to a random node chosen only from the active nodes
• Using Algorithm 2, the maximal expected number of messages to inactive nodes is ≤ 1
– From all nodes, at each round
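The destination choice of Algorithm 2 can be sketched as below. The sketch assumes that "all nodes" and "active nodes" name the pool from which a single random destination is drawn, as in Algorithm 1; that reading, and the function name `choose_target`, are assumptions.

```python
import random

def choose_target(self_id, all_nodes, active_nodes, rng):
    """Pick the destination for the window; returns None if no other node is available."""
    l = len(active_nodes)
    k = rng.randint(0, l)  # uniform over 0..l inclusive
    # With probability 1/(l+1) probe the whole system, otherwise stay among active nodes.
    pool = all_nodes if k == 0 else active_nodes
    candidates = [x for x in pool if x != self_id]
    return rng.choice(candidates) if candidates else None

rng = random.Random(0)
all_nodes = list(range(10))
active = [0, 1, 2, 3]
draws = [choose_target(0, all_nodes, active, rng) for _ in range(5000)]
```

Under this reading each sender reaches an inactive node with probability at most 1/(l+1), so if roughly l active nodes send per round, the expected number of such messages per round stays below 1, in line with the bound on the slide. The occasional probe of the whole system is what lets a node that has come back be rediscovered.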
Algorithm 2 – Age performance
Algorithm 2 – minimizing messages to inactive nodes
[Figure sequence: four slides illustrating Algorithm 2 over time steps t1 to t4]
Supporting Urgent information
• In the previous algorithms, information is propagated from all nodes constantly
• In some cases we wish to send an important message urgently to all nodes
– such as the detection of a newly dead node
– In this case the source node gives the message a high priority of 2*log(n)
• When a node assembles the window it is about to send, it takes the entries with the highest priority first and only then the younger entries
• The priority of an entry is decremented every time unit
• The result is that urgent messages are disseminated in O(log(n)) steps
• Regular information is disseminated a bit slower
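The priority rule can be sketched as a sort key over window candidates. This is an illustrative sketch only: the dictionary layout, the fixed window `size`, the base-2 logarithm, and the function names are assumptions.

```python
import math

def urgent_priority(n):
    """Initial priority of an urgent entry: 2*log(n); base 2 and rounding up are assumptions."""
    return 2 * math.ceil(math.log2(n))

def assemble_window(entries, size):
    """Highest priority first, then youngest; entries are dicts with node, age, priority."""
    ordered = sorted(entries, key=lambda e: (-e["priority"], e["age"]))
    return ordered[:size]

def tick(entries):
    """One time unit passes: ages grow and priorities decay toward zero."""
    for e in entries:
        e["age"] += 1
        e["priority"] = max(0, e["priority"] - 1)

entries = [
    {"node": "A", "age": 1, "priority": 0},
    {"node": "B", "age": 9, "priority": urgent_priority(16)},  # urgent, although old
    {"node": "C", "age": 0, "priority": 0},
]
first = [e["node"] for e in assemble_window(entries, 2)]
```

Because the priority decays by one per time unit, an urgent entry outranks regular entries for about 2*log(n) time units and then competes on age like any other entry.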
Information service clients
• MOSIX
– Load balancing
» Fresh information is used by the load balancing algorithm to consider migrating processes
– mmon, the MOSIX monitoring tool
» Presents the vector of a specific node
» mmon -h xil-10
• MPICH
– Improved assignment of processes to nodes
» No assignment to “dead” nodes
» Assignment to the least loaded ones
• Nagios
– Collecting information about clusters over time (history)
– Periodically retrieving a vector from a machine and keeping it
• Decision algorithms at the cluster level
– Leader election (queue fault tolerance)
– Node reservation
Conclusions
• Constructed a distributed bulletin board
– Age properties are guaranteed
– The administrator can configure it to the desired properties
– No two nodes have the same view of the system
– Information requests are served locally
– Noise level (messages to inactive nodes) is constant
– Urgent messages are propagated quickly
Future Work
• Investigating other gossip models
– Push and Pull-Push
• Using only a partial view of the system