
DISTRIBUTED FAILURE DETECTOR
Ziv Dayan 200388130, Tom Afek Kafka 200637247
Instructor: Ittay Eyal

Posted on 19-Dec-2015






Page 1: Ziv Dayan200388130 Tom Afek Kafka200637247 Instructor Ittay Eyal


Ziv Dayan 200388130

Tom Afek Kafka 200637247

Instructor: Ittay Eyal



What is a failure detector? A component that reports which other processes it suspects have crashed.

Our failure detector:
- Software implementation
- Gossip style
- Independent local unit




Implementation
- Communication by messages
- Each message contains a list of heartbeats
- Each heartbeat contains:
  - the IP of its creator
  - the time since its creation

Each node contains its own Local Node:
[Figure: the Local Node data structure, with boxes labeled Net Members, Node, Neighbor and Version]
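A minimal Python sketch of how a node might fold a received list of heartbeats into its own state. It assumes (the slides do not say this) that each node stores, per creator IP, the local time at which that creator's freshest heartbeat was created, computed as the receiver's clock minus the heartbeat's time since creation, and that transmission delay is small enough to ignore. Carrying an age instead of an absolute timestamp means nodes never compare clocks with one another. The names `Heartbeat`, `HeartbeatTable`, `merge` and `outgoing` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Heartbeat:
    creator_ip: str
    age: float  # time since creation, in seconds (unit is an assumption)


class HeartbeatTable:
    """Hypothetical per-node state: creator IP -> local time its freshest heartbeat was created."""

    def __init__(self):
        self.local_creation = {}

    def merge(self, beats, now):
        """Fold a received message (a list of heartbeats) into the table; now is the local clock."""
        for b in beats:
            created = now - b.age  # translate the age into this node's own clock (delay ignored)
            if created > self.local_creation.get(b.creator_ip, float("-inf")):
                self.local_creation[b.creator_ip] = created  # keep only the freshest heartbeat

    def outgoing(self, own_ip, now):
        """Build the list to gossip: a fresh own heartbeat plus every known one, re-aged."""
        beats = [Heartbeat(own_ip, 0.0)]
        for ip, created in self.local_creation.items():
            if ip != own_ip:
                beats.append(Heartbeat(ip, now - created))
        return beats
```

For example, merging a heartbeat with age 2.0 at local time 10.0 records a local creation time of 8.0; a later copy of the same heartbeat that is older leaves that entry unchanged.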


Network Construction


Failure Detection Method

Repeat periodically:
- Choose the node whose threshold is closest to expiration.
- Wait until the threshold has expired.
- Check the local creation time of the suspected node's last received heartbeat:
  - If it has changed, the node is OK.
  - Otherwise, the suspected node has crashed.
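One iteration of this loop can be sketched in Python as follows. The sketch assumes a single threshold shared by all nodes and a table mapping each creator IP to the local creation time of its freshest heartbeat, updated concurrently by the message-handling code. The waiting function is injected so the logic can be exercised without a real clock; `detect_once` and `sleep_until_monotonic` are hypothetical names.

```python
import time


def detect_once(local_creation, threshold, sleep_until):
    """One iteration of the detection loop.

    local_creation: creator IP -> local creation time of its freshest heartbeat.
    threshold:      timeout in seconds, assumed equal for all nodes.
    sleep_until:    blocks until the given local time (injectable for testing).
    Returns (ip, "ok"), (ip, "crashed"), or None if no node is known.
    """
    if not local_creation:
        return None
    # The threshold that expires first belongs to the node with the oldest last heartbeat.
    suspect = min(local_creation, key=lambda ip: local_creation[ip] + threshold)
    snapshot = local_creation[suspect]
    sleep_until(snapshot + threshold)  # wait until that threshold has expired
    if local_creation[suspect] != snapshot:
        return suspect, "ok"  # a newer heartbeat arrived while we waited
    return suspect, "crashed"


def sleep_until_monotonic(deadline):
    """Real-clock waiter, assuming table times come from time.monotonic()."""
    time.sleep(max(0.0, deadline - time.monotonic()))
```

A caller would normally drop a node reported as crashed from the table; otherwise the next iteration selects it again immediately.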


Thread Diagram

[Figure: three threads, labeled Computer Listener, Message Handler and Message Sender]
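The diagram only names the three threads. A minimal Python sketch of how they might cooperate, assuming (this is not stated on the slide) that the Computer Listener hands raw messages to the Message Handler through a queue, the handler updates a shared table under a mutex, and the Message Sender periodically reads that table. Messages use a hypothetical `ip=age` text format, and an in-memory list stands in for the network socket so the sketch runs on its own.

```python
import queue
import threading
import time

STOP = object()  # sentinel telling the handler thread to exit


def computer_listener(incoming, inbox):
    """Stand-in for the Computer Listener: a real one would read from a socket."""
    for raw in incoming:
        inbox.put(raw)
    inbox.put(STOP)


def message_handler(inbox, table, lock):
    """Message Handler: decode each raw 'ip=age' string and update the shared table."""
    while True:
        raw = inbox.get()
        if raw is STOP:
            return
        ip, age = raw.split("=")
        with lock:
            table[ip] = float(age)


def message_sender(table, lock, sent, rounds, interval):
    """Message Sender: periodically snapshot the table (a real one would gossip it)."""
    for _ in range(rounds):
        with lock:
            sent.append(dict(table))
        time.sleep(interval)


def run(incoming, rounds=3, interval=0.01):
    inbox, table, lock, sent = queue.Queue(), {}, threading.Lock(), []
    threads = [
        threading.Thread(target=computer_listener, args=(incoming, inbox)),
        threading.Thread(target=message_handler, args=(inbox, table, lock)),
        threading.Thread(target=message_sender, args=(table, lock, sent, rounds, interval)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return table, sent
```

The queue decouples receiving from decoding, so a slow decode never causes the listener to miss incoming messages.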




Version Handling

A new abstract class is added: NetMessage
- Method 1: Handle() decodes the received message using the proper version and returns a Message.
- Method 2: toString() is used for serialization.

[Figure: class diagram with NetMessage, SHA1Message and NormalMessage]
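A Python sketch of this class hierarchy. The slides give only the class names and the two methods; the wire format here (heartbeats as `ip=age` pairs separated by `;`) and the idea that `SHA1Message` prefixes the payload with a SHA-1 digest that `handle()` verifies are illustrative guesses. `handle()` and `__str__` play the roles of the slides' `Handle()` and `toString()`; `build`, `encode` and `decode` are hypothetical helpers.

```python
import hashlib
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Heartbeat:
    creator_ip: str
    age: float  # time since creation


def encode(beats):
    return ";".join(f"{b.creator_ip}={b.age}" for b in beats)


def decode(body):
    if not body:
        return []
    pairs = (item.split("=") for item in body.split(";"))
    return [Heartbeat(ip, float(age)) for ip, age in pairs]


class NetMessage(ABC):
    """One subclass per wire-format version."""

    def __init__(self, raw):
        self.raw = raw  # wire text, either received or built for sending

    @abstractmethod
    def handle(self):
        """Decode the raw text with this version's format (the slides' Handle())."""

    def __str__(self):
        return self.raw  # serialization (the slides' toString())


class NormalMessage(NetMessage):
    @classmethod
    def build(cls, beats):
        return cls(encode(beats))

    def handle(self):
        return decode(self.raw)


class SHA1Message(NetMessage):
    @classmethod
    def build(cls, beats):
        body = encode(beats)
        return cls(hashlib.sha1(body.encode()).hexdigest() + "|" + body)

    def handle(self):
        digest, _, body = self.raw.partition("|")
        if hashlib.sha1(body.encode()).hexdigest() != digest:
            raise ValueError("SHA-1 digest does not match payload")
        return decode(body)
```

Adding a new version then means adding one subclass; the rest of the node only calls `handle()` and `str()`.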



Version Agreement Protocol

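[Figure: message exchange between the initiator and the responder, with arrows labeled $\langle i, addr_i, V \rangle$, $\langle i, v_r \rangle$, $\langle i, v_r \rangle$, NetMsg msg, and $\langle i, v_r \rangle$ NetMsg msg]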
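A sketch of the responder side, under assumptions the slides do not spell out: $V$ is the set of versions the initiator $i$ supports, the responder answers with the version $v_r$ it picked (here, the highest version both sides support), and every later NetMsg from $i$ is tagged with $\langle i, v_r \rangle$ so the receiver knows which NetMessage subclass decodes it. The class, method names and integer version numbers are hypothetical.

```python
class Responder:
    """Hypothetical responder side of the version agreement."""

    def __init__(self, supported):
        self.supported = set(supported)  # versions this node can decode
        self.agreed = {}  # initiator id i -> agreed version v_r

    def on_hello(self, i, addr_i, offered):
        """Handle <i, addr_i, V> and return the reply <i, v_r>."""
        common = self.supported & set(offered)
        if not common:
            raise ValueError(f"no common version with {addr_i}")
        v_r = max(common)  # assumption: prefer the newest shared version
        self.agreed[i] = v_r
        return i, v_r

    def on_message(self, i, v, payload):
        """Accept a NetMsg tagged <i, v> only if v is the version agreed with i."""
        if self.agreed.get(i) != v:
            raise ValueError(f"version {v} was not agreed with initiator {i}")
        return payload  # a full node would decode it with the subclass for version v
```

Keeping the agreed version per initiator lets nodes running different releases coexist in the same network.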


Readers-Writers Problem
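The slide names the readers-writers problem without further detail. A plausible instance in this design (an assumption) is the shared heartbeat table: the sender and the detector only read it, while the message handler writes it. Python's standard library has no readers-writers lock, so the sketch below builds a simple readers-preference one from `threading.Condition`: any number of readers may hold it together, a writer gets exclusive access, and, as in the classic first readers-writers solution, a steady stream of readers can starve writers. `ReadWriteLock` is a hypothetical name.

```python
import threading
from contextlib import contextmanager


class ReadWriteLock:
    """Readers-preference lock: many concurrent readers or one writer."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0  # readers currently inside
        self._writer = False  # True while a writer is inside

    @contextmanager
    def read(self):
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1
        try:
            yield
        finally:
            with self._cond:
                self._readers -= 1
                if self._readers == 0:
                    self._cond.notify_all()  # wake any waiting writer

    @contextmanager
    def write(self):
        with self._cond:
            while self._writer or self._readers > 0:
                self._cond.wait()
            self._writer = True
        try:
            yield
        finally:
            with self._cond:
                self._writer = False
                self._cond.notify_all()  # wake waiting readers and writers
```

Usage: wrap the sender's snapshot of the table in `with lock.read():` and the handler's update in `with lock.write():`.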


Heartbeat Rate

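H = f(P, n, threshold): the heartbeat rate H as a function of the reliability P, the number of nodes n and the threshold.

Assumptions required (simplicity vs. efficiency):
- Full topology
- Spread time << threshold (a heartbeat spreads through the network much faster than the threshold expires)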


Heartbeat Rate – Take I

Assumption: local information only (a strong assumption)

Reliability:
- x: number of messages
- Probability of false detection

We want, and the result: [equations not recoverable from the extracted text; the recoverable symbols are x, n, P, PLR (packet loss rate), log and threshold]
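The slide's derivation is not recoverable, but the quantities it names (x messages, a probability of false detection, a reliability P, and a logarithm in the result) fit the usual independence argument. A sketch under that assumption: if each of the x messages independently fails to refresh a given node's heartbeat with probability q (a hypothetical parameter; on the slide it would depend on n and PLR, in a form not recoverable here), a false detection needs all x to fail, with probability q^x. Requiring q^x <= 1 - P gives x >= log(1 - P) / log q, and spreading x messages over one threshold gives H = x / threshold.

```python
import math


def messages_needed(q, P):
    """Smallest integer x with q**x <= 1 - P.

    q: assumed probability that one message fails to refresh the monitored heartbeat,
       independent across messages, 0 < q < 1 (hypothetical parameter).
    P: required reliability, 0 < P < 1.
    """
    return max(1, math.ceil(math.log(1 - P) / math.log(q)))


def heartbeat_rate(q, P, threshold):
    """Messages per second so that x messages fit inside one threshold period."""
    return messages_needed(q, P) / threshold
```

For instance, with q = 0.5 and P = 0.99 the sketch gives x = 7 (0.5^7 = 0.0078 <= 0.01), i.e. 3.5 heartbeats per second for a 2-second threshold. A larger P raises x, which is consistent with the steeper slope reported for larger P.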




Take I Performance

Linear performance: the larger P is, the steeper the slope.


Heartbeat Rate – Take II

Assumptions:
- Synchrony
- Consistency

Calculation for the average case


Take II Performance

High Performance


Which Method Is Better?

Comparison categories:
- Efficiency
- Scalability
- Dynamism
- Reliability


Thank you for listening