Lecture 12: Review
SWE 622, Spring 2017
Distributed Software Engineering
J. Bell GMU SWE 622 Spring 2017

HW5

(Chart: GMU SWE 622 HW 5 scores; max possible = 80)
What do we want from Distributed Systems?

• Scalability
• Performance
• Latency
• Availability
• Fault Tolerance

“Distributed Systems for Fun and Profit”, Takada
Distributed Systems Goals: Scalability

“the ability of a system, network, or process, to handle a growing amount of work in a capable manner or its ability to be enlarged to accommodate that growth.”

“Distributed Systems for Fun and Profit”, Takada
Distributed Systems Goals: Performance

“is characterized by the amount of useful work accomplished by a computer system compared to the time and resources used.”
Distributed Systems Goals: Latency

“The state of being latent; delay, a period between the initiation of something and it becoming visible.”
Distributed Systems Goals: Availability

“the proportion of time a system is in a functioning condition. If a user cannot access the system, it is said to be unavailable.”

Availability = uptime / (uptime + downtime)

Often measured in “nines”:

Availability %    Downtime/year
90%               > 1 month
99%               < 4 days
99.9%             < 9 hours
99.99%            < 1 hour
99.999%           ~5 minutes
99.9999%          ~31 seconds
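The downtime figures in the table follow directly from the formula: yearly downtime is just (1 - availability) times one year. A minimal sketch (the class and method names are my own, not from the lecture):

```java
public class Nines {
    // Yearly downtime implied by an availability fraction,
    // using downtime = (1 - availability) * seconds in one year.
    public static double downtimeSecondsPerYear(double availability) {
        double secondsPerYear = 365.0 * 24 * 60 * 60;
        return (1.0 - availability) * secondsPerYear;
    }

    public static void main(String[] args) {
        // "Five nines" works out to roughly five minutes per year.
        System.out.printf("99.999%% -> %.0f seconds/year%n",
                downtimeSecondsPerYear(0.99999));
    }
}
```

Each extra nine divides the allowed downtime by ten, which is why the table drops from a month to seconds.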
Distributed Systems Goals: Fault Tolerance

“ability of a system to behave in a well-defined manner once faults occur”

What kind of faults?
• Disks fail
• Power supplies fail
• Power goes out
• Networking fails
• Security breached
• Datacenter goes offline
More machines, more problems
• Say there’s a 1% chance of having some hardware failure occur to a machine (power supply burns out, hard disk crashes, etc.)
• Now I have 10 machines
• Probability(at least one fails) = 1 - Probability(no machine fails) = 1 - (1 - 0.01)^10 ≈ 10%
• 100 machines -> 63%
• 200 machines -> 87%
• So obviously just adding more machines doesn’t solve fault tolerance
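The percentages above can be reproduced with the complement rule; a quick sketch (my own helper, assuming independent failures):

```java
public class FailureOdds {
    // P(at least one of n machines fails) when each fails
    // independently with probability p: 1 - (1 - p)^n.
    public static double atLeastOneFailure(int n, double p) {
        return 1.0 - Math.pow(1.0 - p, n);
    }

    public static void main(String[] args) {
        for (int n : new int[]{10, 100, 200}) {
            System.out.printf("n=%d -> %.0f%%%n",
                    n, 100 * atLeastOneFailure(n, 0.01));
        }
    }
}
```

Note how quickly the probability grows: at 1% per machine, a few hundred machines makes some failure nearly certain.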
Designing and Building Distributed Systems

To help design our algorithms and systems, we tend to leverage abstractions and models to make assumptions:
• System model: Synchronous, Asynchronous
• Failure model: Crash-fail, Partitions, Byzantine
• Consistency model: Sequential, Eventual
(each list ordered roughly from stronger to weaker assumptions)

Generally: Stronger assumptions -> worse performance
Weaker assumptions -> more complicated
Redis Replication

One master, two slaves:
1: A client writes key “foo” = “bar” to the master
2: The master acknowledges the write to the client
3: The master sends the update to each slave (each now has foo=bar)
Another client reading “foo” from a slave gets foo=bar once the update has propagated.

Asynchronous updates, weak consistency model, highly available
CAP Theorem
• Pick two of three:
  • Consistency: All nodes see the same data at the same time (strong consistency)
  • Availability: Individual node failures do not prevent survivors from continuing to operate
  • Partition tolerance: The system continues to operate despite message loss (from network and/or node failure)
• You can not have all three, ever*
• If you relax your consistency guarantee (from strong to weak), then you can probably guarantee availability and partition tolerance
Choosing a consistency model
• Sequential consistency
  • All over - it’s the most intuitive
• Causal consistency
  • Increasingly useful
• Eventual consistency
  • Very popular in industry and academia
  • File synchronizers, Bayou, and more
Transactions: Classic Example

boolean transferMoney(Person from, Person to, float amount) {
    if (from.balance >= amount) {
        from.balance = from.balance - amount;
        to.balance = to.balance + amount;
        return true;
    }
    return false;
}

What can go wrong here?
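Among other things: two concurrent transfers can both pass the balance check and overdraw the account, and a crash between the debit and the credit loses money. For the single-process race, one fix is to make the check-and-update atomic; the class below is my own illustration (not the lecture’s code), using cents instead of float to avoid rounding:

```java
public class Account {
    private long balance; // cents, to avoid float rounding issues

    public Account(long balance) { this.balance = balance; }

    public synchronized long balance() { return balance; }

    // Atomically debit 'from' and credit 'to'. Locks are acquired in a
    // fixed order (by identity hash; ties ignored in this sketch) so two
    // crossing transfers cannot deadlock.
    public static boolean transferMoney(Account from, Account to, long amount) {
        Account first =
            System.identityHashCode(from) <= System.identityHashCode(to) ? from : to;
        Account second = (first == from) ? to : from;
        synchronized (first) {
            synchronized (second) {
                if (from.balance < amount) return false;
                from.balance -= amount;
                to.balance += amount;
                return true;
            }
        }
    }

    public static void main(String[] args) {
        Account a = new Account(100), b = new Account(0);
        System.out.println(transferMoney(a, b, 60)); // succeeds
        System.out.println(transferMoney(a, b, 60)); // insufficient funds
    }
}
```

Locks only help within one process; when the two accounts live on different nodes, we need distributed agreement - which is what the commit protocols below address.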
Agreement
• In distributed systems, we have multiple nodes that need to all agree that some object has some state
• Examples:
  • Who owns a lock
  • Whether or not to commit a transaction
  • The value of a clock
Properties of Agreement
• Safety (correctness)
  • All nodes agree on the same value (which was proposed by some node)
• Liveness (fault tolerance, availability)
  • If fewer than N nodes crash, the rest should still be OK
2PC Event Sequence

1. Coordinator -> Participant: “Can you commit?”
2. Participant -> Coordinator: “Yes” (participant’s local state: prepared; it is now uncertain of the outcome)
3. Coordinator -> Participant: “OK, commit” (coordinator’s transaction state: committed)
4. Participant -> Coordinator: “OK, I committed” (participant’s local state: committed; coordinator: done)
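The voting phase above boils down to: commit only if every participant votes yes. A toy, single-process sketch of that decision rule (names are mine; no networking, timeouts, or persistence):

```java
import java.util.List;
import java.util.function.Predicate;

public class TwoPhaseCommit {
    public enum Outcome { COMMIT, ABORT }

    // Phase 1: ask every participant whether it can commit.
    // Phase 2: commit only if every vote was yes, otherwise abort.
    public static <P> Outcome run(List<P> participants, Predicate<P> canCommit) {
        boolean allYes = participants.stream().allMatch(canCommit);
        return allYes ? Outcome.COMMIT : Outcome.ABORT;
    }

    public static void main(String[] args) {
        // Participants modeled as account balances; vote yes if balance >= 25.
        System.out.println(run(List.of(100, 50), (Integer b) -> b >= 25)); // COMMIT
        System.out.println(run(List.of(100, 10), (Integer b) -> b >= 25)); // ABORT
    }
}
```

Everything hard about 2PC - the timeouts and crashes discussed next - comes from running this loop over an unreliable network.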
Timeouts in 2PC
• Example:
  • Coordinator times out waiting for Goliath National Bank’s response
  • Bank times out waiting for coordinator’s outcome message
• Causes?
  • Network
  • Overloaded hosts
  • Both are very realistic…
Coordinator Timeouts
• If the coordinator times out waiting to hear from a bank:
  • The coordinator hasn’t sent any commit messages yet
  • It can safely abort - send an abort message
  • This preserves correctness but sacrifices performance (maybe it didn’t need to abort!)
  • If either bank had voted to commit, that’s fine - they will eventually abort
Handling Bank Timeouts
• What if the bank doesn’t hear back from the coordinator?
  • If the bank voted “no”, it’s OK to abort
  • If the bank voted “yes”:
    • It can’t decide to abort (maybe both banks voted “yes” and the coordinator heard this)
    • It can’t decide to commit (maybe the other bank voted “no”)
  • Does the bank just wait forever?
Handling Bank Timeouts
• Can resolve SOME timeout problems with guaranteed correctness in the event a bank voted “yes” to commit:
  • The bank asks the other bank for status (in case it heard from the coordinator)
  • If the other bank heard “commit” or “abort”, then do that
  • If the other bank didn’t hear either:
    • but it voted “no”: both banks abort
    • but it voted “yes”: no decision possible!
2PC Timeouts
• We can solve a lot (but not all) of the cases by having the participants talk to each other
• But, if the coordinator fails, there are cases where everyone stalls until it recovers
• Can the coordinator fail? … yes
• Hence, 2PC does not guarantee liveness: a single node failing can cause the entire set to stall
3 Phase Commit
• Goal: Eliminate this specific failure from blocking liveness

Scenario: Participants A-D all voted yes. The coordinator sent “commit” to Participant A only, then both the coordinator and A crashed. B, C, and D did not hear the result - they cannot tell whether A committed, so in 2PC they must block.
3 Phase Commit
• Goal: Avoid blocking on node failure
• How?
  • Think about how 2PC is better than 1PC
  • 1PC means you can never change your mind or have a failure after committing
  • 2PC still means that you can’t have a failure after committing (committing is irreversible)
• 3PC idea:
  • Split commit/abort into 2 sub-phases
    • 1: Tell everyone the outcome
    • 2: Agree on the outcome
3PC Example

Coordinator (soliciting votes) -> Participants (A,B,C,D): “prepare”
Participants -> Coordinator: response            (participant status: uncertain)
Coordinator (commit authorized, if all yes) -> Participants: “pre-commit”
Participants -> Coordinator: “OK”                (participant status: prepared to commit)
Coordinator -> Participants: “commit”
Participants -> Coordinator: “OK”                (participant status: committed)
Coordinator: done
3PC Crash Handling
• Can B/C/D reach a safe decision if any one of them has received preCommit?
  • YES! Assume A is dead. When A comes back online, it will recover, and talk to B/C/D to catch up.
  • Consider it equivalent to the 2PC case where B/C/D received the “commit” message and all voted yes

(Scenario: the coordinator and Participant A have crashed; B, C, and D remain.)
3PC Crash Handling
• Can B/C/D reach a safe decision if NONE of them has received preCommit?
  • YES! It is safe to abort, because A can not have committed (a commit can’t happen until B/C/D receive and acknowledge the pre-commit)
  • This is the big strength of the extra phase over 2PC
• Summary: Any node can crash at any time, and we can always safely abort or commit.

(Scenario: the coordinator and Participant A have crashed; B, C, and D remain.)
3PC Timeout Handling

Same message flow (prepare -> response -> pre-commit -> OK -> commit -> OK), with timeout rules at each step:
• While soliciting votes or uncertain (before pre-commit is received), a timeout on either side causes abort
• Once a participant is prepared to commit (it has received pre-commit), a timeout causes commit
Does 3PC guarantee consensus?
• Reminder, that means:
  • Liveness (availability)
    • Yes! Always terminates, based on timeouts
  • Safety (correctness)
    • Hmm…
Partitions

All four participants vote yes. The coordinator authorizes the commit and sends pre-commit, but a network partition separates Participant A (which received it and is prepared to commit) from B, C, and D (still uncertain).
• A’s timeout behavior: commit!
• B/C/D’s timeout behavior: abort
Result: A is committed while B, C, and D aborted - safety is violated.
FLP - Intuition
• Why can’t we make a protocol for consensus/agreement that can tolerate both partitions and node failures?
• To tolerate a partition, you need to assume that eventually the partition will heal, and the network will deliver the delayed messages
• But the messages might be delayed forever
• Hence, your protocol would not come to a result until forever (it would not have the liveness property)
Partition Tolerance
• Key idea: if you always have an odd number of nodes…
  • There will always be a minority partition and a majority partition
  • Give up processing in the minority until the partition heals and the network resumes
  • The majority can continue processing
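The majority rule is simple arithmetic: a side of the split may proceed only if it holds a strict majority. A one-line sketch (my own helper):

```java
public class Quorum {
    // With 'total' nodes, a partition can make progress only if it
    // can reach a strict majority: more than total / 2 of the nodes.
    public static boolean hasQuorum(int reachable, int total) {
        return reachable > total / 2;
    }

    public static void main(String[] args) {
        System.out.println(hasQuorum(2, 3)); // majority side of a 3-node cluster
        System.out.println(hasQuorum(1, 3)); // minority side must stop
    }
}
```

With an odd total, at most one side of any two-way split can satisfy this test, which is exactly why odd cluster sizes are used.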
Partition Tolerant Consensus Algorithms
• Decisions made by majority
• Typically a fixed coordinator (leader) during a time period (epoch)
• How does the leader change?
  • Assume it starts out as an arbitrary node
  • The leader sends a heartbeat
  • If you haven’t heard from the leader, then you challenge it by advancing to the next epoch and trying to elect a new one
  • If you don’t get a majority of votes, you don’t get to be leader
  • …hence no leader in a minority partition
ZooKeeper - Data Model
• Provides a hierarchical namespace
• Each node is called a znode
• ZooKeeper provides an API to manipulate these nodes

Example tree:
/
├── /app1
│   ├── /app1/z1
│   └── /app1/z2
└── /app2
ZooKeeper - Guarantees
• Liveness guarantee: if a majority of ZooKeeper servers are active and communicating, the service will be available
• Durability guarantee: if the ZooKeeper service responds successfully to a change request, that change persists across any number of failures as long as a quorum of servers is eventually able to recover
ZooKeeper - Applications
• Distributed locks
• Group membership
• Leader election
• Shared counters
Failure Handling in ZK
• Just using ZooKeeper does not solve failures
• Apps using ZooKeeper need to be aware of the potential failures that can occur, and act appropriately
• The ZK client will guarantee consistency if it is connected to the server cluster
Failure Handling in ZK

1. The client issues a create event to ZK2
2. ZK2 has a network problem
3. The client reconnects to ZK3
4. The client reissues the create event to ZK3
Failure Handling in ZK
• If in the middle of an operation, the client receives a ConnectionLossException
• Also, the client receives a disconnected message
• Clients can’t tell whether or not the operation was completed, though - perhaps it was completed before the failure
• Clients that are disconnected can not receive any notifications from ZK
Dangers of ignorance

1. Client 1 creates /leader, then is disconnected from ZK
2. ZK notifies Client 2 that /leader is dead
3. Client 2 creates /leader - Client 2 is now leader
4. Client 1 reconnects and discovers it is no longer leader
Dangers of ignorance
• Each client needs to be aware of whether or not it’s connected: when disconnected, it can not assume that it is still included in any way in operations
• By default, the ZK client will NOT close the client session just because it’s disconnected!
  • Assumption that eventually things will reconnect
  • Up to you to decide to close it or not
ZK: Handling Reconnection
• What should we do when we reconnect?
• Re-issue outstanding requests?
  • Can’t assume that outstanding requests didn’t succeed
  • Example: create /leader (succeeds, but you disconnect before the reply), re-issue create /leader and fail to create it - because you already did it!
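One common workaround (my own sketch, not from the lecture) is to make the create idempotent: write the client’s own ID as the znode’s data, so that on a “node exists” failure the client can check whether the existing node is its own earlier, successful create. Here the znode store is simulated with an in-memory map rather than a real ZooKeeper ensemble:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class IdempotentCreate {
    // Simulated znode store: path -> owner id stored as the node's data.
    static final Map<String, String> store = new ConcurrentHashMap<>();

    // Returns true if this client owns the node after the call - whether
    // the create happened just now, or in an earlier attempt whose reply
    // was lost to a disconnect.
    public static boolean createAsLeader(String path, String clientId) {
        String owner = store.putIfAbsent(path, clientId);
        return owner == null || owner.equals(clientId);
    }

    public static void main(String[] args) {
        System.out.println(createAsLeader("/leader", "client-1")); // created
        System.out.println(createAsLeader("/leader", "client-1")); // retry is safe
        System.out.println(createAsLeader("/leader", "client-2")); // already owned
    }
}
```

The same pattern (check the node’s data on a NodeExistsException) applies against the real ZooKeeper API.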
GFS
• Hundreds of thousands of regular servers
• Millions of regular disks
• Failures are normal
  • App bugs, OS bugs
  • Human error
  • Disk failure, memory failure, network failure, etc.
• Huge number of concurrent reads, writes
GFS Architecture

(architecture diagram)
GFS Summary
• Limitations:
  • Master is a huge bottleneck
  • Recovery of master is slow
• Lots of success at Google
• Performance isn’t great for all apps
• Consistency needs to be managed by apps
• Replaced in 2010 by Google’s Colossus system - eliminates the single master
Distributing Computation
• Can’t I just add 100 nodes and sort my file 100 times faster?
• Not so easy:
  • Sending data to/from nodes
  • Coordinating among nodes
  • Recovering when one node fails
  • Optimizing for locality
  • Debugging
Distributing Computation
• Lots of these challenges re-appear, regardless of our specific problem
  • How to split up the task
  • How to put the results back together
  • How to store the data
• Enter, MapReduce
MapReduce
• A programming model for large-scale computations
  • Takes large inputs, produces output
  • No side-effects or persistent state other than that input and output
• Runtime library
  • Automatic parallelization
  • Load balancing
  • Locality optimization
  • Fault tolerance
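The model is easiest to see on the classic word-count example. A single-process sketch of the two steps (my own code, not Google’s API): the “map” step emits a pair per word, and the “reduce” step sums the counts per key.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.stream.Collectors;

public class WordCount {
    // Map: split the input into (word, 1) pairs.
    // Reduce: group by word and sum the ones.
    public static Map<String, Long> wordCount(String text) {
        return Arrays.stream(text.toLowerCase().split("\\s+"))
                .filter(w -> !w.isEmpty())
                .collect(Collectors.groupingBy(w -> w, Collectors.counting()));
    }

    public static void main(String[] args) {
        System.out.println(wordCount("the quick fox jumps over the lazy dog"));
    }
}
```

In the real system the map and reduce steps run on many machines; the runtime handles partitioning the input, shuffling pairs by key, and retrying failed tasks.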
MapReduce: Divide & Conquer

Big Data (lots of work) -> Partition into w1…w5 -> one worker per partition produces r1…r5 -> Combine -> Result
MapReduce: Fault Tolerance
• Ideally, fine granularity tasks (more tasks than machines)
• On worker failure:
  • Re-execute completed and in-progress map tasks
  • Re-execute in-progress reduce tasks
  • Commit completion to master
• On master failure:
  • Recover state (master checkpoints in a primary-backup mechanism)
Hadoop + ZooKeeper

Many DataNodes are served by two NameNodes (a primary and a secondary), each running a ZK client; the NameNodes coordinate failover through a 3-server ZK ensemble.

Note - this is why ZK is helpful here: we can have the ZK servers partitioned *too* and still tolerate it the same way
Where do we find data?
• What’s bad with the single master picture?
• HDFS/GFS leverage the fact that there is relatively little metadata, lots of data (e.g. few files, each file is large)
• What if there is really only metadata?
• How can we build a system with high performance without having this single server that knows where data is stored?
![Page 53: Lecture 12 - Review · J. Bell GMU SWE 622 Spring 2017 CAP Theorem • Pick two of three: • Consistency: All nodes see the same data at the same time (strong consistency) • Availability:](https://reader033.vdocuments.site/reader033/viewer/2022060215/5f05e45b7e708231d4153d51/html5/thumbnails/53.jpg)
J. Bell GMU SWE 622 Spring 2017
Hashing
• The last hash function mapped every input to a different hash value
• It doesn't have to: there can be collisions
[Diagram: inputs Josh Lyman, Leo McGarry, Sam Seaborn, and Toby Ziegler are fed through a hash function into hash buckets 0–7]
Hashing for Partitioning
[Diagram: some big long piece of text or a database key is hashed: hash(input) = 900405, and 900405 % 20 = 5, so the entry belongs to server 5]
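The scheme in the slide can be sketched directly (illustrative; MD5 is used here only because Python's built-in `hash` is not stable across runs):

```python
import hashlib

def server_for(key, num_servers):
    """Partition by hashing: server ID = hash(key) % number of servers."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % num_servers

keys = [f"key-{i}" for i in range(1000)]
before = {k: server_for(k, 20) for k in keys}
after = {k: server_for(k, 21) for k in keys}  # one server added
moved = sum(1 for k in keys if before[k] != after[k])
# With plain modulo hashing, almost every key lands on a different
# server when the server count changes -- the problem that consistent
# hashing (next slide) addresses.
```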
Consistent Hashing
It is relatively smooth: adding a new bucket doesn't change that much
[Diagram: a hash ring with positions 0, 4, 8, and 12. Adding a new bucket only changes the location of keys 7, 8, 9, 10; deleting a bucket only changes the location of keys 1, 2, 3]
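A minimal consistent-hash ring illustrates the smoothness claim (a sketch without virtual nodes; the class and bucket names are made up for illustration):

```python
import bisect
import hashlib

def _h(s):
    """Deterministic hash onto the ring (MD5 for illustration)."""
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Minimal consistent-hash ring (no virtual nodes, for clarity)."""
    def __init__(self, buckets):
        self.ring = sorted((_h(b), b) for b in buckets)

    def lookup(self, key):
        # Walk clockwise to the first bucket at or after the key's
        # hash, wrapping around the ring if necessary.
        hashes = [h for h, _ in self.ring]
        i = bisect.bisect_right(hashes, _h(key)) % len(self.ring)
        return self.ring[i][1]

small = ConsistentHashRing(["s1", "s2", "s3"])
bigger = ConsistentHashRing(["s1", "s2", "s3", "s4"])
keys = [f"k{i}" for i in range(1000)]
# Only keys falling on the new bucket's arc move; everything else
# keeps its old location -- the "relatively smooth" property.
moved = [k for k in keys if small.lookup(k) != bigger.lookup(k)]
```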
Napster
[Diagram: client 1 tells the Napster master "Hi, I signed on, I have files f1, f2, f3"; client 2 asks the master "Who has f1?" and is told "client 1 does"; client 2 then asks client 1 "Can I have f1?" and receives f1 directly]
Doesn’t everything just look like GFS, even things that predated it? :)
Gnutella 1.0
[Diagram: client 1 floods "where is f1?" to its neighbors; the query is forwarded hop by hop through clients 3, 4, and 5; the answer "c2 has f1" comes back, and client 1 asks client 2 "Can I have f1?" directly]
BitTorrent
[Diagram: a central Tracker connected to six Clients, which also connect directly to one another]
Byzantine Generals Problem
[Diagram: three generals exchanging conflicting orders: "Attack!", "Attack!", "Don't attack!"]
Oral BFT Example (n=4, m=1)
[Diagram: the Commander sends values x, y, and z to Lieutenants 1, 2, and 3; each lieutenant then relays the value it received to the others]
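The oral-messages algorithm for n=4, m=1 boils down to each loyal lieutenant taking a majority vote over the commander's order plus the other lieutenants' relays (a sketch with illustrative values, not the full recursive OM(m)):

```python
from collections import Counter

def om1_decision(received):
    """One lieutenant's decision under OM(1): the majority of the
    commander's order plus the values relayed by the other lieutenants."""
    value, _count = Counter(received).most_common(1)[0]
    return value

# n = 4, m = 1: the commander sends "attack" to everyone; Lieutenant 3
# is the traitor and relays conflicting values (the x, y, z arrows in
# the slide). The two loyal lieutenants still agree.
l1_view = ["attack", "attack", "retreat"]  # commander, L2's relay, L3's relay
l2_view = ["attack", "attack", "charge"]   # commander, L1's relay, L3's relay
```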
Blockchains
• Solution: make it hard for participants to take over the network; provide rewards so participants keep participating
• Each participant stores the entire record of transactions as blocks
• Each block contains some number of transactions and the hash of the previous block
• All participants follow a set of rules to determine whether a new block is valid
[Diagram: a chain of block hashes h0 → h1 → … → hn, where each block's header includes the previous block's hash and block n carries data dn]
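The hash-chaining described above can be sketched as follows (an illustrative structure; real blockchains also include nonces, timestamps, and Merkle roots):

```python
import hashlib
import json

GENESIS = "0" * 64

def block_hash(prev_hash, transactions):
    """Each block commits to its transactions AND the previous hash."""
    payload = json.dumps({"prev": prev_hash, "txs": transactions})
    return hashlib.sha256(payload.encode()).hexdigest()

def build_chain(blocks_of_txs):
    chain, prev = [], GENESIS
    for txs in blocks_of_txs:
        h = block_hash(prev, txs)
        chain.append({"prev": prev, "txs": txs, "hash": h})
        prev = h
    return chain

def is_valid(chain):
    """A participant's validity rule: every block's stored hash must
    match its contents, and the prev pointers must line up."""
    prev = GENESIS
    for b in chain:
        if b["prev"] != prev or block_hash(prev, b["txs"]) != b["hash"]:
            return False
        prev = b["hash"]
    return True

chain = build_chain([["a->b:5"], ["b->c:2"], ["c->a:1"]])
```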
Blockchain's view of consensus
[Diagram: Miners 1 and 2 extend the shared chain h1–h4 with competing blocks h5 and h5′; Miner 3 builds h6′ on top of h5′, making that fork longer]
“Longest chain rule”: when is a block truly safe?
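The rule in the figure can be sketched as follows (illustrative names; real clients compare total proof-of-work, not just block count):

```python
def choose_chain(chains):
    """Longest-chain rule: adopt the fork with the most blocks."""
    return max(chains, key=len)

def confirmations(chain, block_index):
    """Blocks mined on top of a block; more confirmations = safer."""
    return len(chain) - 1 - block_index

miner1 = ["h1", "h2", "h3", "h4", "h5a"]
miner3 = ["h1", "h2", "h3", "h4", "h5b", "h6b"]
# Miner 3's fork is longer, so the network converges on it and h5a is
# orphaned: a block is only "truly safe" once enough blocks are built
# on top of it that a competing fork is unlikely to overtake it.
```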
CFS
• What is the system model?
• Synchronous replication (thanks to WAIT), but with no rollback in the event that replication fails
• We solve this by destroying the entire server if replication fails
• Partition tolerant
CFS Fault Tolerance
[Diagram: a master replicating to four slaves and serving two clients; when the master crashes, one of the slaves is promoted to master]
Goal 1: Promote a new master when the master crashes
CFS Fault Tolerance
[Diagram: a network partition disconnects the master and some slaves from the rest; nodes on the majority side still answer "OK" while the others are marked Disconnected]
CFS Fault Tolerance
[Diagram: after the partition, a slave on the majority side is promoted to master; the old master and a minority-side slave remain Disconnected]
Goal 2: Promote a new master when partitioned and holding a quorum; cease functioning in the minority
CFS + Failures + Inconsistency
[Diagram: the master is partitioned away from some of its slaves while still reaching the others]
Would a partition like this cause inconsistency? No, if we WAIT for all replicas; yes, if we only wait for a quorum!
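The quorum rule behind Goal 2 can be sketched in one line (an illustrative sketch, not CFS code):

```python
def has_quorum(reachable, cluster_size):
    """A partition side may keep serving writes only if it can reach
    a strict majority of the full cluster."""
    return reachable > cluster_size // 2

# A 5-node cluster split 3 / 2: only the 3-node side proceeds, so the
# two sides can never both commit conflicting writes.
```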