THE RAFT CONSENSUS ALGORITHM DIEGO ONGARO AND JOHN OUSTERHOUT STANFORD UNIVERSITY APRIL 2015 raftconsensus.github.io


THE RAFT CONSENSUS ALGORITHM

DIEGO ONGARO AND JOHN OUSTERHOUT

STANFORD UNIVERSITY

APRIL 2015

raftconsensus.github.io


MOTIVATION

Goal: shared hash table (state machine)
Put it on a single machine attached to network
  Pros: easy, consistent
  Cons: prone to failure

With Raft, keep consistency yet deal with failures


WHAT IS CONSENSUS

Agreement on shared state (single system image)
Recovers from server failures autonomously
  Minority of servers fail: no problem
  Majority fail: lose availability, retain consistency

[Figure: servers]

Key to building consistent storage systems
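
The minority/majority rule above comes down to simple arithmetic. A minimal sketch, assuming a cluster of fixed size; the function names are illustrative and not taken from the slides:

```python
def majority(cluster_size: int) -> int:
    """Smallest number of servers that forms a majority of the cluster."""
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size: int) -> int:
    """Number of servers that can fail while a majority is still up."""
    return cluster_size - majority(cluster_size)


if __name__ == "__main__":
    for n in (3, 4, 5, 7):
        print(f"{n} servers: majority={majority(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

Note that under this arithmetic a 4-server cluster tolerates no more failures than a 3-server one, which is one reason odd cluster sizes are the usual choice.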


REPLICATED STATE MACHINES

[Figure: clients send commands (e.g. z←6) to three servers. Each server has a consensus module, a log (x←3 y←2 x←1 z←6), and a state machine (x=1, y=2, z=6).]

Replicated log ⇒ replicated state machine
  All servers execute same commands in same order
Consensus module ensures proper log replication
System makes progress as long as any majority of servers up
Failure model: fail-stop (not Byzantine), delayed/lost msgs
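
The idea that a replicated log yields a replicated state machine can be sketched with the hash-table example from the figure. This is an illustration only: it assumes the log has already been replicated identically to every server, and `apply_log` is a hypothetical helper name.

```python
def apply_log(log):
    """Apply assignment commands in log order to an empty hash table."""
    state = {}
    for key, value in log:
        state[key] = value  # later entries overwrite earlier ones
    return state


# The log from the slide: x<-3, y<-2, x<-1, z<-6
log = [("x", 3), ("y", 2), ("x", 1), ("z", 6)]

# Three servers that each hold the same log and apply it in the same order.
servers = [apply_log(log) for _ in range(3)]

if __name__ == "__main__":
    for state in servers:
        print(state)
```

Because each server runs the same deterministic commands in the same order, all three end with the same state.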


HOW IS CONSENSUS USED?

Top-level system configuration

[Figure: two diagrams, each showing a replicated state machine on servers S1, S2, S3 above nodes N N N N...; in the second diagram the servers are labeled leader, standby, standby]

Replicate entire database state


PAXOS PROTOCOL

Leslie Lamport, 1989
Nearly synonymous with consensus

“The dirty little secret of the NSDI community is that at most five people really, truly understand every part of Paxos ;-).” —NSDI reviewer

“There are significant gaps between the description of the Paxos algorithm and the needs of a real-world system...the final system will be based on an unproven protocol.” —Chubby authors


RAFT'S DESIGN FOR UNDERSTANDABILITY

We wanted an algorithm optimized for building real systems
  Must be correct, complete, and perform well
  Must also be understandable
“What would be easier to understand or explain?”
  Fundamentally different decomposition than Paxos
  Less complexity in state space
  Less mechanism


RAFT USER STUDY

[Figure: scatter plot of Raft grade against Paxos grade (both axes 0 to 60), with two series: "Raft then Paxos" and "Paxos then Raft"]

[Figure: bar chart of number of participants (0 to 20) for "implement" and "explain", broken down as: Paxos much easier, Paxos somewhat easier, Roughly equal, Raft somewhat easier, Raft much easier]


RAFT OVERVIEW

1. Leader election
  Select one of the servers to act as cluster leader
  Detect crashes, choose new leader

2. Log replication (normal operation)
  Leader takes commands from clients, appends to its log
  Leader replicates its log to other servers (overwriting inconsistencies)

3. Safety
  Only a server with an up-to-date log can become leader
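
The log replication step, where the leader overwrites inconsistencies on other servers, can be sketched from the follower's side. This is a simplified illustration, not the slides' own code: it assumes 1-based log indices, omits the term checks a real AppendEntries handler performs on the request itself, and `Entry` and `append_entries` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    term: int      # term in which the leader created the entry
    command: str   # opaque state machine command


def append_entries(log, prev_index, prev_term, entries):
    """Follower-side sketch: return (success, new_log).

    prev_index/prev_term identify the entry just before the new ones
    (prev_index 0 means "start of log"). The request is rejected unless
    the follower holds a matching entry at prev_index.
    """
    if prev_index > len(log):
        return False, log  # follower's log is too short
    if prev_index > 0 and log[prev_index - 1].term != prev_term:
        return False, log  # entry at prev_index is from a different term
    new_log = list(log)
    for offset, entry in enumerate(entries):
        index = prev_index + 1 + offset  # 1-based position of this entry
        if index <= len(new_log):
            if new_log[index - 1].term != entry.term:
                del new_log[index - 1:]  # drop conflicting entry and all after it
                new_log.append(entry)
        else:
            new_log.append(entry)
    return True, new_log


if __name__ == "__main__":
    follower = [Entry(1, "x=3"), Entry(1, "y=2"), Entry(2, "x=9")]
    ok, repaired = append_entries(
        follower, 2, 1, [Entry(3, "x=1"), Entry(3, "z=6")])
    print(ok, [e.command for e in repaired])
```

In the usage example the follower's third entry (term 2) conflicts with the leader's entry at the same index (term 3), so it is discarded and replaced by the leader's entries.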


RAFTSCOPE VISUALIZATION


CORE RAFT REVIEW

1. Leader election
  Heartbeats and timeouts to detect crashes
  Randomized timeouts to avoid split votes
  Majority voting to guarantee at most one leader per term

2. Log replication (normal operation)
  Leader takes commands from clients, appends to its log
  Leader replicates its log to other servers (overwriting inconsistencies)
  Built-in consistency check simplifies how logs may differ

3. Safety
  Only elect leaders with all committed entries in their logs
  New leader defers committing entries from prior terms
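
The two safety rules above can be sketched as a pair of small functions. This is a hedged illustration based on the rules as described in the Raft paper, with 1-based log indices; the function and parameter names are hypothetical.

```python
def log_is_up_to_date(cand_last_term, cand_last_index,
                      my_last_term, my_last_index):
    """Voter-side check: is the candidate's log at least as up-to-date as mine?

    Logs are compared by the term of the last entry first, then by length.
    """
    if cand_last_term != my_last_term:
        return cand_last_term > my_last_term
    return cand_last_index >= my_last_index


def leader_commit_index(match_index, log_terms, current_term, commit_index):
    """Leader-side sketch: return the new commit index.

    match_index: highest replicated index per server, including the
                 leader's own last index (assumption of this sketch).
    log_terms:   log_terms[i - 1] is the term of the leader's entry i.
    An index only advances the commit point if a majority stores it AND
    the entry is from the leader's current term; earlier entries are then
    committed indirectly because they precede it in the log.
    """
    majority = len(match_index) // 2 + 1
    for n in range(len(log_terms), commit_index, -1):
        replicated = sum(1 for m in match_index if m >= n)
        if replicated >= majority and log_terms[n - 1] == current_term:
            return n
    return commit_index


if __name__ == "__main__":
    terms = [1, 2, 4]  # leader is in term 4; entry 2 is from prior term 2
    print(leader_commit_index([3, 2, 2], terms, 4, 1))  # entry 2 not committed yet
    print(leader_commit_index([3, 3, 2], terms, 4, 1))  # entry 3 on a majority
```

In the first call, entry 2 is on every server but belongs to a prior term, so the leader does not commit it by counting replicas. Once entry 3 (current term) reaches a majority, the commit index moves to 3 and entry 2 is committed along with it.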


RANDOMIZED TIMEOUTS

How much randomization is needed to avoid split votes?

[Figure: cumulative percent (0% to 100%) against time without leader in ms (axis ticks 100, 1000, 10000, 100000), one curve per election timeout range: 150-150ms, 150-151ms, 150-155ms, 150-175ms, 150-200ms, 150-300ms]

Conservatively, use random range ~10x network latency
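
Picking a randomized election timeout along these lines might look as follows. A minimal sketch: the 150 ms base matches the ranges in the figure, while the function names, the uniform distribution, and the default spread are assumptions made for illustration.

```python
import random


def suggested_spread_ms(network_latency_ms, factor=10):
    """Width of the random range, following the ~10x network latency guideline."""
    return factor * network_latency_ms


def election_timeout_ms(base_ms=150, spread_ms=150, rng=random):
    """Draw one timeout uniformly from [base_ms, base_ms + spread_ms]."""
    return rng.uniform(base_ms, base_ms + spread_ms)


if __name__ == "__main__":
    spread = suggested_spread_ms(15)  # assumed 15 ms network latency
    rng = random.Random()             # each server would use its own generator
    print([round(election_timeout_ms(150, spread, rng)) for _ in range(5)])
```

With a spread of zero (the 150-150ms case) every server times out at the same moment, which is the situation that produces repeated split votes.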


RAFT IMPLEMENTATIONS

Name            Primary Authors                                    Language    License
etcd/raft       Blake Mizerany, Xiang Li and Yicheng Qin (CoreOS)  Go          Apache2.0
go-raft         Ben Johnson (Sky) and Xiang Li (CMU, CoreOS)       Go          MIT
hashicorp/raft  Armon Dadgar (hashicorp)                           Go          MPL-2.0
copycat         Jordan Halterman                                   Java        Apache2
LogCabin        Diego Ongaro (Stanford, Scale Computing)           C++         ISC
akka-raft       Konrad Malawski                                    Scala       Apache2
kanaka/raft.js  Joel Martin                                        Javascript  MPL-2.0
rafter          Andrew Stone (Basho)                               Erlang      Apache2
OpenDaylight    Moiz Raja, Kamal Rameshan, Robert Varga (Cisco),   Java        Eclipse
                Tom Pantelis (Brocade)
liferaft        Arnout Kazemier                                    Javascript  MIT
skiff           Pedro Teixeira                                     Javascript  ISC
ckite           Pablo Medina                                       Scala       Apache2
willemt/raft    Willem-Hendrik Thiart                              C           BSD

Copied from Raft website, probably stale.


LOGCABIN

C++, permissive license
Started as research platform for Raft at Stanford
Working with Scale Computing to make production-ready
1.0 release later this month
  Rolling upgrades for 1.0
Looking for more users!

github.com/logcabin


CONCLUSION

Consensus widely regarded as difficult
Raft designed for understandability
  Easier to teach in classrooms
  Better foundation for building practical systems
Pieces needed for a practical system:
  Cluster membership changes (simpler in dissertation)
  Log compaction (expanded tech report/dissertation)
  Client interaction (expanded tech report/dissertation)
  Evaluation (dissertation: understandability, correctness, leader election & replication performance)


QUESTIONS

raftconsensus.github.io

raft-dev mailing list