Orchestrator on Raft: internals, benefits and considerations
Shlomi Noach GitHub
PerconaLive 2018
About me
@github/database-infrastructure
Author of orchestrator, gh-ost, freno, ccql and others.
Blog at http://openark.org
@ShlomiNoach
Agenda
Raft overview
Why orchestrator/raft
orchestrator/raft implementation and nuances
HA, fencing
Service discovery
Considerations
Raft
Consensus algorithm
Quorum based
In-order replication log
Delivery, lag
Snapshots
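The quorum and in-order log points can be made concrete with a small sketch. This is not taken from any raft library; the function names are hypothetical, and it ignores the term check that real Raft applies before committing.

```python
from __future__ import annotations


def quorum_size(cluster_size: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return cluster_size // 2 + 1


def commit_index(match_indexes: list[int]) -> int:
    """Highest log index known to be stored on a quorum of nodes.

    match_indexes holds, per node (leader included), the last log index
    that node is known to have persisted. Lagging nodes simply report
    a lower index.
    """
    ordered = sorted(match_indexes, reverse=True)
    return ordered[quorum_size(len(ordered)) - 1]
```

With three nodes at log indexes 10, 7 and 4, the quorum is 2 and entries up to index 7 count as committed; the node at index 4 is lagging and catches up later.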
HashiCorp raft
golang raft implementation
Used by Consul
Recently hit 1.0.0
github.com/hashicorp/raft
orchestrator
MySQL high availability solution and replication topology manager
Developed at GitHub
Apache 2 license
github.com/github/orchestrator
Why orchestrator/raft
Remove MySQL backend dependency
DC fencing
And then good things happened that were not planned:
Better cross-DC deployments
DC-local KV control
Kubernetes friendly
![Page 8: Orchestrator on Ra : internals, benefits and considerations · orchestrator startup orchestrator service on o2 bootstraps, ... Joining ra! cluster o2 recovers from raft snapshot,](https://reader035.vdocuments.site/reader035/viewer/2022063014/5fd1122a8c7e44655e50cc84/html5/thumbnails/8.jpg)
orchestrator/raft
n orchestrator nodes form a raft cluster
Each node has its own, dedicated backend database (MySQL or SQLite)
All nodes probe the topologies
All nodes run failure detection
Only the leader runs failure recoveries
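The division of labor in orchestrator/raft (every node detects, only the leader recovers) can be sketched as follows. This is an illustration only: the class and method names are hypothetical and do not mirror orchestrator's Go code.

```python
from dataclasses import dataclass, field


@dataclass
class OrchestratorNode:
    name: str
    is_leader: bool = False
    detected: list = field(default_factory=list)
    recovered: list = field(default_factory=list)

    def on_probe_result(self, instance: str, healthy: bool) -> None:
        """Handle the outcome of probing one MySQL instance."""
        if healthy:
            return
        # Every node runs failure detection and records what it sees.
        self.detected.append(instance)
        # Only the raft leader acts on the detection.
        if self.is_leader:
            self.recovered.append(instance)
```

Because non-leaders keep detecting, a node that is promoted to leader already knows about an ongoing failure.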
Implementation & deployment @ GitHub
5 nodes (2xDC1, 2xDC2, 1xDC3)
1 second raft polling interval
step-down
raft-yield
SQLite-backed log store
MySQL backend (SQLite backend use case in the works)
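One way to read the 2xDC1, 2xDC2, 1xDC3 layout: no single data center holds a majority, so losing any one of them still leaves a quorum. A minimal check, with a hypothetical helper name:

```python
from __future__ import annotations


def survives_dc_loss(nodes_by_dc: dict[str, int]) -> dict[str, bool]:
    """For each DC, report whether the remaining nodes still form a quorum."""
    total = sum(nodes_by_dc.values())
    quorum = total // 2 + 1
    return {dc: total - count >= quorum for dc, count in nodes_by_dc.items()}
```

For the five-node layout the quorum is 3, and every single-DC loss leaves at least 3 nodes. A 3+2 split across two DCs would not have this property, since losing the larger DC leaves only 2 nodes.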
A high availability scenario
o2 is leader of a 3-node orchestrator/raft setup
[Diagram: orchestrator nodes o1, o2, o3]
Injecting failure
master: killall -9 mysqld
o2 detects failure. About to recover, but…
Injecting 2nd failure
o2: DROP DATABASE orchestrator;
o2 freaks out. 5 seconds later it steps down
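The step-down behavior can be pictured as a leader that gives up leadership once its own backend has been unhealthy for a grace period. The 5 seconds come from the slide; the mechanism below is an assumption for illustration, and the class name is hypothetical.

```python
class LeaderHealth:
    """Tracks backend health and steps down after a sustained failure."""

    def __init__(self, grace_seconds: float = 5.0):
        self.grace_seconds = grace_seconds
        self.unhealthy_since = None
        self.is_leader = True

    def observe(self, backend_ok: bool, now: float) -> None:
        """Record one health check taken at time `now` (in seconds)."""
        if backend_ok:
            self.unhealthy_since = None
            return
        if self.unhealthy_since is None:
            self.unhealthy_since = now
        # Step down once the backend has been failing for the whole grace period.
        if self.is_leader and now - self.unhealthy_since >= self.grace_seconds:
            self.is_leader = False
```

Passing `now` explicitly keeps the sketch deterministic and avoids reading the wall clock inside the state change.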
orchestrator recovery
o1 grabs leadership
MySQL recovery
o1 detected failure even before stepping up as leader.
o1, now leader, kicks off recovery and fails over the MySQL master
orchestrator self health tests
Meanwhile, o2 panics and bails out.
puppet
Some time later, puppet kicks orchestrator service back on o2.
orchestrator startup
orchestrator service on o2 bootstraps, creates orchestrator schema and tables.
Joining raft cluster
o2 recovers from raft snapshot, acquires raft log from an active node, rejoins the group
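The rejoin step (restore from snapshot, then catch up from the log) can be sketched as below. The data shapes are assumptions made for illustration: a snapshot is a pair of last included index and state, and a log entry is an (index, key, value) tuple.

```python
def rejoin(snapshot, leader_log):
    """Rebuild state from a snapshot, then apply newer log entries in order.

    snapshot:   (last_included_index, state_dict)
    leader_log: iterable of (index, key, value), ordered by index
    """
    last_index, state = snapshot
    state = dict(state)  # do not mutate the snapshot itself
    for index, key, value in leader_log:
        if index <= last_index:
            continue  # already reflected in the snapshot
        state[key] = value
        last_index = index
    return last_index, state
```

Entries at or below the snapshot index are skipped, so the node only replays what it missed while it was down.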
Grabbing leadership
Some time later, o2 grabs leadership
DC fencing
Assume this 3 DC setup
One orchestrator node in each DC
Master and a few replicas in DC2
What happens if DC2 gets network partitioned?
i.e. no network in or out of DC2
[Diagram: DC1, DC2, DC3]
DC fencing
From the point of view of DC2 servers, and in particular from the point of view of DC2’s orchestrator node:
Master and replicas are fine.
DC1 and DC3 servers are all dead.
No need for failover.
However, DC2’s orchestrator is not part of a quorum, hence not the leader. It doesn’t call the shots.
DC fencing
In the eyes of either DC1’s or DC3’s orchestrator:
All DC2 servers, including the master, are dead.
There is a need for failover.
DC1’s and DC3’s orchestrator nodes form a quorum. One of them will become the leader.
The leader will initiate failover.
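The fencing argument reduces to counting which side of the partition still holds a majority. A minimal sketch, with a hypothetical function name and one orchestrator node per DC as in the slides:

```python
from __future__ import annotations


def sides_with_quorum(partitions: list[set[str]]) -> list[bool]:
    """For each side of a network partition, report whether it has a quorum."""
    total = sum(len(side) for side in partitions)
    quorum = total // 2 + 1
    return [len(side) >= quorum for side in partitions]
```

With DC2 isolated, the sides are `{"DC2"}` and `{"DC1", "DC3"}`. Only the second reaches the quorum of 2, so only it can elect a leader and run the failover.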
DC fencing
Depicted potential failover result. New master is from DC3.
[Diagram: DC1, DC2, DC3]
orchestrator/raft & consul
orchestrator is Consul-aware
Upon failover orchestrator updates Consul KV with identity of promoted master
Consul @ GitHub is DC-local, no replication between Consul setups
orchestrator nodes update Consul locally in each DC
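As a rough picture of a DC-local update, each orchestrator node could write the promoted master's identity to its own DC's Consul agent through Consul's KV HTTP endpoint (`PUT /v1/kv/<key>`). The sketch below only builds the request and does not send it. The key layout `mysql/master/<cluster alias>`, the agent address and the function name are assumptions for illustration.

```python
import urllib.request


def build_kv_put(consul_addr: str, cluster_alias: str, master: str) -> urllib.request.Request:
    """Build (but do not send) a Consul KV write naming the promoted master.

    consul_addr is the DC-local Consul agent, e.g. http://127.0.0.1:8500.
    The key prefix "mysql/master" is an assumed layout.
    """
    url = f"{consul_addr}/v1/kv/mysql/master/{cluster_alias}"
    return urllib.request.Request(url, data=master.encode(), method="PUT")
```

Sending the request would be a matter of passing it to `urllib.request.urlopen`. Since every node talks only to its local agent, no replication between Consul setups is needed.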
Considerations, watch out for
Eventual consistency is not always your best friend
What happens if, upon replay of raft log, you hit two failovers for the same cluster?
NOW() and other time-based assumptions
Reapplying snapshot/log upon startup
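One way to guard against these pitfalls is to make applying a log entry idempotent and to carry the timestamp inside the entry rather than calling NOW() at apply time. The sketch below shows that idea under assumed names and data shapes; it is not a description of orchestrator's implementation.

```python
class ReplaySafeState:
    """Applies failover entries at most once, using recorded timestamps."""

    def __init__(self):
        self.last_applied = 0
        self.masters = {}  # cluster -> (master, recorded_at)

    def apply(self, index, cluster, new_master, recorded_at):
        """Apply one log entry; return False if it was already applied."""
        if index <= self.last_applied:
            return False  # replayed entry: skip it
        self.last_applied = index
        # recorded_at was set when the entry was written, not during replay.
        self.masters[cluster] = (new_master, recorded_at)
        return True
```

Replaying two failovers for the same cluster then ends in the state of the later one, with its original timestamp, and replaying the same log a second time changes nothing.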
orchestrator/raft roadmap
Kubernetes
ClusterIP-based configuration in progress
Already container-friendly via auto-reprovisioning of nodes via Raft