Red Hat Storage Server Replication: Past, Present, & Future
DESCRIPTION
"In this session, we'll detail Red Hat Storage Server data replication strategies for both near replication (LAN) and far replication (over WAN), and explain how replication has evolved over the last few years. You'll learn about:

Past mechanisms:
- Near replication (client-side replication)
- Far replication using timestamps (xtime)

Present mechanisms:
- Near replication (server-side), built using quorum and journaling
- Faster far replication using journaling

- Unified replication
- Replication using snapshots
- Stripe replication using erasure coding"

TRANSCRIPT
![Page 1: Red Hat Storage Server Replication Past, Present, & Future](https://reader033.vdocuments.site/reader033/viewer/2022052523/5562ed30d8b42ad26c8b51e4/html5/thumbnails/1.jpg)
RED HAT STORAGE SERVER REPLICATION: PAST AND PRESENT
Jeff Darcy, Venky Shankar, Raghavan Pichai
GlusterFS/RHS Developers @ Red Hat
Talk Outline
- Background
- Local replication
- Remote replication
- Next steps
- Questions
Background: types of replication, goals, and challenges
Synchronous Replication
[Diagram: a client writes synchronously; all replicas are updated before the write completes]
+ High consistency
- Network sensitive
Quorum Enforcement
Replica #1 Replica #2 Replica #3
Majority can write Minority can’t
There can only be one majority => no split brain
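The majority rule on this slide can be sketched in a few lines. This is a hypothetical helper, not GlusterFS code; the function name is made up for illustration:

```python
def has_write_quorum(reachable, total):
    """A write is allowed only if a strict majority of replicas is reachable.

    Two disjoint strict majorities of the same replica set cannot exist,
    so at most one partition can accept writes -- hence no split brain.
    """
    return reachable > total // 2
```

With 3 replicas, a partition holding 2 of them may write, while the isolated single replica may not; with an even count, a tie is not a majority.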
Synchronous Replication Data Flows
[Diagram: two data flows. Chain: client → server → server. Fan out: client → both servers in parallel.]
Fan Out Replication
[Diagram: client writes to both servers in parallel]
- Splits client bandwidth across replicas
- Waits for the slowest replica
Chain Replication
[Diagram: client → server → server]
- Full client bandwidth to the first server
- Two network hops
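The tradeoff between the two flows (split bandwidth and wait for the slowest, versus full bandwidth but extra hops) can be sketched with a toy cost model. All names and numbers here are hypothetical, purely for illustration:

```python
def fan_out_time(size, uplink_bw, server_latencies):
    """Fan out: the client's uplink is split across N replicas, and the
    write completes when the slowest replica finishes (one hop each)."""
    per_replica_bw = uplink_bw / len(server_latencies)
    return size / per_replica_bw + max(server_latencies)

def chain_time(size, uplink_bw, server_latencies):
    """Chain: the client sends one copy at full uplink bandwidth, but the
    data traverses one network hop per server in the chain."""
    return size / uplink_bw + sum(server_latencies)
```

For large writes over a slow client uplink, the chain's full-bandwidth transfer tends to win; for small writes, the fan out's single hop per replica tends to win.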
Asynchronous Replication
[Diagram: a client writes asynchronously; replicas are updated after the write completes]
+ Network insensitive
- Low consistency
Effect of Network Partitions
[Diagram: a network partition separates asynchronous replicas, which end up holding different values]
What's the correct value?
Tradeoff Space
[Diagram: a 2×2 tradeoff space. Synchronous (S) replication is high-consistency but network-sensitive; asynchronous (A) replication is network-insensitive but low-consistency.]
Red Hat Storage: Synchronous Near-Replication
Raghavan P
Developer, Red Hat
Traditional replication using AFR
"Automatic File Replication"
- Client-based replication
- Entry, metadata, and data replication
- Automated self-healing when bricks recover after a failure
AFR Sequence Diagram
[Sequence: Client 1 runs Lock → Pre-op → Op → Post-op → Unlock against Servers A and B; Client 2's Lock on the same file blocks until Client 1's Unlock, then proceeds with its own Pre-op.]
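The five-phase transaction in this diagram can be sketched as follows. This is a heavily simplified toy model, not AFR's actual implementation; the `Replica` class and all method names are made up for illustration:

```python
class Replica:
    """Toy replica holding per-file data, 'pending' dirty marks, and locks."""
    def __init__(self):
        self.data, self.pending, self.locks = {}, set(), set()
    def lock(self, path): self.locks.add(path)      # a second locker blocks here
    def unlock(self, path): self.locks.discard(path)
    def mark_pending(self, path): self.pending.add(path)
    def clear_pending(self, path): self.pending.discard(path)
    def apply(self, path, value):
        self.data[path] = value
        return True

def afr_write(replicas, path, value):
    """AFR's transaction, simplified: lock -> pre-op (mark pending on every
    replica) -> op -> post-op (clear pending where the op succeeded) ->
    unlock.  Any surviving 'pending' marks tell self-heal which copies
    still need repair."""
    for r in replicas: r.lock(path)
    for r in replicas: r.mark_pending(path)             # pre-op
    oks = [r.apply(path, value) for r in replicas]      # op
    for r, ok in zip(replicas, oks):
        if ok: r.clear_pending(path)                    # post-op
    for r in replicas: r.unlock(path)
    return all(oks)
```

If a replica fails mid-transaction, its pending mark is never cleared, which is what makes automated self-healing possible after the brick recovers.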
AFR improvements
In the 3.4 release:
- Eager locking
- Piggybacking
- Server quorum

In the 3.5 release:
- Granular self-heal

In the 3.6 release:
- Rewrite of the code
- Pending counters
- Self-healing in the context of the self-heal daemon
NSR: new-style (a.k.a. server-side) replication
- Replication happens on the back end (between brick processes)
- Controlled by a designated "leader", also known as the sweeper
- Advantages:
  - Client-network bandwidth usage is optimized for direct (FUSE) mounts
  - Avoidance of split brain: the sweeper is elected using the majority principle
- A per-term changelog on the sweeper preserves the ordering of operations
- Variable consistency models, for trading consistency against performance
NSR high-level blocks
- NSR client-side translator: sends I/O to the sweeper
- Sweeper (leader): forwards I/O to peers; commits after all peers complete
- Non-sweeper (follower): accepts I/O only from the sweeper or from reconciliation; rejects I/O from clients (the client retries)
- Changelog
- Reconciliation: uses membership to figure out which terms are missing, then uses the changelogs to sync the corresponding terms
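The two reconciliation steps above can be sketched as follows. This is a conceptual sketch only; the `Follower` class, function names, and data shapes are all hypothetical, not NSR's actual interfaces:

```python
class Follower:
    """Toy brick that tracks which terms it has already seen."""
    def __init__(self, terms):
        self.terms = list(terms)
        self.ops = []
    def apply(self, op):
        self.ops.append(op)

def missing_terms(follower_terms, all_terms):
    """Step 1: use the membership history to find the terms a rejoining
    brick missed while it was down or partitioned."""
    return sorted(set(all_terms) - set(follower_terms))

def reconcile(follower, leader_changelogs, all_terms):
    """Step 2: replay the leader's per-term changelog in term order, so
    the original ordering of operations is preserved."""
    for term in missing_terms(follower.terms, all_terms):
        for op in leader_changelogs[term]:
            follower.apply(op)
        follower.terms.append(term)
```

Replaying whole terms in order is what lets a follower catch up without a full data scan: only the operations recorded during the missed terms are transferred.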
NSR Sequence Diagram
[Sequence: Client 1's and Client 2's requests both go to the sweeper, which forwards them to the follower in order.]
Red Hat Storage Server: Geo-Replication
Venky Shankar
Developer, Red Hat
Geo-Replication
- Asynchronous data replication: continuous, incremental
- Across geographies: one site (master) to another (slave)
- Multi-slave: cascading, fan-out
- Disaster recovery
Remote Replication: Past
Overview
- Single node
- Change detection: crawling (xtime-based crawl)
- Data synchronization: rsync
- Suboptimal processing of renames, deletes, and hardlinks
Crawling and xtime
- xtime: inode changed time, marked up to the root (marker xlator)
- Crawling/scanning: directory crawl and file synchronization
- Sync condition: xtime(master) > xtime(slave)
- The slave's xtime is maintained by the master
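Because a child's change propagates its xtime up to the root, the crawl can prune any subtree whose xtime has not advanced. A minimal sketch of that pruning, using plain dicts as a stand-in for the real marker xattrs and directory tree:

```python
def crawl(m_xtime, s_xtime, children, path="/"):
    """Return the files needing sync.  m_xtime/s_xtime map path -> change
    time on master and (master-maintained) slave side; children maps a
    directory path to its entries.  Subtrees where the master's xtime is
    not newer are provably unchanged and are skipped entirely."""
    if m_xtime[path] <= s_xtime.get(path, -1):
        return []                       # whole subtree unchanged: prune
    if path in children:                # directory: descend
        out = []
        for child in children[path]:
            out += crawl(m_xtime, s_xtime, children, child)
        return out
    return [path]                       # changed file: sync it
```

The cost is proportional to the changed part of the tree rather than its full size, though renames, deletes, and hardlinks remain awkward to detect this way, which is what motivated the journal-based approach described next.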
Remote Replication: Present
Overview
- Multi-node: distributed (parallel) synchronization, replica failover
- Change detection: consumable journals
- Data synchronization (configurable): rsync, or tar+ssh (for large numbers of small files)
- Efficient processing of renames, deletes, and hardlinks
Journaling
- Journaling translator (changelog): efficiently records FOPs local to each brick, covering data, entry, and metadata changes
- Change detection: O(1) relative to the number of changes
- Consumer library (libgfchangelog): per brick, publish/subscribe mechanism, journals periodically published
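The record/publish split can be sketched as a tiny in-memory journal. This is an illustration of the idea only; the class and method names are hypothetical and do not reflect libgfchangelog's actual API:

```python
class Changelog:
    """Toy per-brick journal: each FOP is appended as it happens, and a
    consumer asks for everything published since its last cursor, so
    change detection costs O(changes), not O(files in the volume)."""
    def __init__(self):
        self.records = []

    def record(self, fop_type, path):
        """Called on the brick's I/O path for each operation."""
        self.records.append((fop_type, path))

    def publish_since(self, cursor):
        """Consumer side: return the new batch plus the next cursor."""
        return self.records[cursor:], len(self.records)
```

Because entry operations (creates, renames, deletes) are recorded explicitly, the consumer no longer has to infer them by diffing directory listings, which is what made rename and delete handling suboptimal in the crawl-based scheme.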
Remote Replication: Future
Features
- Replicating snapshots
- Multi-master: vector clocks, conflict detection & resolution
- libgfapi integration: geo-replication to a Swift target
Red Hat Storage Server: Replication-related Features
Jeff Darcy
Developer, Red Hat
Unified Replication
[Diagram: a leader, with its changelog, replicates synchronously to a local replica and asynchronously to a remote replica; each replica keeps its own changelog.]
Erasure Coding (a.k.a. “disperse”)
[Diagram: a stripe split into data fragments D1-D4 plus parity fragments P1-P3; a lost fragment such as D2 is rebuilt from the surviving fragments.]
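The reconstruction idea can be shown with single XOR parity, a deliberately simplified stand-in: disperse's actual coding is Reed-Solomon-style, which is what allows the three parity fragments on this slide to tolerate multiple simultaneous losses, not just one. The function names here are made up for illustration:

```python
from functools import reduce

def make_parity(data_frags):
    """XOR all data fragments column-wise into one parity fragment."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*data_frags))

def reconstruct(frags, lost_index, parity):
    """Rebuild one lost data fragment: XOR-ing the survivors with the
    parity cancels everything except the missing fragment."""
    survivors = [f for i, f in enumerate(frags)
                 if i != lost_index and f is not None]
    return make_parity(survivors + [parity])
```

Compared with 3-way replication's 200% storage overhead, a 4+3 disperse layout stores 7 fragments for 4 fragments' worth of data (75% overhead) while still surviving any 3 losses.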
Also…
- Volume snapshots
- File snapshots
- Deduplication + compression
- Checksums
Tiering (a.k.a. data classification)
- Tier 0: SSD, no replication
- Tier 1: normal disk, sync replication
- Tier 2: SMR disk, erasure coding, compression + checksums, async replication
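A tiering policy of this shape can be sketched as a threshold table mapping file "heat" (access frequency) to a tier. The thresholds, labels, and the heat metric itself are all hypothetical here, purely to illustrate the classification idea:

```python
def place(file_heat):
    """Pick a tier for a file given its heat score in [0, 1]:
    hot data on the fast unreplicated tier, warm data on
    sync-replicated disk, cold data on erasure-coded SMR."""
    tiers = [
        (0.8, "tier0: SSD, no replication"),
        (0.3, "tier1: normal disk, sync replication"),
        (0.0, "tier2: SMR disk, erasure coding + async replication"),
    ]
    for threshold, tier in tiers:
        if file_heat >= threshold:
            return tier
```

The point of pairing tiers with different replication strategies is that the durability cost moves to where it is cheapest: the hot tier trades redundancy for speed, while the cold tier gets redundancy almost for free via erasure coding.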
Questions?