Enabling High Availability and Disaster Recovery in Couchbase Server
TRANSCRIPT
![Page 1: Enabling High Availability and Disaster Recovery in Couchbase Server](https://reader038.vdocuments.site/reader038/viewer/2022102917/587756c11a28ab84388b7763/html5/thumbnails/1.jpg)
High Availability / Disaster Recovery
Mel Boulos, Solutions Engineer
Couchbase
![Page 2: Enabling High Availability and Disaster Recovery in Couchbase Server](https://reader038.vdocuments.site/reader038/viewer/2022102917/587756c11a28ab84388b7763/html5/thumbnails/2.jpg)
©2015 Couchbase Inc. 3
Next 40 minutes …
Part I - High Availability
– Single node architecture
– Local data redundancy
– Rebalance and failover
– Node recovery
Part II - Disaster Recovery
– Business continuity for “mission-critical” applications
– Geo redundancy
– Backup-Restore for worst case scenario
![Page 3: Enabling High Availability and Disaster Recovery in Couchbase Server](https://reader038.vdocuments.site/reader038/viewer/2022102917/587756c11a28ab84388b7763/html5/thumbnails/3.jpg)
Part I - High Availability
![Page 4: Enabling High Availability and Disaster Recovery in Couchbase Server](https://reader038.vdocuments.site/reader038/viewer/2022102917/587756c11a28ab84388b7763/html5/thumbnails/4.jpg)
Couchbase Server – Single Node Architecture
Single node type is the foundation for high availability architecture
No Single Point of Failure (SPOF)
Easy scalability
[Diagram: three identical nodes (Couchbase Server 1, 2, 3), each with a Cluster Manager, Data Service, Index Service, Query Service, Managed Cache, and Storage holding shards]
![Page 5: Enabling High Availability and Disaster Recovery in Couchbase Server](https://reader038.vdocuments.site/reader038/viewer/2022102917/587756c11a28ab84388b7763/html5/thumbnails/5.jpg)
Intra-Cluster Replication – Data Redundancy
RAM to RAM replication
Max of 4 copies of data in a Cluster
Bandwidth optimized by de-duplicating (‘de-dup’) items
Intra-cluster replication is the process of replicating data on multiple servers within a cluster in order to provide data redundancy.
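The 'de-dup' bullet can be illustrated with a small sketch. It assumes de-duplication means that several queued updates to the same key collapse into the latest one before being sent; the function name `dedup_queue` and the tuple layout are hypothetical, not Couchbase internals.

```python
from typing import Hashable, List, Tuple

Mutation = Tuple[Hashable, object]  # (document key, value) - illustrative layout


def dedup_queue(mutations: List[Mutation]) -> List[Mutation]:
    """Collapse repeated updates to the same key, keeping only the latest value."""
    latest = {}
    for key, value in mutations:
        # Drop any older entry so the key moves to the position of its newest update.
        latest.pop(key, None)
        latest[key] = value
    return list(latest.items())


queue = [("doc1", "v1"), ("doc2", "v1"), ("doc1", "v2")]
print(dedup_queue(queue))  # doc1 is sent once, with its latest value
```

Only two items cross the wire instead of three, which is where the bandwidth saving would come from under this assumption.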
![Page 6: Enabling High Availability and Disaster Recovery in Couchbase Server](https://reader038.vdocuments.site/reader038/viewer/2022102917/587756c11a28ab84388b7763/html5/thumbnails/6.jpg)
Write Operation – Data Redundancy
[Diagram: application server, managed cache, disk, DCP, and indexer; DOC 1 is written to the managed cache, replicated, and persisted to disk]
Caching based on Memcached: app gets an ACK when the write is successfully in
– RAM, or
– RAM+Replicated, or
– RAM+Persisted, or
– RAM+Replicated+Persisted
DCP based replication: writes queued to other nodes
Couchstore based storage: writes queued for storage
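The four acknowledgement options on this slide can be modelled as a simple check. This is a minimal sketch, not the SDK API: `AckLevel`, `WriteState`, and `can_ack` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class AckLevel(Enum):
    """The four points at which the app may ask to be acknowledged."""
    RAM = "ram"
    RAM_REPLICATED = "ram+replicated"
    RAM_PERSISTED = "ram+persisted"
    RAM_REPLICATED_PERSISTED = "ram+replicated+persisted"


@dataclass
class WriteState:
    """Where a single write has landed so far (illustrative model)."""
    in_ram: bool = False
    replicated: bool = False
    persisted: bool = False


def can_ack(state: WriteState, level: AckLevel) -> bool:
    """Return True once the write satisfies the requested level."""
    if not state.in_ram:
        return False
    needs_replica = level in (AckLevel.RAM_REPLICATED, AckLevel.RAM_REPLICATED_PERSISTED)
    needs_disk = level in (AckLevel.RAM_PERSISTED, AckLevel.RAM_REPLICATED_PERSISTED)
    return (state.replicated or not needs_replica) and (state.persisted or not needs_disk)


state = WriteState(in_ram=True)
print(can_ack(state, AckLevel.RAM))            # True: RAM alone is enough
print(can_ack(state, AckLevel.RAM_PERSISTED))  # False: still queued for storage
```

The stricter the level, the longer the app waits, since replication and persistence are both queued after the write reaches RAM.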
![Page 7: Enabling High Availability and Disaster Recovery in Couchbase Server](https://reader038.vdocuments.site/reader038/viewer/2022102917/587756c11a28ab84388b7763/html5/thumbnails/7.jpg)
Database Change Protocol – Data Redundancy
DCP is a new streaming replication protocol in Couchbase Server 3.0
– High-performance, stream-based protocol
– Better resumability after blips and failures
– Ordering
– Consistent
DCP streams feed:
– Intra-Cluster Replication
– Cross Datacenter Replication
– Incremental Rebalance
– Incremental Backup & Restore
– External streams for Change Data Capture (CDC) in future
– Incremental Map/Reduce Views
– Global Secondary Indexes
– Connectors (Kafka, Sqoop, Spark)
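To make the "ordering" and "resumability" points concrete, here is a toy model of an ordered change stream that a consumer can resume after an interruption. It assumes each mutation carries a monotonically increasing sequence number; `stream_from` and `Consumer` are hypothetical names, and this is not the DCP wire protocol.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

Mutation = Tuple[int, str, str]  # (sequence number, key, value) - illustrative layout


def stream_from(log: List[Mutation], start_after: int) -> Iterator[Mutation]:
    """Yield mutations in order, skipping everything the consumer already has."""
    for mutation in log:
        if mutation[0] > start_after:
            yield mutation


class Consumer:
    """A replica, indexer, or connector that remembers how far it got."""

    def __init__(self) -> None:
        self.last_seqno = 0
        self.data: Dict[str, str] = {}

    def apply(self, stream: Iterable[Mutation], limit: Optional[int] = None) -> None:
        """Apply mutations; `limit` simulates a connection blip after N items."""
        for count, (seqno, key, value) in enumerate(stream):
            if limit is not None and count >= limit:
                break
            self.data[key] = value
            self.last_seqno = seqno


log = [(1, "a", "1"), (2, "b", "2"), (3, "a", "3")]
consumer = Consumer()
consumer.apply(stream_from(log, 0), limit=2)          # connection drops after two items
consumer.apply(stream_from(log, consumer.last_seqno))  # resume instead of restarting
print(consumer.last_seqno, consumer.data)
```

Because the consumer restarts from its last sequence number, only the missed mutations are sent again, which is the property the incremental features listed on the slide rely on.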
![Page 8: Enabling High Availability and Disaster Recovery in Couchbase Server](https://reader038.vdocuments.site/reader038/viewer/2022102917/587756c11a28ab84388b7763/html5/thumbnails/8.jpg)
Auto Tuning Shared Thread Pool - Durability
Efficient Auto-Tuning Engine
– Detect and allocate threads based on HW resources
– Pool threads for best resource utilization
Improved latency across the board
– Faster Rebalance
– Faster Node Reactivation
– Faster Durability with Writes & PersistTo
![Page 9: Enabling High Availability and Disaster Recovery in Couchbase Server](https://reader038.vdocuments.site/reader038/viewer/2022102917/587756c11a28ab84388b7763/html5/thumbnails/9.jpg)
Rebalance Operation – Data Availability
Rebalance redistributes data-partitions (data) around the cluster
– When adding nodes
– When removing nodes
– When nodes have failed over
Aim is to bring the cluster back to optimal health
Data-partitions are moved between nodes automatically
Rebalance happens on an active cluster
– Allows you to expand/shrink without pausing your application
– Client libraries automatically handle the rebalance and redistribute their requests accordingly
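The redistribution described above can be sketched as follows. This is a simplified illustration that spreads partitions evenly and moves as few as possible; the function `rebalance`, the node names, and the partition count of 12 are assumptions for the example, not the server's actual algorithm.

```python
from collections import defaultdict
from typing import Dict, List


def rebalance(owner: Dict[int, str], nodes: List[str]) -> Dict[int, str]:
    """Return a new partition -> node map spread evenly over `nodes`."""
    base, extra = divmod(len(owner), len(nodes))
    # The first `extra` nodes hold one partition more than the rest.
    target = {n: base + (1 if i < extra else 0) for i, n in enumerate(nodes)}

    held = defaultdict(list)
    for partition in sorted(owner):
        held[owner[partition]].append(partition)

    new_owner: Dict[int, str] = {}
    to_move: List[int] = []
    for node, partitions in held.items():
        keep = target.get(node, 0)  # removed or failed-over nodes keep nothing
        for partition in partitions[:keep]:
            new_owner[partition] = node
        to_move.extend(partitions[keep:])

    # Hand the surplus to nodes that are below their target (e.g. newly added ones).
    for node in nodes:
        already = sum(1 for n in new_owner.values() if n == node)
        for _ in range(target[node] - already):
            new_owner[to_move.pop()] = node
    return new_owner


before = {p: ["node1", "node2"][p % 2] for p in range(12)}
after = rebalance(before, ["node1", "node2", "node3"])  # add node3
moved = [p for p in before if before[p] != after[p]]
print(len(moved))  # only the partitions handed to node3 change owner
```

Partitions that stay put keep serving requests, which is consistent with rebalance running on an active cluster.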
![Page 10: Enabling High Availability and Disaster Recovery in Couchbase Server](https://reader038.vdocuments.site/reader038/viewer/2022102917/587756c11a28ab84388b7763/html5/thumbnails/10.jpg)
Failover Operation - Fault-tolerance
Failover automatically switches over to the replicas for a given database
– Gracefully under node maintenance
– Immediately under auto-failover
– Can be triggered manually through the Admin-UI/REST/CLI
Automatic failover in case of unplanned outages (system failures)
– Can be configured through Admin-UI/REST/CLI
– Constraints in place to avoid “split-brain” and false positives
  – 30 second delay, multiple heartbeat “pings”
  – Clusters >=3 nodes
  – Only one node down at a time
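The auto-failover constraints listed above amount to a guard condition. The sketch below encodes them directly; `should_auto_failover` and its parameters are hypothetical names, and the 30-second default is taken from the slide.

```python
from typing import Collection


def should_auto_failover(
    cluster_size: int,
    down_nodes: Collection[str],
    seconds_unresponsive: float,
    timeout_seconds: float = 30.0,
) -> bool:
    """Apply the slide's three constraints before failing a node over automatically."""
    enough_nodes = cluster_size >= 3                         # clusters of at least 3 nodes
    single_failure = len(down_nodes) == 1                    # only one node down at a time
    waited_long_enough = seconds_unresponsive >= timeout_seconds  # 30 second delay
    return enough_nodes and single_failure and waited_long_enough


print(should_auto_failover(5, ["server3"], 31))             # True
print(should_auto_failover(5, ["server3", "server4"], 31))  # False: two nodes down
print(should_auto_failover(2, ["server2"], 31))             # False: cluster too small
```

Refusing to act when several nodes look down at once is what keeps a network partition from producing two halves that each promote their own replicas.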
![Page 11: Enabling High Availability and Disaster Recovery in Couchbase Server](https://reader038.vdocuments.site/reader038/viewer/2022102917/587756c11a28ab84388b7763/html5/thumbnails/11.jpg)
Automatic Failover – “In action”
[Diagram: Servers 1 to 5, each holding active and replica shards; app servers reach them through the Couchbase client library and its cluster map]
– App servers accessing shards
– Requests to Server 3 fail
– Cluster detects server failed
– Promotes replicas of shards to active
– Updates cluster map
– Requests for docs now go to appropriate server
– Typically rebalance would follow
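The promotion step in this sequence can be sketched as a pure function over a cluster map. The map layout (a dict with `active` and `replicas` per shard) and the name `fail_over` are assumptions for illustration, not the format clients actually receive.

```python
from typing import Dict, List, TypedDict


class ShardEntry(TypedDict):
    active: str
    replicas: List[str]


def fail_over(cluster_map: Dict[int, ShardEntry], failed: str) -> Dict[int, ShardEntry]:
    """Return an updated map with `failed` removed and its active shards promoted."""
    new_map: Dict[int, ShardEntry] = {}
    for shard, entry in cluster_map.items():
        active = entry["active"]
        replicas = [r for r in entry["replicas"] if r != failed]
        if active == failed:
            if not replicas:
                raise RuntimeError(f"shard {shard} has no replica to promote")
            # Promote the first surviving replica to active.
            active, replicas = replicas[0], replicas[1:]
        new_map[shard] = {"active": active, "replicas": replicas}
    return new_map


cluster_map = {
    1: {"active": "server3", "replicas": ["server1"]},
    7: {"active": "server2", "replicas": ["server3"]},
}
print(fail_over(cluster_map, "server3"))
```

After the update, shard 1 is served by server1 and shard 7 has lost its replica, which is why a rebalance would typically follow to restore the replica count.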
![Page 12: Enabling High Availability and Disaster Recovery in Couchbase Server](https://reader038.vdocuments.site/reader038/viewer/2022102917/587756c11a28ab84388b7763/html5/thumbnails/12.jpg)
Node Recovery – Bring Cluster back to Capacity
A failed-over node can be added back to the cluster
– Full recovery – add back as a fresh node
– Delta node recovery – add the failed node back into the cluster incrementally, without having to rebuild the full node
![Page 13: Enabling High Availability and Disaster Recovery in Couchbase Server](https://reader038.vdocuments.site/reader038/viewer/2022102917/587756c11a28ab84388b7763/html5/thumbnails/13.jpg)
Rack-Zone Awareness – Rack-Zone Availability
Grouping of servers into server groups so that each group is on a physically separate rack
Ensures that replica data partitions are not on the same rack as the primary partitions
Servers 1, 2, 3 on Rack 1
Servers 4, 5, 6 on Rack 2
Servers 7, 8, 9 on Rack 3
Cluster has 2 replicas (3 copies of data)
This is a balanced configuration
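The rack layout above can be used to sketch how copies of a partition might be kept on separate racks. This is an illustrative round-robin scheme under the slide's 3-rack, 2-replica setup; `place_copies` is a hypothetical helper, not the server's placement algorithm.

```python
from typing import Dict, List, Tuple


def place_copies(partition: int, groups: Dict[str, List[int]], replicas: int) -> List[Tuple[str, int]]:
    """Return (rack, server) for the active copy followed by each replica.

    Every copy lands in a different server group, so losing one rack
    never removes all copies of a partition.
    """
    names = sorted(groups)
    if replicas + 1 > len(names):
        raise ValueError("need at least one server group per copy")
    copies = []
    for i in range(replicas + 1):
        rack = names[(partition + i) % len(names)]
        servers = groups[rack]
        # Rotate through the servers of a rack so load spreads within it.
        server = servers[(partition // len(names)) % len(servers)]
        copies.append((rack, server))
    return copies


groups = {"Rack 1": [1, 2, 3], "Rack 2": [4, 5, 6], "Rack 3": [7, 8, 9]}
print(place_copies(0, groups, replicas=2))
```

With equal-sized groups each rack ends up holding the same share of active and replica partitions, which matches the slide's note that this is a balanced configuration.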
![Page 14: Enabling High Availability and Disaster Recovery in Couchbase Server](https://reader038.vdocuments.site/reader038/viewer/2022102917/587756c11a28ab84388b7763/html5/thumbnails/14.jpg)
Couchbase Server - MDS Architecture (NEW in 4.0)
What is Multi-Dimensional Scalability?
MDS is the architecture that enables independent scaling of data, query and indexing workloads. It also isolates the services from one another to minimize interference.
Independent “zones” for Query, Index and Data Services
[Diagram: a Couchbase cluster, node1 to node8, divided into Query Service, Index Service, and Data Service zones]
![Page 15: Enabling High Availability and Disaster Recovery in Couchbase Server](https://reader038.vdocuments.site/reader038/viewer/2022102917/587756c11a28ab84388b7763/html5/thumbnails/15.jpg)
Couchbase Server - MDS Architecture (NEW in 4.0)
![Page 16: Enabling High Availability and Disaster Recovery in Couchbase Server](https://reader038.vdocuments.site/reader038/viewer/2022102917/587756c11a28ab84388b7763/html5/thumbnails/16.jpg)
Part II - Disaster Recovery
![Page 17: Enabling High Availability and Disaster Recovery in Couchbase Server](https://reader038.vdocuments.site/reader038/viewer/2022102917/587756c11a28ab84388b7763/html5/thumbnails/17.jpg)
Cross Datacenter Replication (XDCR)
Unidirectional Replication
– Hot spare / Disaster Recovery
– Development/Testing copies
Bidirectional Replication
– Datacenter Locality
– Multiple Active Masters
![Page 18: Enabling High Availability and Disaster Recovery in Couchbase Server](https://reader038.vdocuments.site/reader038/viewer/2022102917/587756c11a28ab84388b7763/html5/thumbnails/18.jpg)
Cross Datacenter Replication (XDCR) using DCP
Continuously replicates data from a source cluster to remote clusters, which may be spread across geographies
Supports unidirectional and bidirectional operation
Application can read and write from both clusters (active-active replication)
Automatically handles node addition and removal
Simplified administration via Admin UI, REST, and CLI
Pause and resume XDCR replication
(NEW in 4.0) Filtering of data on replication stream
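Filtering on the replication stream can be pictured as a predicate applied to each mutation before it leaves the source cluster. The sketch below assumes the filter is a regular expression matched against document keys; the slide does not say how filters are expressed, so treat the expression form, the key names, and `filter_stream` as illustrative only.

```python
import re
from typing import List, Tuple

Mutation = Tuple[str, dict]  # (document key, document body) - illustrative layout


def filter_stream(mutations: List[Mutation], key_pattern: str) -> List[Mutation]:
    """Keep only mutations whose key matches the pattern; the rest stay local."""
    matcher = re.compile(key_pattern)
    return [(key, doc) for key, doc in mutations if matcher.search(key)]


outgoing = [
    ("user::eu::1001", {"name": "Ana"}),
    ("user::us::2002", {"name": "Bo"}),
    ("session::abc", {"ttl": 60}),
]
# Replicate only EU user documents to the remote cluster.
print(filter_stream(outgoing, r"^user::eu::"))
```

A filter like this is one way to keep a remote cluster limited to the data it needs, for example for datacenter locality.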
![Page 19: Enabling High Availability and Disaster Recovery in Couchbase Server](https://reader038.vdocuments.site/reader038/viewer/2022102917/587756c11a28ab84388b7763/html5/thumbnails/19.jpg)
XDCR – Memory based using DCP
[Diagram: application server, managed cache, disk, and indexer; DOC 1 flows from the managed cache through intra-cluster replication and cross datacenter replication]
![Page 20: Enabling High Availability and Disaster Recovery in Couchbase Server](https://reader038.vdocuments.site/reader038/viewer/2022102917/587756c11a28ab84388b7763/html5/thumbnails/20.jpg)
Backup & Restore - Oops
The cbbackup tool provides backup for a running cluster
– Entire cluster – across all buckets
– Single node – across all buckets
– Single node – single bucket
– Supports remote or local access
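The backup scopes above map onto command-line options. The helper below only assembles an argument list and does not run anything; the host, port 8091, paths, and credentials are placeholders, and the `-b` and `--single-node` options reflect my understanding of the tool, so verify them against `cbbackup --help` for your version.

```python
from typing import List, Optional


def cbbackup_command(
    host: str,
    backup_dir: str,
    user: str,
    password: str,
    bucket: Optional[str] = None,
    single_node: bool = False,
) -> List[str]:
    """Build (but do not execute) a cbbackup invocation for the chosen scope."""
    command = ["cbbackup", f"http://{host}:8091", backup_dir, "-u", user, "-p", password]
    if bucket is not None:
        command += ["-b", bucket]          # restrict the backup to one bucket
    if single_node:
        command.append("--single-node")    # back up only the node we connect to
    return command


# Entire cluster, all buckets.
print(" ".join(cbbackup_command("cb1.example.com", "/backups/full", "Administrator", "secret")))
# Single node, single bucket.
print(" ".join(cbbackup_command("cb1.example.com", "/backups/n1", "Administrator", "secret",
                                bucket="default", single_node=True)))
```

Pointing the URL at a remote host or at localhost covers the "remote or local access" bullet.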
![Page 21: Enabling High Availability and Disaster Recovery in Couchbase Server](https://reader038.vdocuments.site/reader038/viewer/2022102917/587756c11a28ab84388b7763/html5/thumbnails/21.jpg)
Minimize time and resources during backups
Efficient Recovery with Incremental Backup & Restore
• Back up only the data updated since the last backup
• Differential Backups
• Cumulative Backups
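The difference between the two incremental modes comes down to which earlier backup serves as the baseline. The sketch assumes the usual definitions: a differential backup holds changes since the last backup of any kind, and a cumulative backup holds changes since the last full backup. Sequence numbers and the name `incremental_backup` are illustrative.

```python
from typing import List, Tuple

Change = Tuple[int, str]        # (sequence number, document key)
BackupRecord = Tuple[int, str]  # (sequence number at backup time, kind)


def incremental_backup(changes: List[Change], backups: List[BackupRecord], mode: str) -> List[Change]:
    """Select the changes an incremental backup of the given mode would contain."""
    if mode == "differential":
        baseline = backups[-1][0]  # last backup of any kind
    elif mode == "cumulative":
        baseline = [seqno for seqno, kind in backups if kind == "full"][-1]  # last full backup
    else:
        raise ValueError(f"unknown mode: {mode}")
    return [change for change in changes if change[0] > baseline]


changes = [(1, "a"), (2, "b"), (3, "c"), (4, "a")]
history = [(1, "full"), (3, "differential")]
print(incremental_backup(changes, history, "differential"))
print(incremental_backup(changes, history, "cumulative"))
```

Differential backups are smaller to take, while a restore from a cumulative backup needs fewer pieces: the last full backup plus one cumulative set.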
![Page 22: Enabling High Availability and Disaster Recovery in Couchbase Server](https://reader038.vdocuments.site/reader038/viewer/2022102917/587756c11a28ab84388b7763/html5/thumbnails/22.jpg)
Thank you.
Questions?