MongoDB: Advanced Concepts - Replication and Sharding
MongoDB - Advanced Concepts: Replication and Sharding
Piyush Rana, Software Consultant, Knoldus Software LLP
Agenda
1) What is replication?
2) How is replication handled?
3) Replica sets or master-slave replication
4) What is sharding, and why use it?
5) Implementation of sharding
Replication
> Replication is the process of synchronizing data across multiple servers.
> Replication provides redundancy and increases data availability by keeping multiple copies of the data on different database servers. This protects a database from the loss of a single server.
> Disaster recovery
> No downtime for maintenance (such as backups, index rebuilds, and compaction)
> Read scaling (extra copies to read from)
How Replication Works
> MongoDB achieves replication through replica sets. A replica set is a group of mongod instances that host the same data set.
- A replica set is a group of two or more nodes (generally a minimum of 3 nodes is required).
- In a replica set, one node is the primary and the remaining nodes are secondaries.
- All data replicates from the primary to the secondaries.
- During automatic failover or maintenance, the remaining members hold an election and choose a new primary. Once the failed node recovers, it rejoins the replica set and works as a secondary.
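One way to see why three members is the usual minimum: electing a primary takes a majority of the voting members. That majority rule is standard replica set behavior, although the slide does not spell it out. The sketch below is only arithmetic, and the helper names `majority` and `fault_tolerance` are made up for illustration.

```python
def majority(voting_members: int) -> int:
    """Votes needed to elect a primary: more than half of the voting members."""
    return voting_members // 2 + 1


def fault_tolerance(voting_members: int) -> int:
    """How many voting members can be down while a primary can still be elected."""
    return voting_members - majority(voting_members)


for n in (2, 3, 4, 5):
    print(n, "members: majority", majority(n), "tolerates", fault_tolerance(n), "failure(s)")
```

A two-member set cannot elect a primary once either member is lost, and a fourth member adds no extra tolerance over three, which is why odd member counts are the common choice.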
Replica Set Members
1) Primary
2) Secondaries
2.1) Priority 0 Replica Set Members
2.2) Hidden Replica Set Members
2.3) Delayed Replica Set Members
3) Arbiter
Primary Replica Set Member
The primary is the only member in the replica set that receives write operations. MongoDB applies write operations on the primary and then records the operations in the primary's oplog.
Secondary members replicate this log and apply the operations to their data sets.
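The write path described here can be modelled in a few lines. This is a toy in-memory model, not MongoDB code: the real oplog is a capped collection and secondaries pull from it over the network. The class and function names (`Member`, `Primary`, `sync`) are hypothetical.

```python
class Member:
    """A replica set member holding a copy of the data set."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.applied = 0  # how many oplog entries this member has applied

    def apply(self, entry):
        op, key, value = entry
        if op == "set":
            self.data[key] = value
        elif op == "delete":
            self.data.pop(key, None)
        self.applied += 1


class Primary(Member):
    """The only member that accepts writes; it records each one in its oplog."""

    def __init__(self, name):
        super().__init__(name)
        self.oplog = []

    def write(self, op, key, value=None):
        entry = (op, key, value)
        self.apply(entry)         # apply the write to the primary's own data
        self.oplog.append(entry)  # then record it in the oplog


def sync(secondary, primary):
    """Replay the oplog entries the secondary has not applied yet."""
    for entry in primary.oplog[secondary.applied:]:
        secondary.apply(entry)


primary = Primary("node-a")
secondary = Member("node-b")
primary.write("set", "x", 1)
primary.write("set", "y", 2)
primary.write("delete", "x")
sync(secondary, primary)
print(secondary.data == primary.data)
```

Because the secondary tracks how far it has read, calling `sync` again after further writes replays only the new entries.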
Priority 0 Replica Set Members
A secondary maintains a copy of the primary's data set. A priority 0 member is a secondary that cannot become primary. Priority 0 members cannot trigger elections. Otherwise these members function as normal secondaries.
A priority 0 member maintains a copy of the data set, accepts read operations, and votes in elections.
Configure a member with priority 0 to prevent that secondary from becoming primary, which is particularly useful in multi-data center deployments.
Hidden Replica Set Members
A hidden member maintains a copy of the primary's data set but is invisible to client applications. Hidden members must always be priority 0 members and so cannot become primary.
The db.isMaster() method does not display hidden members. Hidden members, however, may vote in elections.
Delayed Replica Set Members
Delayed members contain copies of a replica set's data set. However, a delayed member's data set reflects an earlier, or delayed, state of the set.
Delayed members:
- Must be priority 0 members. Set the priority to 0 to prevent a delayed member from becoming primary.
- Should be hidden members. Always prevent applications from seeing and querying delayed members.
- Do vote in elections for primary, if members[n].votes is set to 1.
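To make "an earlier, or delayed, state" concrete, the sketch below replays only the oplog entries that are older than the configured delay. It is a simplified illustration with made-up timestamps and a hypothetical function name `delayed_view`; a real delayed member applies operations continuously rather than rebuilding its state on demand.

```python
def delayed_view(oplog, now, delay_seconds):
    """Build the data set a delayed member would hold at time `now`.

    `oplog` is a list of (timestamp, key, value) writes in time order.
    Only entries that are at least `delay_seconds` old are applied.
    """
    state = {}
    for timestamp, key, value in oplog:
        if timestamp <= now - delay_seconds:
            state[key] = value
    return state


# Timestamps are seconds since an arbitrary start (assumed values).
oplog = [(0, "a", 1), (100, "a", 2), (3500, "b", 3)]

current = delayed_view(oplog, now=3600, delay_seconds=0)       # a normal secondary
one_hour_ago = delayed_view(oplog, now=3600, delay_seconds=3600)  # a member delayed by one hour
print(current, one_hour_ago)
```

A member that lags by an hour still holds the data as it was before the last hour of writes, which is what makes it useful as a rolling safety copy.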
Replica Set Arbiter
An arbiter does not have a copy of the data set and cannot become primary. Replica sets may have arbiters to add a vote in elections for primary.
Arbiters always have exactly 1 election vote, and thus allow replica sets to have an uneven number of voting members without the overhead of an additional member that replicates data.
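The member types above can be summarized as one replica set configuration document. The sketch below builds such a document as a plain Python dict and checks the rules stated on the slides. The field names (`priority`, `hidden`, `slaveDelay`, `votes`, `arbiterOnly`) follow the MongoDB 2.6 configuration format that the deck references (later releases renamed `slaveDelay`); the set name, hostnames, and the `check_config` helper are assumptions for illustration only.

```python
def build_config():
    """A five-member replica set configuration (hypothetical hosts)."""
    return {
        "_id": "rs0",
        "members": [
            {"_id": 0, "host": "db1.example.net:27017", "priority": 2},   # preferred primary
            {"_id": 1, "host": "db2.example.net:27017"},                  # normal secondary
            {"_id": 2, "host": "db3.example.net:27017", "priority": 0},   # can never become primary
            {"_id": 3, "host": "db4.example.net:27017", "priority": 0,
             "hidden": True, "slaveDelay": 3600},                         # hidden, delayed by one hour
            {"_id": 4, "host": "db5.example.net:27017", "arbiterOnly": True},  # votes, holds no data
        ],
    }


def check_config(config):
    """Return a list of violations of the rules described in the slides."""
    problems = []
    for member in config["members"]:
        priority = member.get("priority", 1)
        if member.get("hidden", False) and priority != 0:
            problems.append("member %d: hidden members must have priority 0" % member["_id"])
        if member.get("slaveDelay", 0) > 0 and priority != 0:
            problems.append("member %d: delayed members must have priority 0" % member["_id"])
    voters = sum(1 for m in config["members"] if m.get("votes", 1) > 0)
    if voters % 2 == 0:
        problems.append("even number of voting members (%d)" % voters)
    return problems


print(check_config(build_config()))
```

In a live deployment a document of this shape is what gets passed to the replica set initiation or reconfiguration command; the check here only mirrors the constraints named in the text.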
DEMO FOR REPLICATION
Create a replica set with 5 members of different kinds (e.g. Primary, Secondary, Hidden, Arbiter and Priority 0).
Insert data and watch the behavior of the delayed member, the arbiter, and the other secondary members.
Shut down the primary and trigger an election.
Adjust Priority for Replica Set Member And Prevent Secondary from Becoming Primary
Configure Non-Voting Replica Set Member
Sharding
Sharding is a method for distributing data across multiple machines.
MongoDB uses sharding to support deployments with very large data sets and high throughput operations.
MongoDB supports horizontal scaling through sharding.
Shard Keys
To distribute the documents in a collection, MongoDB partitions the collection using the shard key.
The shard key consists of an immutable field or fields that exist in every document in the target collection.
You choose the shard key when sharding a collection. The choice of shard key cannot be changed after sharding.
Shard Key
Chunks: A chunk is a contiguous range of shard key values within a particular shard. MongoDB splits chunks when they grow beyond the configured chunk size, which by default is 64 megabytes.
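A rough sketch of chunk splitting, assuming a chunk is a half-open key range and that a split happens at the median shard key value. The `Chunk` class and `maybe_split` function are hypothetical and only illustrate the idea; MongoDB's own split-point selection is more involved.

```python
from dataclasses import dataclass, field

CHUNK_SIZE_BYTES = 64 * 1024 * 1024  # the default chunk size named in the text


@dataclass
class Chunk:
    low: int    # inclusive lower bound of the shard key range
    high: int   # exclusive upper bound of the shard key range
    docs: dict = field(default_factory=dict)  # shard key value -> document size in bytes

    def size(self):
        return sum(self.docs.values())


def maybe_split(chunk, limit=CHUNK_SIZE_BYTES):
    """Split an oversized chunk into two contiguous ranges at its median key."""
    if chunk.size() <= limit or len(chunk.docs) < 2:
        return [chunk]
    keys = sorted(chunk.docs)
    middle = keys[len(keys) // 2]
    left = Chunk(chunk.low, middle, {k: s for k, s in chunk.docs.items() if k < middle})
    right = Chunk(middle, chunk.high, {k: s for k, s in chunk.docs.items() if k >= middle})
    return [left, right]


one_mb = 1024 * 1024
big = Chunk(0, 1000, {key: one_mb for key in range(100)})  # 100 MB of documents
parts = maybe_split(big)
print([(part.low, part.high, part.size() // one_mb) for part in parts])
```

The two resulting ranges share a boundary, so every shard key value still belongs to exactly one chunk after the split.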
The Perfect Shard Key
If you think about it, the perfect shard key would have the following characteristics:
All inserts, updates, and deletes would each be distributed uniformly across all of the shards in the cluster
All queries would be uniformly distributed across all of the shards in the cluster
All operations would only target the shards of interest: an update or delete would never be sent to a shard which didn't own the data being modified
Similarly, a query would never be sent to a shard which holds none of the data being queried
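The targeting properties in this list can be illustrated with a toy router. It assumes a ranged shard key named `user_id`, three shards, and fixed range boundaries, all of which are made up for the example: a query that includes the shard key goes to one shard, and any other query has to be broadcast to all of them.

```python
from bisect import bisect_right

SHARDS = ["shard0", "shard1", "shard2"]
BOUNDARIES = [100, 200]  # shard0: below 100, shard1: 100 to 199, shard2: 200 and above


def target_shards(query, shard_key="user_id"):
    """Return the shards a query must be sent to."""
    if shard_key in query:
        # The shard key value identifies the single range that can hold the data.
        return [SHARDS[bisect_right(BOUNDARIES, query[shard_key])]]
    # Without the shard key the router cannot rule out any shard.
    return list(SHARDS)


print(target_shards({"user_id": 150}))
print(target_shards({"name": "alice"}))
```

Queries of the second kind are the ones a well-chosen shard key tries to make rare, since each one costs work on every shard.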
Hashed Vs Ranged Sharding
Hashed shard keys use a hashed index of a single field as the shard key to partition data across your sharded cluster.
Range-based sharding involves dividing data into contiguous ranges determined by the shard key values. In this model, documents with close shard key values are likely to be in the same chunk or shard.
Given a collection using a monotonically increasing value X as the shard key, ranged sharding routes every new insert to the chunk that covers the highest key range, so a single shard absorbs the entire insert load.
By using a hashed index on X instead, consecutive values of X hash to unrelated positions, so inserts are spread roughly evenly across the chunks and shards.
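The difference between the two distributions can be reproduced with a small simulation. Everything here is an assumption for illustration: three shards, fixed range boundaries, and MD5 truncated to 64 bits as a stand-in hash function.

```python
import hashlib
from bisect import bisect_right
from collections import Counter


def range_index(value, boundaries):
    """Index of the contiguous range (and so the shard) that contains value."""
    return bisect_right(boundaries, value)


def hashed(value):
    """Map a shard key value to a 64-bit integer (stand-in hash function)."""
    digest = hashlib.md5(str(value).encode()).digest()
    return int.from_bytes(digest[:8], "big")


RANGE_BOUNDARIES = [1000, 2000]  # shard 0: X < 1000, shard 1: 1000..1999, shard 2: X >= 2000
HASH_SPACE = 2 ** 64
HASH_BOUNDARIES = [HASH_SPACE // 3, 2 * HASH_SPACE // 3]  # three equal slices of the hash space

new_values = range(5000, 8000)  # 3000 inserts with a monotonically increasing X

ranged_counts = Counter(range_index(x, RANGE_BOUNDARIES) for x in new_values)
hashed_counts = Counter(range_index(hashed(x), HASH_BOUNDARIES) for x in new_values)

print("ranged:", dict(ranged_counts))
print("hashed:", dict(hashed_counts))
```

With ranged sharding every insert lands on the last shard, while the hashed counts come out close to a third each. The price of hashing is that range queries on X can no longer be served from a single chunk.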
Ranged sharding is most efficient when the shard key displays the following traits:
- Large shard key cardinality
- Low shard key frequency
- Non-monotonically changing shard keys
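These three traits can be estimated from a sample of candidate shard key values before committing to a key. The sketch below is a hypothetical helper, `shard_key_profile`, that assumes the sample is listed in insertion order; the sample values are invented.

```python
from collections import Counter


def shard_key_profile(values):
    """Summarize cardinality, frequency, and monotonicity for a sample of key values."""
    if not values:
        raise ValueError("need at least one sample value")
    counts = Counter(values)
    return {
        # Number of distinct values: an upper bound on how many chunks can exist.
        "cardinality": len(counts),
        # Share of documents held by the most common value: high means a hot chunk.
        "max_frequency": max(counts.values()) / len(values),
        # True when values never decrease in insertion order.
        "monotonic": all(b >= a for a, b in zip(values, values[1:])),
    }


country_codes = ["IN", "US", "IN", "IN"]
timestamps = [1, 2, 3, 4]
print(shard_key_profile(country_codes))
print(shard_key_profile(timestamps))
```

The country code sample fails on cardinality and frequency, and the timestamp sample fails on monotonicity, so neither would suit ranged sharding on its own.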
Demo For Sharding
2 shard servers, each deployed as a replica set
References
[1] MongoDB Official Documentation, https://docs.mongodb.com/v2.6
Thank you!