Webinar slides: Become a MongoDB DBA - Scaling and Sharding


Your host & some logistics

I'm Jean-Jérôme from the Severalnines Team and I'm your host for today's webinar!

Feel free to ask any questions in the Questions section of this application or via the Chat box.

You can also contact me directly via the chat box or via email: [email protected] during or after the webinar.

About Severalnines and ClusterControl

What we do

Deploy · Monitor · Manage · Scale

ClusterControl Automation & Management

☐ Provisioning
  ☐ Deploy a cluster in minutes
  ☐ On-premises or in the cloud (AWS)

☐ Monitoring
  ☐ Systems view
  ☐ 1sec resolution
  ☐ DB / OS stats & performance advisors
  ☐ Configurable dashboards
  ☐ Query Analyzer
  ☐ Real-time / historical

☐ Management
  ☐ Multi cluster/data-center
  ☐ Automate repair/recovery
  ☐ Database upgrades
  ☐ Backups
  ☐ Configuration management
  ☐ Cloning
  ☐ One-click scaling

Supported Databases

Customers

Become a MongoDB DBA

Scaling and Sharding

Art van Scheppingen, Senior Support Engineer

Logistics

☐ Webinar is recorded
☐ Replay available soon
☐ Feel free to ask questions at any time
☐ Use your control panel to contact us
☐ Or email us as well: [email protected]

Agenda

☐ When scalability is needed
☐ Scaling out reads with MongoDB
☐ Sharding with MongoDB
☐ Maintaining shards with MongoDB
☐ ClusterControl and MongoDB scaling
☐ NinesControl and MongoDB
☐ Live Demo

When scalability is needed

MongoDB scaling and sharding

Scaling MongoDB

☐ Why do we need to scale?
  ☐ Workload too large for the current system
  ☐ Peak loads are saturating the system
  ☐ Used disk space is nearing capacity

☐ How do we scale?
  ☐ Vertically
  ☐ Horizontally

Vertical scaling

☐ Scale by adding more power
  ☐ A faster CPU
  ☐ More memory
  ☐ More/faster storage disks

☐ You may outpace Moore's law
  ☐ Your growth may be larger than what new hardware developments can deliver

☐ Costs may be very high
  ☐ Faster CPUs and disks are more expensive

Horizontal scaling

☐ Scale by adding more machines
  ☐ Split workload over multiple nodes
  ☐ Many hands make light work

☐ Complexity
  ☐ Horizontal scaling adds complexity
  ☐ Orchestration systems are needed

Horizontal database scaling

☐ Read scaling
  ☐ Offloading read operations to slave / secondary nodes
  ☐ Caching layers

☐ Write scaling
  ☐ Offloading write operations to another master / primary node
  ☐ This is often referred to as sharding

Sharding types

☐ Functional sharding
  ☐ Place the database / collection on a single node (or replicaSet)
  ☐ Uneven reads / writes
  ☐ Can't join data between databases / collections
  ☐ Only vertical scaling

☐ Partitioned sharding
  ☐ Spread the data of a database / collection over multiple nodes
  ☐ Even reads / writes
  ☐ Horizontal scaling

Scaling out reads with MongoDB

Typical MongoDB ReplicaSet

Read scaling considerations

☐ By default both read and write requests go to the primary
  ☐ MongoDB replication is asynchronous
  ☐ Inconsistent data may be returned from a secondary

☐ Overloading secondaries may be dangerous for failover
☐ Sharded environments
  ☐ Data might have been migrated from one shard to another


Asynchronous replication

☐ Replication happens asynchronously, like MySQL replication
  ☐ Eventual consistency: nodes will eventually contain the data
  ☐ Not all nodes may be in the same state
  ☐ Use a writeConcern to enforce that data is safe

☐ writeConcern
  ☐ HA: wait for confirmation from secondary nodes
  ☐ numeric, majority or <tag>

☐ Durability: wait for the write to reach the journal
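As a minimal sketch (collection name and document are illustrative), a write concern is passed per operation in the mongo shell; w: "majority" waits for a majority of the voting nodes and j: true waits for the journal write:

db.orders.insert(
    { _id: 1, status: "new" },                                    // example document
    { writeConcern: { w: "majority", j: true, wtimeout: 5000 } }  // give up after 5s without acknowledgement
)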

Durability

Eventual consistency

Overloading secondaries

☐ Secondaries are used for HA
  ☐ Contain a copy of the data
  ☐ Take part in the (majority) voting process for a new primary

☐ Using more read capacity than (number of secondaries - 1) is risky
  ☐ If one secondary fails, its load is distributed over the remaining nodes
  ☐ Another node may then fail due to the increased load
  ☐ E.g. with three secondaries, plan the read load to fit on two

☐ If no majority of nodes remains
  ☐ The primary gets demoted to a secondary
  ☐ No more writes are allowed


Read scaling considerations checklist

Offloading reads to secondaries

Setting the read preference

primary: Always read from the primary (default)
primaryPreferred: Read from the primary; read from a secondary if the primary is unavailable
secondary: Always read from a secondary
secondaryPreferred: Read from a secondary; read from the primary if no secondary is available
nearest: Always read from the node with the lowest network latency
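For illustration, the read preference can be set per connection in the mongo shell (the collection name is a placeholder):

db.getMongo().setReadPref("secondaryPreferred")
db.mycollection.find()   // this read now goes to a secondary when one is available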

Read preference primary (default)

Read preference primaryPreferred

Read preference secondary

Read preference secondaryPreferred

Read preference nearest

Filtering nodes with tags

☐ Nodes in a replicaSet can be tagged☐ You can use these tags to filter nodes for read requests

db.getMongo().setReadPref("nearest", [ { "dc": "2" } ] )

{ "_id" : "myrs", "version" : 2, "members" : [ { "_id" : 0, "host" : "host1:27017", "tags" : { "dc": "1", "rack": "e3" } }, { "_id" : 1, "host" : "host2:27017", "tags" : { "dc": "1", "rack": "b2" } }, { "_id" : 0, "host" : "host3:27017", "tags" : { "dc": "2", "rack": "q1" } } ]}


Scaling out limitations

☐ Adding more secondaries
  ☐ Maximum of 50 nodes in a replicaSet
  ☐ Maximum of 7 voting nodes in a replicaSet
  ☐ Arbiter nodes also count towards the 50 node limit

☐ No need for intermediate masters
  ☐ MongoDB secondaries can replicate from other secondaries
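As a sketch (hostnames are placeholders), a secondary or an arbiter is added from the mongo shell, connected to the primary:

rs.add("host4.domain.com:27017")      // new secondary; counts towards the 50 node limit
rs.addArb("host5.domain.com:27017")   // arbiter: takes part in voting, holds no data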

Sharding with MongoDB

MongoDB sharding components

☐ Shard router (mongos)
  ☐ Small router/proxy connecting to the Configserver

☐ Configserver
  ☐ A special replicaSet containing the shard administration

☐ Shards
  ☐ ReplicaSets each containing a piece of the data
  ☐ Shards share nothing with other shards

Typical MongoDB sharding environment

Shard router: transparent sharding

Sharding meta data

☐ Sharding meta data
  ☐ Data gets partitioned by slicing it into ranges (chunks)
  ☐ Chunks are defined by the shard key
  ☐ Chunks are distributed evenly over all shards
  ☐ The MongoDB balancer will balance chunks over shards

☐ If a chunk grows too large, it is split into new chunks
  ☐ The shard router creates chunks on the fly
  ☐ New chunks are assigned to the shard with the most free space

☐ Sharding meta data is stored in the Configserver (see the sketch below)
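To inspect this meta data from a mongos, a quick sketch (config.chunks is an internal collection, shown for illustration only):

sh.status()          // chunk ranges and their shard assignments
use config
db.chunks.findOne()  // raw chunk meta data as stored on the Configserver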

Chunks and shards

Balancing shards

Shard key

☐ Placed on an indexed field or indexed compound field
☐ Shard keys can't be altered
☐ Only the _id field and the shard key can be unique
☐ The shard key is the most important factor in sharding
  ☐ It determines the data distribution
  ☐ It influences the effectiveness of the shards

Shard key distribution

Shard key considerations: writing data

☐ Sequential writes
  ☐ E.g. sequential identifiers, timestamps
  ☐ Only one shard at a time will be written to
  ☐ Use a hashed shard key to randomly distribute writes

☐ Random writes
  ☐ E.g. username, UUID, date of birth
  ☐ Will write over all shards
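A minimal sketch of both options, assuming a database mydata (names are illustrative); the hashed key spreads sequential _id inserts randomly over the shards, as the next two slides illustrate:

sh.enableSharding("mydata")
sh.shardCollection("mydata.events", { _id: "hashed" })   // hashed key: random write distribution
sh.shardCollection("mydata.users",  { username: 1 })     // range key: fine for already-random values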

Shard key without hash function

Shard key with hash function

Shard key considerations: reading data

☐ Range queries covered by the shard key
  ☐ The shard router will know which shard(s) to query
  ☐ Very efficient if only one shard needs to be queried
  ☐ Performance degrades the more shards need to be queried

☐ Range queries not covered by the shard key
  ☐ The shard router will not know which shard(s) to query
  ☐ Will require all shards to be queried

☐ Fields not covered by an index will cause a full collection scan on all shards
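Running explain() through the shard router shows whether a query was targeted; collection and field names here are illustrative, and on a sharded cluster the winning plan reports a stage such as SINGLE_SHARD or SHARD_MERGE:

db.users.find({ username: "alice" }).explain()   // covered by the shard key: one shard queried
db.users.find({ city: "Stockholm" }).explain()   // not covered: scatter-gather over all shards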

Maintaining shards with MongoDB

Backups

☐ Creating a backup of a replicaSet: back up any node
☐ Creating a backup of a sharded cluster:
  ☐ Back up the Configserver replicaSet (meta data)
  ☐ Back up each shard (one node per shard, pref. the primary)
  ☐ Backups started at the same time may not end at the same time

☐ Percona MongoDB Consistent Backup tool
  ☐ Backup orchestration tool
  ☐ Streams the oplog until all shards have been backed up
  ☐ https://www.percona.com/blog/2016/07/25/mongodb-consistent-backups/
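A manual sketch with mongodump (replicaSet names and hostnames are placeholders; the Percona tool above automates this and keeps the result consistent):

# stop the balancer first, so chunks do not migrate mid-backup
mongodump --host cfgrs/cfg1.domain.com:27019 --oplog --out /backup/config
mongodump --host sh1/node1.domain.com:27018 --oplog --out /backup/sh1
mongodump --host sh2/node4.domain.com:27018 --oplog --out /backup/sh2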

Monitoring a MongoDB sharded cluster

☐ Write capacity has increased greatly
  ☐ Look for the next bottleneck
  ☐ Most obvious ones:
    ☐ Range queries may touch every shard
    ☐ The number of connections will increase

☐ Watch out for non-sharded collections
  ☐ They will use up all available space if written to extensively

Capacity planning

☐ Plan for scaling out
  ☐ Monitor the number of chunks per shard
  ☐ Watch the disk space per (node in the) shard
  ☐ Adding a new shard will start the balancing process
    ☐ Increase in IO
    ☐ Balancing large clusters may take forever

☐ Watch for new databases / collections
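A sketch for keeping an eye on the chunk count per shard, read from the Configserver meta data (config.chunks is internal, so treat it as informational):

use config
db.chunks.aggregate([
    { $group: { _id: "$shard", chunks: { $sum: 1 } } }   // chunk count per shard
])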

Scaling out

☐ Adding new shards is easy
  ☐ Deploy a new (empty) replicaSet
  ☐ Add it to the cluster:

mongo --eval 'sh.addShard("sh1/node1.domain.com:27018,node2.domain.com:27018,node3.domain.com:27018");'

Balancing shards

☐ MongoDB shard balancer
  ☐ Runs in the background
  ☐ Disable it when making backups (see the sketch below)
  ☐ The balancer can also be scheduled:

use config
db.settings.update(
    { _id: "balancer" },
    { $set: { activeWindow : { start : "02:00", stop : "04:00" } } },
    { upsert: true }
)
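Around a backup window the balancer can be paused and resumed with the mongo shell helpers:

sh.stopBalancer()        // disables balancing and waits for a running migration to finish
// ... take the backups ...
sh.startBalancer()
sh.getBalancerState()    // should report true again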

Disabling the balancer on a collection

☐ The balancer sometimes keeps moving chunks back and forth
☐ Some collections' data (e.g. archives) needs to remain on its initial shard
☐ Disable the balancer per collection:

sh.disableBalancing("mydata.hugearchive")

Consistency

☐ Don't read from secondaries
  ☐ The balancer may be migrating chunks between shards
  ☐ Inconsistent data between primary and secondary

☐ Don't connect to shards directly
  ☐ Data on a shard may actually reside on another shard
  ☐ The shard router will hide this

ClusterControl and MongoDB scaling

MongoDB replicaSet

☐ Deploy a MongoDB replicaSet
☐ Scale the replicaSet by adding secondaries
☐ Add arbiters for voting
☐ Make reliable backups

MongoDB sharded cluster

☐ Deploy a fully sharded cluster
  ☐ Deploy shard routers (mongos)
  ☐ Deploy Configservers
  ☐ Deploy shards

☐ Scale by adding shards
☐ Convert a replicaSet into a sharded cluster (1.3.3)
☐ Make cluster-wide consistent backups
  ☐ Percona Consistent MongoDB backup

NinesControl and MongoDB

NinesControl

NinesControl deployment

NinesControl monitoring

ClusterControl: live demo

Demo

Q & A

Thank you!

☐ NinesControl
  ☐ https://ninescontrol.com/

☐ Severalnines Blog
  ☐ www.severalnines.com/blog-categories/mongodb

☐ ClusterControl
  ☐ www.severalnines.com/product/clustercontrol

☐ Contact: [email protected]