Webinar: Operational Best Practices
DESCRIPTION
This webinar covers best practices around DevOps and general operations for those already familiar with the basics of MongoDB. Topics include team roles around data model design, monitoring, hardware configurations, replication, and horizontal scaling.
TRANSCRIPT
Senior Solutions Architect, 10gen
Asya Kamsky
#MongoDB
Operational Best Practices
Operational Best Practices Asya Kamsky
Best Practices == More Value
How to get more sleep while your MongoDB cluster hums along
The Agenda
• Roles and responsibilities
• Schema design and application performance
• Hardware
• Replication
• Sharding
• Monitoring
Roles and Responsibilities
• Application data needs
• Schema design
• Read and write patterns
• Indexing strategy
• Hardware: RAM, CPU, disk...
• Network, firewalls, security
• Backups
• Maintenance
• Upgrades
• Monitoring
Roles and Responsibilities
• Application Developer
• Data Architect
• DBA
• System Admin
• Network Admin
Schema Design and Application Performance
Schema and Performance
In MongoDB, correct schema design is essential for optimal application performance.
DATA != SCHEMA
Schema and Performance
Indexing is essential
• Multiple types of indexes supported.
Schema and Performance
Understanding actual performance
• Monitoring
• Measuring
• Benchmarking
• Optimizing
• Logs
• Query plan
• Application
• Ad-hoc testing
Hardware
• Memory
• Storage
• CPU - speed
• CPU - number of cores
Impact on performance in that order!
Replica Sets
[Diagram: Replica Sets and Application. Labels: client application, driver, write, read, one primary, two secondaries.]
[Diagram: Replica Set – HA. Node 1 (secondary), Node 2 (secondary), Node 3 (primary); replication and heartbeat.]
[Diagram: Replica Set – Failure. Node 1 (secondary), Node 2 (secondary), Node 3 (no state shown); heartbeat; primary election.]
[Diagram: Replica Set – Failover. Node 1 (secondary), Node 2 (primary), Node 3 (no state shown); replication and heartbeat.]
[Diagram: Replica Set – Recovery. Node 1 (secondary), Node 2 (primary), Node 3 (recovery); replication and heartbeat.]
[Diagram: Replica Set – Reestablished. Node 1 (secondary), Node 2 (primary), Node 3 (secondary); replication and heartbeat.]
Replica Sets
• Primary purpose:
  – High availability with automatic failover
  – Disaster recovery
  – No-downtime maintenance
  – No application changes on reconfiguration
  – Extra copies of data for "special" read workloads
• Full benefit achieved with advance planning
  – Determine your SLA/HA requirements
  – Determine your DR requirements
  – Understand impact of node, network, DC failure
  – Understand all available RS features: priority scores, hidden, delayed, tags
  – Monitor and proactively remedy potential problems
  – Practice recovery from disastrous failure
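To make "understand impact of node, network, DC failure" concrete, here is a minimal sketch of the majority arithmetic behind primary elections. A primary can only be elected while a strict majority of voting members can reach each other. The function names and the data center layouts are hypothetical and are not taken from the talk.

```python
from typing import Dict


def majority(voting_members: int) -> int:
    """Votes needed to elect a primary: a strict majority of voting members."""
    return voting_members // 2 + 1


def survives_dc_loss(members_per_dc: Dict[str, int]) -> Dict[str, bool]:
    """For each data center, report whether the remaining voting members
    could still form a majority if that whole data center went offline."""
    total = sum(members_per_dc.values())
    needed = majority(total)
    return {dc: (total - count) >= needed for dc, count in members_per_dc.items()}


if __name__ == "__main__":
    # Hypothetical layouts: two data centers versus three.
    print(survives_dc_loss({"dc-east": 2, "dc-west": 1}))
    print(survives_dc_loss({"dc-east": 1, "dc-west": 1, "dc-central": 1}))
```

With two data centers, losing the one that holds two of the three members leaves no majority, so no primary can be elected. Spreading the members over three locations avoids that single point of failure.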
Replica Sets
• Best practices for configuration
  – Odd number of voting replica members
  – Size the oplog appropriately for high volume loads
  – Use multiple Data Centers/Availability Zones
  – Use DNS names for node configuration
  – Add hidden delayed-replication member as "insurance"
  – All replica set nodes should have same capacity
• Operation
  – Upgrade secondaries first (primary last)
  – Maintenance on secondaries first (primary last)
  – Use the 'rs.stepDown()' command
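One way to reason about "size the oplog appropriately" is to think of the oplog as a time window. It is a capped collection, so a secondary that stays offline longer than the window it covers can no longer catch up from it and needs a full resync. The sketch below is back-of-the-envelope arithmetic only. The function names and the 1.5x headroom factor are assumptions, and the write rate would have to come from your own measurements.

```python
def oplog_window_hours(oplog_size_mb: float, oplog_mb_per_hour: float) -> float:
    """Approximate hours of writes the oplog holds before old entries are overwritten."""
    if oplog_mb_per_hour <= 0:
        raise ValueError("oplog_mb_per_hour must be positive")
    return oplog_size_mb / oplog_mb_per_hour


def required_oplog_mb(oplog_mb_per_hour: float,
                      desired_window_hours: float,
                      headroom: float = 1.5) -> float:
    """Oplog size needed to cover the desired window, padded by an assumed
    headroom factor to absorb write bursts."""
    return oplog_mb_per_hour * desired_window_hours * headroom


if __name__ == "__main__":
    # Hypothetical numbers: a 10 GB oplog with 512 MB of oplog entries per hour.
    print(oplog_window_hours(10240, 512))
    print(required_oplog_mb(512, 24))
```

The window should comfortably exceed the longest maintenance or outage you expect a secondary to sit through.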
Sharded Clusters
[Diagram: three app servers, each with a mongos; three shards; three config servers.]
Sharding
• Keys to successful sharding:
  – Pick a good shard key
  – Make config servers resilient
  – Shard before you "have to"
Sharded Clusters
• A good shard key is essential to achieving scaling
  – Distributes your writes across all shards
  – Allows the majority of reads to be "targeted" (not scatter-gather)
  – Exists in every document
  – Has sufficiently high cardinality
  – Allows you to take advantage of advanced features such as tag-aware balancing
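Two of the shard key properties above, "exists in every document" and "sufficiently high cardinality", can be checked against a sample of documents before committing to a key. The following is a minimal sketch using plain Python dictionaries. The function name, field names, and sample data are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, List


def evaluate_shard_key(docs: List[Dict[str, Any]], field: str) -> Dict[str, Any]:
    """Summarize how a candidate shard key field behaves over sample documents."""
    values = [doc[field] for doc in docs if field in doc]
    counts = Counter(values)
    total = len(docs)
    # Share of all sampled documents that carry the single most common value.
    largest_share = max(counts.values()) / total if counts else 0.0
    return {
        "present_in_all": len(values) == total,
        "distinct_values": len(counts),
        "largest_share": largest_share,
    }


if __name__ == "__main__":
    # Hypothetical sample: ten users, eight of them in the same country.
    sample = [{"user_id": i, "country": "US" if i < 8 else "CA"} for i in range(10)]
    print(evaluate_shard_key(sample, "user_id"))
    print(evaluate_shard_key(sample, "country"))
```

This covers presence and cardinality only. It says nothing about write distribution over time: a monotonically increasing key has high cardinality yet still sends every new insert to the same shard.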
Sharded Clusters
• Config servers
  – Three must be available to automatically balance data
  – All three must be "in sync"
    • if one becomes unavailable, the others go read-only
  – At least one must be available to avoid disaster
    • without the information inside the config servers, it is not possible to determine which shards contain which ranges of data!
• Must stop balancing during backup
Sharded Clusters
• Shard before you "have to"
  – Balancing data is an intensive process
  – If the existing cluster is near full capacity, balancing may impact the response time of the application
  – Planning to shard well in advance gives more time
    • to provision new hardware
    • to select a good shard key
    • to understand advanced sharding features (tagging)
Sharded Clusters
• Other best practices
  – Three config servers
  – Each shard is a replica set
  – Test what you run
    • use the same topology in QA as in production
  – Monitor
    • RAM
    • disk I/O
    • total storage
    • MongoDB throughput
Monitoring
• Multiple CLI and internal status commands
  – mongostat; mongotop; db.serverStatus()
• MMS
• Plug-ins for munin, Nagios, cacti, etc.
• Integration via SNMP to other tools
MongoDB Monitoring Service (MMS)
Free, cloud-based service for monitoring and alerts
• Charts, custom dashboards and automated alerting
• Tracks 100+ metrics – performance, resource utilization, availability and response times
• 10,000+ users
A Picture Speaks a Thousand Words
Symptoms
• High CPU use
• Similar query pattern
Diagnostics - iostat
Device:  rrqm/s  wrqm/s  r/s   w/s   rsec/s  wsec/s  avgrq-sz  avgqu-sz  await     svctm    %util
sdp      0.00    0.00    0.50  0.00  27.86   0.00    56.00     149.58    20320.00  2010.00  100.00
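The iostat row above shows a saturated disk: 100% utilization, an average wait of about 20 seconds per request, and a queue of roughly 150 requests. As a rough sketch, output like this could be parsed and checked automatically. The header and row are copied from the slide. The function names and both alert thresholds are assumptions, not recommended values.

```python
from typing import Any, Dict

HEADER = "Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util"
ROW = "sdp 0.00 0.00 0.50 0.00 27.86 0.00 56.00 149.58 20320.00 2010.00 100.00"


def parse_iostat_row(header: str, row: str) -> Dict[str, Any]:
    """Pair each iostat column name with its value; the first column is the device name."""
    names = header.split()
    fields = row.split()
    if len(names) != len(fields):
        raise ValueError("header and row have different column counts")
    stats: Dict[str, Any] = {name: float(value) for name, value in zip(names[1:], fields[1:])}
    stats["device"] = fields[0]
    return stats


def is_saturated(stats: Dict[str, Any],
                 util_threshold: float = 95.0,
                 await_threshold_ms: float = 100.0) -> bool:
    """Flag a device that is both near full utilization and slow to respond.
    Both thresholds are illustrative assumptions."""
    return stats["%util"] >= util_threshold and stats["await"] >= await_threshold_ms


if __name__ == "__main__":
    stats = parse_iostat_row(HEADER, ROW)
    print(stats["device"], stats["await"], stats["%util"], is_saturated(stats))
```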
Monitoring
• mongostat
Monitoring
• mongotop
Monitoring Best Practices
• Monitor logs
  – Alert, escalate
  – Correlate
• Disk
  – Monitor
• Instrument/monitor the app (including logs!)
• Know your application and its (write) characteristics
Monitoring Best Practices
• Performance test/analyze system behavior
• Load test before deployment
• Selectively use database profiling during testing
• Alert on abnormal states
• High CPU is often a sign of a poorly indexed query
Best Practices Summary
Best Practices
• Pre-deployment
  – Learn
  – Plan
  – Prototype/Benchmark
  – Execute
• During deployment
  – Monitor
  – Continue planning
  – Evolve
System provisioning
• Capacity
• Performance
• Scale
• Configuration
Logs
• Review
• Alert
• Rotate and collect (per cluster)
Query/Index Analysis
• Database Profiler
• Run explain periodically (sampled)
• Instrument code, generate metrics
• Look for similar patterns to find the root cause
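As an illustration of sampled explain checks, the sketch below walks an explain document and reports whether the winning plan contains a full collection scan. It assumes the queryPlanner.winningPlan layout with "stage" and "inputStage" fields used by MongoDB 3.0 and later. Releases from the era of this talk reported a cursor type instead, and newer servers may nest the plan differently. The sample documents are hand-written for the example, not captured from a server.

```python
from typing import Any, Dict, Iterator


def iter_stages(plan: Dict[str, Any]) -> Iterator[str]:
    """Yield the stage name of a plan node and of every nested input stage."""
    yield plan.get("stage", "UNKNOWN")
    child = plan.get("inputStage")
    if isinstance(child, dict):
        yield from iter_stages(child)
    for sibling in plan.get("inputStages", []):
        yield from iter_stages(sibling)


def uses_collection_scan(explain_output: Dict[str, Any]) -> bool:
    """True if the winning plan scans the whole collection (stage COLLSCAN)."""
    winning = explain_output.get("queryPlanner", {}).get("winningPlan", {})
    return "COLLSCAN" in iter_stages(winning)


if __name__ == "__main__":
    # Hand-written sample documents, assumed to follow the layout described above.
    indexed = {"queryPlanner": {"winningPlan": {
        "stage": "FETCH",
        "inputStage": {"stage": "IXSCAN", "indexName": "user_id_1"},
    }}}
    unindexed = {"queryPlanner": {"winningPlan": {"stage": "COLLSCAN"}}}
    print(uses_collection_scan(indexed), uses_collection_scan(unindexed))
```

Running a check like this against a sample of an application's real queries gives an early warning when a query stops using its index.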
Hardware Configuration
• Pay attention to disk configurations
• Load testing will find some misconfigurations
• MongoDB depends on the OS a lot
Plan/Test Rollouts
• Rolling upgrade for Replica Set
• Generate indexes on secondaries first
• Name services, use redirection
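The rolling-upgrade rule from the replica set section (secondaries first, primary last) can be expressed as a small planning helper. This is a sketch only. It assumes member entries shaped like the "members" array of rs.status(), with "name" and "stateStr" fields, and the host names are hypothetical. The primary would still need to be stepped down with rs.stepDown() before its own upgrade.

```python
from typing import Dict, List


def rolling_upgrade_order(members: List[Dict[str, str]]) -> List[str]:
    """Return member names in upgrade order: every non-primary first, the primary last."""
    others = [m["name"] for m in members if m["stateStr"] != "PRIMARY"]
    primaries = [m["name"] for m in members if m["stateStr"] == "PRIMARY"]
    return others + primaries


if __name__ == "__main__":
    # Hypothetical three-member replica set addressed by DNS names.
    members = [
        {"name": "node1.example.net:27017", "stateStr": "SECONDARY"},
        {"name": "node2.example.net:27017", "stateStr": "PRIMARY"},
        {"name": "node3.example.net:27017", "stateStr": "SECONDARY"},
    ]
    for name in rolling_upgrade_order(members):
        print(name)
```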
More References
• Please take a look at http://docs.mongodb.org
• Ask questions on the mongodb-user group
• Use MMS or historic monitoring
  – Watch for trends
  – Create alerts
  – Forecast capacity for provisioning
• Utilize all available resources
  – 10gen offers paid public and on-site training & free web-based classes
  – consulting services
  – pre-production and production support
Senior Solutions Architect, 10gen
Asya Kamsky
#MongoSV
Thank You