webinar: operational best practices

Senior Solutions Architect, 10gen

Asya Kamsky

#MongoDB

Operational Best Practices


Best Practices == More Value

How to get more sleep while your MongoDB cluster hums along

The Agenda

•  Roles and responsibilities

•  Schema design and application performance

•  Hardware

•  Replication

•  Sharding

•  Monitoring

Roles and Responsibilities

Application Data needs

Schema Design

Read and Write Patterns

Indexing Strategy

Hardware: RAM, CPU, disk...

Network, Firewalls, Security

Backups

Maintenance

Upgrades

MONITORING

Roles and Responsibilities

•  Application Developer

•  Data Architect

•  DBA

•  System Admin

•  Network Admin

Schema Design and Application Performance

In MongoDB, correct schema design is essential for optimal application performance.

DATA != SCHEMA

Schema and Performance

Multiple types of indexes supported.

Indexing is essential

•  Monitoring

•  Measuring

•  Benchmarking

•  Optimizing

Understanding actual performance

•  Logs

•  Query plan

•  Application

•  Ad-hoc testing
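One way to use the query plan when looking at actual performance is to compare how many documents a query scanned with how many it returned. The sketch below is only an illustration: the field names (`cursor`, `n`, `nscanned`, `millis`) follow the explain() output of MongoDB releases from the period of this talk (newer releases report different fields), while the sample plans, the `query` label and the ratio threshold are made up for the example.

```python
def scan_efficiency(plan):
    """Return documents scanned per document returned for an explain-style dict."""
    returned = plan.get("n", 0)
    scanned = plan.get("nscanned", 0)
    # A query that returns nothing but scans a lot is still expensive,
    # so treat "returned" as at least 1 to avoid dividing by zero.
    return scanned / max(returned, 1)


def looks_unindexed(plan, max_ratio=10.0):
    """Heuristic flag: full collection scan, or far more scanned than returned."""
    full_scan = plan.get("cursor") == "BasicCursor"
    return full_scan or scan_efficiency(plan) > max_ratio


# Hypothetical plans, shaped like the older explain() output.
sampled_plans = [
    {"query": "by_email", "cursor": "BtreeCursor email_1", "n": 1, "nscanned": 1, "millis": 0},
    {"query": "by_status", "cursor": "BasicCursor", "n": 12, "nscanned": 250000, "millis": 480},
]

suspects = [p["query"] for p in sampled_plans if looks_unindexed(p)]
print(suspects)
```

A check like this fits a sampled, periodic review; it does not replace reading the full plan for the queries it flags.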

Hardware

•  Memory

•  Storage

•  CPU - speed

•  CPU - number of cores

Impact on performance in that order!

Replica Sets

[Diagram: Replica Sets and Application. The client application connects through the driver to a replica set with one primary and two secondaries.]

[Diagram: Replica Set – HA. Node 1 (secondary), Node 2 (secondary) and Node 3 (primary), linked by replication and heartbeat.]

[Diagram: Replica Set – Failure. Node 3 is no longer primary; Node 1 and Node 2 (both secondaries) exchange heartbeats and hold a primary election.]

[Diagram: Replica Set – Failover. Node 2 is now primary, with replication and heartbeat to Node 1 (secondary); Node 3 has no role.]

[Diagram: Replica Set – Recovery. Node 3 rejoins in recovery state and receives replication.]

[Diagram: Replica Set – Reestablished. Node 1 (secondary), Node 2 (primary), Node 3 (secondary), linked by replication and heartbeat.]
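The failover sequence above depends on the surviving nodes being able to elect a new primary, which requires a strict majority of the voting members. The arithmetic can be sketched as follows; the function names are illustrative only and are not part of any MongoDB API.

```python
def majority(voting_members):
    """Smallest number of votes that is a strict majority."""
    return voting_members // 2 + 1


def can_elect_primary(voting_members, reachable_members):
    """True if the members that can still see each other form a majority."""
    return reachable_members >= majority(voting_members)


def tolerated_failures(voting_members):
    """How many voting members can be lost while a primary can still be elected."""
    return voting_members - majority(voting_members)


for n in (3, 4, 5):
    print(n, "members: majority", majority(n), "tolerates", tolerated_failures(n), "failure(s)")
```

Going from three to four voting members raises the majority from two to three without tolerating any additional failure, which is the usual argument for an odd number of voting members.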

Replica Sets

•  Primary purpose:
–  High Availability with automatic failover
–  Disaster Recovery
–  No-down-time maintenance
–  No application changes on reconfiguration
–  Extra copies of data for "special" read workloads

•  Full benefit achieved with advance planning

–  Determine your SLA/HA requirements
–  Determine your DR requirements
–  Understand impact of node, network, DC failure
–  Understand all available RS features: priority scores, hidden, delayed, tags
–  Monitor and proactively remedy potential problems
–  Practice recovery from disastrous failure

Replica Sets

•  Best Practices for Configuration
–  Odd number of voting replica members
–  Size the oplog appropriately for high volume loads
–  Use multiple Data Centers/Availability Zones
–  Use DNS names for node configuration
–  Add hidden delayed-replication member as "insurance"
–  All replica set nodes should have same capacity

•  Operation
–  Upgrade secondaries first (primary last)
–  Maintenance on secondaries first (primary last)
–  Use 'rs.stepDown()' command
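Sizing the oplog for high volume loads comes down to a simple ratio: the oplog covers roughly its size divided by the rate at which writes fill it. The sketch below works through that arithmetic; the figures and the 1.5x headroom factor are assumptions for illustration, not recommendations from the talk.

```python
def oplog_window_hours(oplog_size_mb, oplog_write_mb_per_hour):
    """Approximate hours of history the oplog holds at a steady write rate."""
    if oplog_write_mb_per_hour <= 0:
        raise ValueError("write rate must be positive")
    return oplog_size_mb / oplog_write_mb_per_hour


def oplog_size_for(window_hours, peak_write_mb_per_hour, headroom=1.5):
    """Oplog size (MB) needed to cover a target window at peak load, with headroom."""
    return window_hours * peak_write_mb_per_hour * headroom


# Hypothetical numbers: a 10 GB oplog filling at 512 MB per hour.
print(oplog_window_hours(10240, 512))   # hours a secondary can be offline and still catch up
print(oplog_size_for(24, 512))          # MB needed to cover 24 hours at that rate
```

The window should comfortably exceed the longest maintenance task or outage a secondary is expected to sit through, since a member that falls off the end of the oplog needs a full resync.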

Sharded Clusters

[Diagram: three app servers, each with its own mongos, routing to three shards, alongside the config servers.]

Sharding

•  Keys to successful sharding:
–  Pick a good shard key
–  Make config servers resilient
–  Shard before you "have to"

•  Good shard key is essential to achieving scaling

–  Distributes your writes across all shards
–  Allows majority of reads to be "targeted" (not scatter-gather)
–  Exists in every document
–  Has sufficiently high cardinality
–  Allows you to take advantage of advanced features - tag aware balancing

•  Config Servers
–  Three must be available to automatically balance data
–  All three must be "in sync"
      •  if one becomes unavailable others go read-only
–  At least one must be available to avoid disaster
      •  without information inside config server it's not possible to determine which shards contain which ranges of data!

•  Must stop balancing during backup
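Returning to the shard key criteria above, the point about distributing writes across all shards can be illustrated with a toy routing model. This is a simplification and not how MongoDB routes documents internally: the split points, the three-shard layout and the hash-modulo routing are all assumptions made for the example.

```python
import bisect
import hashlib
from collections import Counter

# Range boundaries: shard 0 holds keys < 333, shard 1 holds 333..665, shard 2 holds >= 666.
SPLIT_POINTS = [333, 666]
NUM_SHARDS = len(SPLIT_POINTS) + 1


def shard_by_range(key):
    """Route by key range, as with a shard key used directly."""
    return bisect.bisect_right(SPLIT_POINTS, key)


def shard_by_hash(key):
    """Route by a hash of the key (toy stand-in for a well-distributed key)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def write_distribution(keys, route):
    """Count how many writes each shard receives under a routing function."""
    counts = Counter(route(k) for k in keys)
    return [counts.get(shard, 0) for shard in range(NUM_SHARDS)]


# New documents with a steadily increasing key (for example a counter or timestamp).
new_keys = range(1000, 2000)
print(write_distribution(new_keys, shard_by_range))  # every write lands on the last shard
print(write_distribution(new_keys, shard_by_hash))   # writes spread across all shards
```

The trade-off is that a hash-like key spreads writes but turns range queries on that field into scatter-gather, which is why the criteria have to be weighed together against the application's read and write patterns.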

Sharded Clusters

•  Shard before you "have to"
–  Balancing data is intensive process
–  If existing cluster is near full capacity balancing may impact response time of application
–  Planning to shard well in advance gives more time
      •  to provision new hardware
      •  to select a good shard key
      •  to understand advanced sharding features (tagging)
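Deciding when "well in advance" is can start from a rough linear forecast of storage growth. The sketch below assumes steady growth and a 70% planning threshold; both are illustrative assumptions, and real growth is rarely this regular.

```python
import math


def days_until_threshold(used_gb, capacity_gb, growth_gb_per_day, threshold_pct=70):
    """Days until usage reaches threshold_pct of capacity, assuming linear growth.

    Returns 0 if the threshold is already reached, None if usage is not growing.
    """
    limit_gb = capacity_gb * threshold_pct / 100
    if used_gb >= limit_gb:
        return 0
    if growth_gb_per_day <= 0:
        return None
    return math.ceil((limit_gb - used_gb) / growth_gb_per_day)


# Hypothetical cluster: 400 GB used of 1000 GB, growing 5 GB per day.
print(days_until_threshold(400, 1000, 5))
```

If the answer is shorter than the time needed to provision hardware, choose a shard key and rebalance, planning is already late.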

Sharded Clusters

•  Other best practices
–  Three config servers
–  Each shard is a replica set
–  Test what you run
      •  use the same topology in QA as in production
–  Monitor
      •  RAM
      •  disk I/O
      •  total storage
      •  MongoDB throughput


Monitoring

•  Multiple CLI and internal status commands
–  mongostat; mongotop; db.serverStatus()

• MMS

•  Plug-ins for munin, Nagios, cacti, etc.

•  Integration via SNMP to other tools
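Many of the counters reported by db.serverStatus() are cumulative, so a monitoring tool has to take two samples and divide the difference by the interval to get a rate. A minimal sketch of that step, assuming the two dictionaries below are `opcounters` sections captured 60 seconds apart (the values are made up):

```python
def ops_per_second(before, after, interval_seconds):
    """Convert two cumulative counter snapshots into per-second rates."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    rates = {}
    for op, earlier_value in before.items():
        delta = after[op] - earlier_value
        # Counters restart from zero when the server restarts; report no rate
        # for that interval rather than a misleading negative number.
        rates[op] = None if delta < 0 else delta / interval_seconds
    return rates


# Hypothetical opcounters snapshots taken 60 seconds apart.
earlier = {"insert": 1000, "query": 5000, "update": 200, "delete": 10, "getmore": 0, "command": 900}
later = {"insert": 1600, "query": 9500, "update": 260, "delete": 10, "getmore": 30, "command": 1500}

print(ops_per_second(earlier, later, 60))
```

Storing these rates over time is what makes trends, alerts and capacity forecasts possible.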

MongoDB Monitoring Service (MMS)

Free, cloud-based service for monitoring and alerts

•  Charts, custom dashboards and automated alerting

•  Tracks 100+ metrics – performance, resource utilization, availability and response times

•  10,000+ users


A Picture Speaks a Thousand Words

Symptoms

•  High CPU use

•  Similar query pattern

Diagnostics - iostat

Device:  rrqm/s  wrqm/s   r/s   w/s  rsec/s  wsec/s  avgrq-sz  avgqu-sz     await    svctm   %util
sdp        0.00    0.00  0.50  0.00   27.86    0.00     56.00    149.58  20320.00  2010.00  100.00
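The iostat sample above shows a device at 100% utilization with an average wait of more than 20 seconds per request. A check like this is easy to automate; the sketch below parses that same line, and the alert thresholds in `is_saturated` are illustrative assumptions rather than values from the talk.

```python
HEADER = "Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util"
SAMPLE = "sdp 0.00 0.00 0.50 0.00 27.86 0.00 56.00 149.58 20320.00 2010.00 100.00"


def parse_iostat_line(header, line):
    """Pair the column names from the header with the values on a device line."""
    names = header.split()[1:]          # drop the leading "Device:" label
    fields = line.split()
    device = fields[0]
    values = [float(v) for v in fields[1:]]
    if len(names) != len(values):
        raise ValueError("header and data line have different column counts")
    return device, dict(zip(names, values))


def is_saturated(stats, util_limit=90.0, await_limit_ms=50.0):
    """Flag a device that is nearly always busy or has very slow requests."""
    return stats["%util"] >= util_limit or stats["await"] >= await_limit_ms


device, stats = parse_iostat_line(HEADER, SAMPLE)
print(device, stats["await"], stats["%util"], is_saturated(stats))
```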

Monitoring

• mongostat

Monitoring

• mongotop

Monitoring Best Practices

•  Monitor Logs
–  Alert, escalate
–  Correlate

•  Disk
–  Monitor

•  Instrument/Monitor App (including logs!)

•  Know your application and application (write) characteristics

Monitoring Best Practices

•  Performance test/analyze system behavior

•  Load test before deployment

•  Selectively use database profiling during testing

•  Alert on abnormal states

•  High CPU is often a sign of a poorly indexed query

Best Practices Summary

Best Practices

•  Pre-deployment
–  Learn
–  Plan
–  Prototype/Benchmark
–  Execute

•  During deployment
–  Monitor
–  Continue planning
–  Evolve

System provisioning

•  Capacity

•  Performance

•  Scale

•  Configuration

•  Review

•  Alert

•  Rotate and collect (per cluster)

Query/Index Analysis

•  Database Profiler

•  Run explain periodically (sampled)

•  Instrument code, generate metrics

•  Look for similar patterns to find root cause

Hardware Configuration

•  Pay attention to disk configurations

•  Load testing will find some misconfigurations

•  MongoDB depends on the OS a lot

Plan/Test Rollouts

•  Rolling upgrade for Replica Set

•  Generate indexes on secondaries first

•  Name services, use redirection
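The rolling upgrade order for a replica set (secondaries first, primary last) can be derived from the member list. The sketch below assumes member entries shaped like the `members` array of rs.status(), with `name` and `stateStr` fields; the hostnames are hypothetical and the function itself is not part of any MongoDB tool.

```python
def rolling_upgrade_order(members):
    """Return member names with every non-primary first and the primary last."""
    others = [m["name"] for m in members if m["stateStr"] != "PRIMARY"]
    primary = [m["name"] for m in members if m["stateStr"] == "PRIMARY"]
    return others + primary


# Hypothetical replica set, addressed by DNS names rather than IP addresses.
members = [
    {"name": "db1.example.net:27017", "stateStr": "PRIMARY"},
    {"name": "db2.example.net:27017", "stateStr": "SECONDARY"},
    {"name": "db3.example.net:27017", "stateStr": "SECONDARY"},
]

for step, name in enumerate(rolling_upgrade_order(members), start=1):
    print(step, name)
```

Before the last step the primary would be stepped down (for example with rs.stepDown()), so that it is upgraded as a secondary while another member serves writes.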

More References

•  Please take a look at http://docs.mongodb.org

•  Ask questions on mongodb-user group

•  Use MMS or historic monitoring
–  Watch for trends
–  Create alerts
–  Forecast capacity for provisioning

•  Utilize all available resources
–  10gen offers paid public and on-site training & free web-based classes
–  consulting services
–  pre-production and production support

Senior Solutions Architect, 10gen

Asya Kamsky

#MongoSV

Thank You
