Webinar: Operational Best Practices

Asya Kamsky, Senior Solutions Architect, 10gen. #MongoDB

Upload: mongodb

Post on 25-May-2015

DESCRIPTION

This webinar will cover best practices around dev/ops and general operations for those already familiar with the basics of MongoDB. Topics will include team roles around data model design, monitoring, hardware configurations, replication and horizontal scaling.

TRANSCRIPT

Page 1: Webinar: Operational Best Practices

Senior Solutions Architect, 10gen

Asya Kamsky

#MongoDB

Operational Best Practices

Page 2: Webinar: Operational Best Practices

Operational Best Practices Asya Kamsky

Best Practices == More Value

How to get more sleep while your MongoDB cluster hums along

Page 3: Webinar: Operational Best Practices

The Agenda

•  Roles and responsibilities

•  Schema design and application performance

•  Hardware

•  Replication

•  Sharding

•  Monitoring

Page 4: Webinar: Operational Best Practices

Roles and Responsibilities

Page 5: Webinar: Operational Best Practices

Roles and Responsibilities

•  Application Data needs

•  Schema Design

•  Read and Write Patterns

•  Indexing Strategy

•  Hardware: RAM, CPU, disk...

•  Network, Firewalls, Security

Page 6: Webinar: Operational Best Practices

Roles and Responsibilities

•  Application Data needs

•  Schema Design

•  Read and Write Patterns

•  Indexing Strategy

•  Hardware: RAM, CPU, disk...

•  Network, Firewalls, Security

•  Backups

•  Maintenance

•  Upgrades

•  Monitoring

Page 7: Webinar: Operational Best Practices

Roles and Responsibilities

•  Application Developer

•  Data Architect

•  DBA

•  System Admin

•  Network Admin

Page 8: Webinar: Operational Best Practices

Schema Design and Application Performance

Page 9: Webinar: Operational Best Practices

Schema and Performance

In MongoDB, correct schema design is essential for optimal application performance.

DATA != SCHEMA

Page 10: Webinar: Operational Best Practices

Schema and Performance

Multiple types of indexes supported.

Indexing is essential

Page 11: Webinar: Operational Best Practices

Schema and Performance

Understanding actual performance

•  Monitoring
•  Measuring
•  Benchmarking
•  Optimizing

•  Logs
•  Query plan
•  Application
•  Ad-hoc testing

Page 12: Webinar: Operational Best Practices

Hardware

Page 13: Webinar: Operational Best Practices

Hardware

•  Memory

•  Storage

•  CPU - speed

•  CPU - number of cores

Impact on performance in that order!

Page 14: Webinar: Operational Best Practices

Replica Sets

Page 15: Webinar: Operational Best Practices

Replica Sets and Application

[Diagram: a client application and its driver; "Write" and "Read" arrows; one primary and two secondaries]

Page 16: Webinar: Operational Best Practices

Replica Set – HA

[Diagram: Node 1 (secondary), Node 2 (secondary), Node 3 (primary); replication and heartbeat links]

Page 17: Webinar: Operational Best Practices

Replica Set – Failure

[Diagram: Node 1 (secondary), Node 2 (secondary), Node 3 (unlabeled); heartbeat links; primary election]

Page 18: Webinar: Operational Best Practices

Replica Set – Failover

[Diagram: Node 1 (secondary), Node 2 (primary), Node 3 (unlabeled); replication and heartbeat links]

Page 19: Webinar: Operational Best Practices

Replica Set – Recovery

[Diagram: Node 1 (secondary), Node 2 (primary), Node 3 (recovery); replication and heartbeat links]

Page 20: Webinar: Operational Best Practices

Replica Set – Reestablished

[Diagram: Node 1 (secondary), Node 2 (primary), Node 3 (secondary); replication and heartbeat links]

Page 21: Webinar: Operational Best Practices

Replica Sets

•  Primary purpose:
    –  High Availability with automatic failover
    –  Disaster Recovery
    –  No-down-time maintenance
    –  No application changes on reconfiguration
    –  Extra copies of data for "special" read workloads

•  Full benefit achieved with advance planning
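One way an application can stay unchanged across failover and reconfiguration is to connect with a seed list and a replica set name instead of a single host. The sketch below only builds such a connection string with the standard `replicaSet`, `readPreference` and `w` URI options; the host names and the set name `rs0` are hypothetical, and the defaults are assumptions made for illustration.

```python
from urllib.parse import urlencode


def replica_set_uri(hosts, replica_set, read_preference="primary", write_concern="majority"):
    """Build a MongoDB connection string that lists several seed members."""
    if not hosts:
        raise ValueError("at least one seed host is required")
    options = urlencode({
        "replicaSet": replica_set,          # lets the driver discover the current primary
        "readPreference": read_preference,  # e.g. secondaryPreferred for "special" reads
        "w": write_concern,                 # acknowledgement level for writes
    })
    return "mongodb://" + ",".join(hosts) + "/?" + options


# Hypothetical DNS names; any member can act as a seed.
uri = replica_set_uri(
    ["db1.example.com:27017", "db2.example.com:27017", "db3.example.com:27017"],
    replica_set="rs0",
    read_preference="secondaryPreferred",
)
```

Because the driver discovers members from the seeds, the same string keeps working when a different node becomes primary.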

Page 22: Webinar: Operational Best Practices

Replica Sets

•  Full benefit achieved with advance planning
    –  Determine your SLA/HA requirements
    –  Determine your DR requirements
    –  Understand impact of node, network, DC failure
    –  Understand all available RS features: priority scores, hidden, delayed, tags
    –  Monitor and proactively remedy potential problems
    –  Practice recovery from disastrous failure
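To reason about the impact of a data center failure, it can help to check whether the surviving voting members still form a majority, since a replica set can only elect a primary when a majority of voting members is reachable. The following is a minimal sketch of that arithmetic; the data center names and member counts are hypothetical.

```python
def majority(voting_members):
    """Smallest number of votes that is more than half of all voting members."""
    return voting_members // 2 + 1


def survives_dc_loss(voters_per_dc):
    """Map each data center to whether a primary can still be elected if it is lost."""
    total = sum(voters_per_dc.values())
    needed = majority(total)
    return {dc: (total - count) >= needed for dc, count in voters_per_dc.items()}


# Hypothetical layouts: two data centers versus three.
two_dcs = survives_dc_loss({"east": 2, "west": 1})
three_dcs = survives_dc_loss({"east": 2, "west": 2, "cloud": 1})
```

With two data centers, losing the one that holds most of the voters leaves the set without a primary, which is one reason to plan the member layout in advance.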

Page 23: Webinar: Operational Best Practices

Replica Sets

•  Best Practices for Configuration
    –  Odd number of voting replica members
    –  Size the oplog appropriately for high volume loads
    –  Use multiple Data Centers/Availability Zones
    –  Use DNS names for node configuration
    –  Add hidden delayed-replication member as "insurance"
    –  All replica set nodes should have same capacity

•  Operation
    –  Upgrade secondaries first (primary last)
    –  Maintenance on secondaries first (primary last)
    –  Use 'rs.stepDown()' command
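Some of the configuration practices above can be checked mechanically against a replica set configuration document. The sketch below is an illustration rather than an official tool: it assumes the document uses the `members`, `host`, `votes`, `priority`, `hidden` and `slaveDelay` fields (newer MongoDB releases rename `slaveDelay` to `secondaryDelaySecs`), and the host names are hypothetical.

```python
import ipaddress


def is_ip_address(host):
    """Return True if the host part of a 'host:port' string is a literal IP address."""
    name = host.rsplit(":", 1)[0].strip("[]")
    try:
        ipaddress.ip_address(name)
        return True
    except ValueError:
        return False


def check_replica_set_config(config):
    """Return a list of human-readable problems found in a replica set config."""
    problems = []
    members = config["members"]
    voters = [m for m in members if m.get("votes", 1) > 0]
    if len(voters) % 2 == 0:
        problems.append("even number of voting members")
    for m in members:
        if is_ip_address(m["host"]):
            problems.append(f"{m['host']}: uses an IP address instead of a DNS name")
        delayed = m.get("slaveDelay", 0) > 0
        if (m.get("hidden") or delayed) and m.get("priority", 1) != 0:
            problems.append(f"{m['host']}: hidden or delayed member must have priority 0")
        if delayed and not m.get("hidden"):
            problems.append(f"{m['host']}: delayed member should also be hidden")
    return problems


# Hypothetical config: three regular members plus a hidden, delayed, non-voting one.
example = {
    "_id": "rs0",
    "members": [
        {"_id": 0, "host": "db1.example.com:27017"},
        {"_id": 1, "host": "db2.example.com:27017"},
        {"_id": 2, "host": "db3.example.com:27017"},
        {"_id": 3, "host": "db4.example.com:27017", "priority": 0,
         "hidden": True, "slaveDelay": 3600, "votes": 0},
    ],
}
problems = check_replica_set_config(example)
```

The check does not cover oplog sizing or node capacity, which depend on workload and hardware rather than on the config document.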

Page 24: Webinar: Operational Best Practices

Sharded Clusters

Page 25: Webinar: Operational Best Practices

Sharding

[Diagram: three app servers, each with a mongos; three shards; three config servers]

Page 26: Webinar: Operational Best Practices

Sharded Clusters

•  Keys to successful sharding:
    –  Pick a good shard key
    –  Make config servers resilient
    –  Shard before you "have to"

•  Good shard key is essential to achieving scaling

Page 27: Webinar: Operational Best Practices

Sharded Clusters

•  Good shard key is essential to achieving scaling
    –  Distributes your writes across all shards
    –  Allows majority of reads to be "targeted" (not scatter-gather)
    –  Exists in every document
    –  Has sufficiently high cardinality
    –  Allows you to take advantage of advanced features - tag aware balancing
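A few of the shard key properties above can be estimated from a sample of documents before committing to a key. This is a rough sketch, not how MongoDB places chunks: it hashes values into a fixed number of buckets as a stand-in for distribution across shards, it does not model range-based chunks or monotonically increasing keys, and it assumes the candidate field holds simple scalar values.

```python
import hashlib
from collections import Counter


def evaluate_shard_key(docs, field, shards=3):
    """Report presence, cardinality and a hashed spread for a candidate key."""
    present = [d[field] for d in docs if field in d]
    buckets = Counter(
        int(hashlib.sha256(repr(v).encode()).hexdigest(), 16) % shards
        for v in present
    )
    return {
        "in_every_document": len(present) == len(docs),
        "distinct_values": len(set(present)),
        "docs_per_bucket": [buckets.get(i, 0) for i in range(shards)],
    }


# Hypothetical sample: user_id is unique, status has only two values and is sometimes missing.
sample = [{"user_id": i, "status": "active" if i % 2 else "closed"} for i in range(100)]
sample.append({"user_id": 100})
by_user = evaluate_shard_key(sample, "user_id")
by_status = evaluate_shard_key(sample, "status")
```

A low-cardinality field such as `status` can occupy at most as many buckets as it has distinct values, so it cannot spread writes across a growing cluster.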

Page 28: Webinar: Operational Best Practices

Sharded Clusters

•  Config Servers
    –  Three must be available to automatically balance data
    –  All three must be "in sync"
        •  if one becomes unavailable others go read-only
    –  At least one must be available to avoid disaster
        •  without information inside config server it's not possible to determine which shards contain which ranges of data!

•  Must stop balancing during backup
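The config server rules above amount to a small decision table, which could be used in a monitoring check. The sketch below encodes only what the slide states for a three-config-server deployment; the state names are made up for illustration.

```python
def config_server_state(available, total=3):
    """Classify cluster metadata health from the number of reachable config servers."""
    if not 0 <= available <= total:
        raise ValueError("available must be between 0 and total")
    if available == total:
        return "healthy"      # metadata is writable, so data can be balanced
    if available > 0:
        return "read-only"    # remaining servers go read-only, no balancing
    return "unavailable"      # nothing records which shard holds which range


states = [config_server_state(n) for n in (3, 2, 1, 0)]
```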

Page 29: Webinar: Operational Best Practices

Sharded Clusters

•  Shard before you "have to"
    –  Balancing data is an intensive process
    –  If existing cluster is near full capacity balancing may impact response time of application
    –  Planning to shard well in advance gives more time
        •  to provision new hardware
        •  to select a good shard key
        •  to understand advanced sharding features (tagging)
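One way to decide when "well in advance" is would be to fit a trend to historical storage usage and estimate how long until a chosen threshold is reached. The sketch below uses an ordinary least-squares line; the 80% threshold, the sample figures and the assumption of linear growth are all illustrative.

```python
def days_until_full(samples, capacity_gb, threshold=0.8):
    """Estimate days after the last sample until usage reaches threshold * capacity.

    samples is a list of (day, used_gb) pairs. Returns None if usage is not growing.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("samples must span more than one day")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in samples) / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    day_at_limit = (threshold * capacity_gb - intercept) / slope
    return max(0.0, day_at_limit - samples[-1][0])


# Hypothetical usage growing by 10 GB per day on a 500 GB volume.
remaining = days_until_full([(0, 100), (1, 110), (2, 120), (3, 130)], capacity_gb=500)
```

If the estimate is shorter than the time needed to provision hardware and migrate data, the cluster is already late to shard.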

Page 30: Webinar: Operational Best Practices

Sharded Clusters

•  Other best practices
    –  Three config servers
    –  Each shard is a replica set
    –  Test what you run
        •  use the same topology in QA as in production
    –  Monitor
        •  RAM
        •  disk I/O
        •  total storage
        •  MongoDB throughput

Page 31: Webinar: Operational Best Practices

Monitoring

Page 32: Webinar: Operational Best Practices

Monitoring

•  Multiple CLI and internal status commands
    –  mongostat; mongotop; db.serverStatus()

•  MMS

•  Plug-ins for munin, Nagios, cacti, etc.

•  Integration via SNMP to other tools
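The counters reported by `db.serverStatus()` are cumulative, so throughput is derived by comparing two snapshots. The sketch below computes per-second rates from two `opcounters` sub-documents; the snapshot values are hypothetical, and fetching them from a live server (for example with a driver's `serverStatus` command) is left out so the example stays self-contained.

```python
def opcounter_rates(before, after, seconds):
    """Per-second rate for each counter present in both snapshots."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return {
        op: (after[op] - before[op]) / seconds
        for op in before
        if op in after
    }


# Hypothetical opcounters snapshots taken 10 seconds apart.
first = {"insert": 1000, "query": 5000, "update": 200, "delete": 10}
second = {"insert": 1500, "query": 5600, "update": 260, "delete": 10}
rates = opcounter_rates(first, second, seconds=10)
```

Counters reset when the server restarts, so a negative rate is a hint that the process restarted between snapshots.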

Page 33: Webinar: Operational Best Practices

MongoDB Monitoring Service (MMS)

Free, cloud-based service for monitoring and alerts

Page 34: Webinar: Operational Best Practices

MongoDB Monitoring Service (MMS)

Free, cloud-based service for monitoring and alerts

•  Charts, custom dashboards and automated alerting

•  Tracks 100+ metrics – performance, resource utilization, availability and response times

•  10,000+ users

Page 35: Webinar: Operational Best Practices

A Picture Speaks a Thousand Words

Page 36: Webinar: Operational Best Practices

Symptoms

•  High CPU use

•  Similar query pattern

Page 37: Webinar: Operational Best Practices

Diagnostics - iostat

Device:  rrqm/s  wrqm/s   r/s   w/s  rsec/s  wsec/s  avgrq-sz  avgqu-sz     await    svctm   %util
sdp        0.00    0.00  0.50  0.00   27.86    0.00     56.00    149.58  20320.00  2010.00  100.00
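The iostat sample above shows a saturated device: `%util` is at 100 and `await` is about 20 seconds. A check like this could be automated along the following lines; the parsing assumes the extended iostat layout shown on the slide, and the alert thresholds are hypothetical values chosen for illustration.

```python
HEADER = "Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util"
ROW = "sdp 0.00 0.00 0.50 0.00 27.86 0.00 56.00 149.58 20320.00 2010.00 100.00"


def parse_iostat_row(header, row):
    """Pair each column name from the header with the numeric value in the row."""
    names = header.split()[1:]          # skip the "Device:" label
    device, *values = row.split()
    return device, dict(zip(names, map(float, values)))


def disk_warnings(stats, max_util=90.0, max_await_ms=50.0):
    """Return warnings for a saturated or slow device (thresholds are assumptions)."""
    warnings = []
    if stats["%util"] >= max_util:
        warnings.append(f"device busy {stats['%util']:.0f}% of the time")
    if stats["await"] >= max_await_ms:
        warnings.append(f"average wait {stats['await']:.0f} ms per request")
    return warnings


device, stats = parse_iostat_row(HEADER, ROW)
warnings = disk_warnings(stats)
```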

Page 38: Webinar: Operational Best Practices

Monitoring

• mongostat

Page 39: Webinar: Operational Best Practices

Monitoring

• mongotop

Page 40: Webinar: Operational Best Practices

Monitoring Best Practices

•  Monitor Logs
    –  Alert, escalate
    –  Correlate

•  Disk
    –  Monitor

•  Instrument/Monitor App (including logs!)

•  Know your application and its (write) characteristics

Page 41: Webinar: Operational Best Practices

Monitoring Best Practices

•  Performance test/analyze system behavior

•  Load test before deployment

•  Selectively use database profiling during testing

•  Alert on abnormal states

•  High CPU is often a sign of a poorly indexed query

Page 42: Webinar: Operational Best Practices

Best Practices Summary

Page 43: Webinar: Operational Best Practices

Best Practices

•  Pre-deployment
    –  Learn
    –  Plan
    –  Prototype/Benchmark
    –  Execute

•  During deployment
    –  Monitor
    –  Continue planning
    –  Evolve

Page 44: Webinar: Operational Best Practices

System provisioning

•  Capacity

•  Performance

•  Scale

•  Configuration

Page 45: Webinar: Operational Best Practices

Logs

•  Review

•  Alert

•  Rotate and collect (per cluster)

Page 46: Webinar: Operational Best Practices

Query/Index Analysis

•  Database Profiler

•  Run explain periodically (sampled)

•  Instrument code, generate metrics

•  Look for similar patterns to find root cause
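Sampled explain output can be screened automatically for queries that scan the whole collection. The sketch below walks an explain document and reports whether the winning plan contains a `COLLSCAN` stage. It assumes the `queryPlanner.winningPlan` layout with `stage`, `inputStage` and `inputStages` fields used by MongoDB 3.0 and later; older and newer releases may format plans differently, and the sample documents are hypothetical.

```python
def plan_stages(plan):
    """Collect the stage names of a plan and all of its input stages."""
    stages = [plan["stage"]]
    if "inputStage" in plan:
        stages += plan_stages(plan["inputStage"])
    for child in plan.get("inputStages", []):
        stages += plan_stages(child)
    return stages


def uses_collection_scan(explain_output):
    """Return True if the winning plan reads the collection without an index."""
    winning = explain_output["queryPlanner"]["winningPlan"]
    return "COLLSCAN" in plan_stages(winning)


# Hypothetical explain results for an indexed and an unindexed query.
indexed = {"queryPlanner": {"winningPlan": {
    "stage": "FETCH",
    "inputStage": {"stage": "IXSCAN", "indexName": "user_id_1"},
}}}
unindexed = {"queryPlanner": {"winningPlan": {"stage": "COLLSCAN"}}}
```

Counting how often each query shape triggers a collection scan is one way to look for the similar patterns mentioned above.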

Page 47: Webinar: Operational Best Practices

Hardware Configuration

•  Pay attention to disk configurations

•  Load testing will find some misconfigurations

•  MongoDB depends on the OS a lot

Page 48: Webinar: Operational Best Practices

Plan/Test Rollouts

•  Rolling upgrade for Replica Set

•  Generate indexes on secondaries first

•  Name services, use redirection

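The rolling upgrade order can be written down as a small planning step: secondaries first, then step the primary down and upgrade it last. The sketch below turns a member list into an ordered set of steps. It assumes each member is described by `name` and `stateStr` as in replica set status output; the member names are hypothetical and the step text is only descriptive.

```python
def rolling_upgrade_order(members):
    """Return upgrade steps with secondaries first and the primary last."""
    secondaries = [m["name"] for m in members if m["stateStr"] == "SECONDARY"]
    primaries = [m["name"] for m in members if m["stateStr"] == "PRIMARY"]
    steps = [f"upgrade {name}" for name in secondaries]
    for name in primaries:
        # Step down first so an already upgraded secondary takes over.
        steps.append(f"run rs.stepDown() on {name} and wait for a new primary")
        steps.append(f"upgrade {name}")
    return steps


# Hypothetical three-member replica set.
plan = rolling_upgrade_order([
    {"name": "db1.example.com:27017", "stateStr": "SECONDARY"},
    {"name": "db2.example.com:27017", "stateStr": "PRIMARY"},
    {"name": "db3.example.com:27017", "stateStr": "SECONDARY"},
])
```

Members in other states, such as a node still recovering, are left out of the plan and would need attention before the rollout starts.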

Page 49: Webinar: Operational Best Practices

More References

•  Please take a look at http://docs.mongodb.org

•  Ask questions on mongodb-user group

•  Use MMS or historic monitoring
    –  Watch for trends
    –  Create alerts
    –  Forecast capacity for provisioning

•  Utilize all available resources
    –  10gen offers paid public and on-site training & free web-based classes
    –  consulting services
    –  pre-production and production support

Page 50: Webinar: Operational Best Practices

Senior Solutions Architect, 10gen

Asya Kamsky

#MongoSV

Thank You