mobile 3: launch like a boss!
TRANSCRIPT
- 1. Deploy Like a Boss: Building a Mobile App with MongoDB, Part 3
- 2. Deploy with Joy!
- 5. Production Checklist: Proper Infrastructure; Proper Configuration; Proper Monitoring; Emergency Procedures
- 6. Infrastructure Sizing: RAM; CPU; Disk Size; I/O Bandwidth; Availability
- 7. Sizing: Indexes need to be in RAM; working set needs to be in RAM; I/O bandwidth (write load, index updates, working set migration). Example document: { _id: ObjectId(), tour: UUID, user: UUID, name: "Doug's Dogs", desc: "The best hot-dog", clues: [ "Hungry for a Coney Island?", "Ask for Dr. Frankenfurter", "Look for the hot dog stand" ], "geometry": { "type": "Point", "coordinates": [125.6, 10.1] } }
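The "needs to be in RAM" rule can be sketched as a quick arithmetic check. This is only an illustration: the function name, the 20% headroom, and all byte counts below are assumptions, not figures from the talk.

```python
GIB = 1024 ** 3

def fits_in_ram(index_bytes, hot_data_bytes, ram_bytes, headroom=0.2):
    """Return True if indexes plus frequently accessed data fit in RAM.

    `headroom` is the fraction of RAM reserved for the OS, connections and
    everything else (an assumed value; tune it for your own hosts).
    """
    usable = ram_bytes * (1.0 - headroom)
    return index_bytes + hot_data_bytes <= usable

# Hypothetical cluster: 6 GiB of indexes, 18 GiB of hot documents, 32 GiB RAM.
print(fits_in_ram(6 * GIB, 18 * GIB, 32 * GIB))   # 24 GiB needed, 25.6 GiB usable
# Same host after growth: 10 GiB of indexes, 20 GiB of hot documents.
print(fits_in_ram(10 * GIB, 20 * GIB, 32 * GIB))  # 30 GiB needed, 25.6 GiB usable
```

The real index and data sizes would come from the running system rather than from guesses like these.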
- 8-13. Load Testing: Test it like you use it, benchmarks don't count; Test to failure; Instrument your code! https://github.com/breinero/Firehose https://github.com/ParsePlatform/flashback (There's me)
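"Instrument your code" can be as small as a timer wrapped around each database call. The sketch below is a minimal illustration, not how Firehose or flashback do it; `OpStats` and its methods are hypothetical names, and the percentile is a simple index-based approximation.

```python
import time
from contextlib import contextmanager
from statistics import mean

class OpStats:
    """Collects per-operation latencies in milliseconds."""

    def __init__(self):
        self.latencies_ms = []

    @contextmanager
    def timed(self):
        # Record elapsed wall-clock time even if the operation raises.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.latencies_ms.append((time.perf_counter() - start) * 1000.0)

    def summary(self):
        ordered = sorted(self.latencies_ms)
        n = len(ordered)
        if n == 0:
            return {"n": 0, "mean_ms": None, "p95_ms": None}
        p95 = ordered[min(n - 1, int(0.95 * n))]
        return {"n": n, "mean_ms": mean(ordered), "p95_ms": p95}

stats = OpStats()
for _ in range(20):
    with stats.timed():
        time.sleep(0.001)  # stand-in for a real database call
print(stats.summary())
```

Running the same instrumentation under production-like load, and raising the load until the numbers degrade, is one way to "test to failure".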
- 14. Growth [Chart: Load over time against Warn and Saturation lines; axis label: 1K Ops / Second]
- 15. Growth [Chart: Load over time against Warn and Saturation lines; axis label: Memory]
- 16. Growth [Chart: Load over time against Warn and Saturation lines; axis label: Input Output]
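The Warn and Saturation lines on the growth charts suggest a simple capacity projection. The sketch below assumes linear growth and uses made-up thresholds; the function names and all numbers are hypothetical, since the slides do not give the actual values.

```python
import math

def classify_load(current, warn, saturation):
    """Place the current load relative to the two thresholds."""
    if current >= saturation:
        return "saturated"
    if current >= warn:
        return "warn"
    return "ok"

def periods_until(current, growth_per_period, threshold):
    """Periods of linear growth left before `threshold` is reached.

    Returns 0 if already at or above it, None if load is not growing.
    """
    if current >= threshold:
        return 0
    if growth_per_period <= 0:
        return None
    return math.ceil((threshold - current) / growth_per_period)

# Hypothetical: 6K ops/s today, growing 0.5K per month, warn at 8K, saturation at 10K.
print(classify_load(6, warn=8, saturation=10))
print(periods_until(6, 0.5, 8), periods_until(6, 0.5, 10))
```

The same check applies to memory and I/O: the resource that reaches its Warn line first sets the upgrade deadline.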
- 18. Monitoring. Baseline: MongoDB Management Service (MMS); MongoDB Ops Manager; Nagios, Zenoss. Detailed, Query Specific: mongotop; db.currentOp(); Query Profiler; mtools
- 19. Forensics
  2014-08-08T21:15:25.181-0500 [conn1026] getmore claimsPoc.claims cursorid:100012502307 ntoreturn:0 keyUpdates:0 numYields:1406953 locks(micros) r:11887558422 nreturned:289 reslen:4208149 28795759ms
  2014-08-07T15:31:51.714-0500 [conn7] command claimsPoc.$cmd command: createIndexes { createIndexes: "claims", indexes: [ { key: { Claims.ICN: 1.0 }, name: "Claims.ICN_1" } ] } keyUpdates:0 numYields:0 locks(micros) r:14476 w:25176930351 reslen:113 25176955ms
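The trailing number on each log line is the operation's duration in milliseconds, which is easy to miss at this scale. A small parser makes it visible. The regular expression below is tailored to the two lines on the slide and is not a general mongod log parser; the names `LOG_LINE` and `parse_slow_op` are hypothetical.

```python
import re

# Matches: timestamp, [connection], operation, namespace, ..., <duration>ms
LOG_LINE = re.compile(
    r"^(?P<timestamp>\S+) \[(?P<conn>\w+)\] (?P<op>\w+) (?P<namespace>\S+) "
    r".* (?P<duration_ms>\d+)ms$"
)

def parse_slow_op(line):
    """Extract the main fields of one log line, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["duration_ms"] = int(fields["duration_ms"])
    fields["duration_hours"] = round(fields["duration_ms"] / 3_600_000, 2)
    return fields

GETMORE = (
    "2014-08-08T21:15:25.181-0500 [conn1026] getmore claimsPoc.claims "
    "cursorid:100012502307 ntoreturn:0 keyUpdates:0 numYields:1406953 "
    "locks(micros) r:11887558422 nreturned:289 reslen:4208149 28795759ms"
)
CREATE_INDEX = (
    "2014-08-07T15:31:51.714-0500 [conn7] command claimsPoc.$cmd "
    'command: createIndexes { createIndexes: "claims", indexes: '
    '[ { key: { Claims.ICN: 1.0 }, name: "Claims.ICN_1" } ] } keyUpdates:0 '
    "numYields:0 locks(micros) r:14476 w:25176930351 reslen:113 25176955ms"
)

for line in (GETMORE, CREATE_INDEX):
    op = parse_slow_op(line)
    print(op["op"], op["namespace"], op["duration_ms"], "ms =", op["duration_hours"], "h")
```

Read this way, the getmore ran for about 8 hours and the index build for about 7.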
- 20-21. Logging: Save and rotate; Don't use --quiet; --logpath != --dbpath; Use component verbosity for debugging
- 22-23. Security: Firewall; Bind IP; Encrypt networks; Enable access control; Don't enable REST interface; Auditing; Limit exposure and use the Principle of Least Privilege
- 24. Tuning. Best Practices: Disable Transparent Huge Pages; NTP to synchronize time; Set ulimits; Use XFS or Ext4; Don't use NFS; Disable NUMA; Have swap; Read Production Notes. Tunables: Set I/O scheduler to NOOP; Adjust readahead (MMAPv1); Avoid cgroups; SELinux (?); RAID
- 25. Availability (image: http://avstop.com/ac/flighttrainghandbook/imagel4b.jpg)
- 26. Availability [Diagram: members P, S, S; sites DC1, DC2]: Avoid critical data centers
- 27. Availability [Diagram: members P, S, S; sites DC1, DC2, DC3]
- 28. Availability [Diagram: members P, S, S; sites DC1, DC2, AWS]
- 29. Availability [Diagram: members P, S, Arbiter; sites DC1, DC2, AWS]
- 30. Availability [Diagram: P (DC1), Arbiter (AWS), S (DC2); label: Down for maintenance]
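A data center is "critical" when losing it leaves the replica set without a majority of voting members, because electing a primary requires a strict majority. The sketch below checks that for a given layout. The function name and the member-to-site placements are assumptions for illustration; the transcript does not record which member sat in which data center.

```python
def can_elect_primary(voters_by_site, failed_site):
    """True if the surviving voting members still form a strict majority.

    `voters_by_site` maps a site name to its number of voting members
    (arbiters vote, so they count here too).
    """
    total = sum(voters_by_site.values())
    surviving = total - voters_by_site.get(failed_site, 0)
    return surviving > total // 2

# Assumed two-site layout: two members in DC1, one in DC2. DC1 is critical.
two_sites = {"DC1": 2, "DC2": 1}
print(can_elect_primary(two_sites, "DC1"), can_elect_primary(two_sites, "DC2"))

# Assumed three-site layout: one voter per site (the third may be an arbiter).
three_sites = {"DC1": 1, "DC2": 1, "AWS": 1}
print(all(can_elect_primary(three_sites, site) for site in three_sites))
```

With one voter per site, any single site can fail, or be taken down for maintenance, without losing the ability to elect a primary.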
- 31-32. Emergency Procedures (image: https://spinoff.nasa.gov/spinoff2002/images/070.jpg). Backup and Recovery: File System Snapshot; MMS Cloud; Ops Manager; Mongodump
- 33. Backups and Recovery: PERFORM DRILLS OFTEN AND ROUTINELY
- 34. Emergency Procedures: Document your procedures; Include ETAs; Follow procedures in docs.mongodb.org
- 35-40. Production Ready Architecture [Diagram: L.B. (load balancer)]. Classic Failure Scenario: Unindexed queries; Leads to collection scans; Results in high latencies; Causes memory exhaustion; CASCADING FAILURE
- 41-42. Circuit Breaker Trigger Conditions. Latency: stats.getMean() >= max; OpsPerSecond: stats.getN() >= max; ConcurrentOperations: stats.getN()*stats.getMean() >= max. https://github.com/breinero/Firehose
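The three trigger conditions can be sketched in a few lines. This is an illustration of the conditions as written on the slide, not Firehose's implementation: the function name, the one-second sampling window, and the use of seconds for latency are all assumptions. With those units, N times mean latency estimates the number of operations in flight (Little's law).

```python
from statistics import mean

def trigger_conditions(latencies_s, max_latency_s, max_ops, max_concurrent):
    """Return the names of the breaker conditions that tripped.

    `latencies_s` holds the latency, in seconds, of every operation that
    completed during the last one-second window (assumed window size).
    """
    n = len(latencies_s)
    avg = mean(latencies_s) if n else 0.0
    tripped = []
    if avg >= max_latency_s:        # Latency: getMean() >= max
        tripped.append("latency")
    if n >= max_ops:                # OpsPerSecond: getN() >= max
        tripped.append("ops_per_second")
    if n * avg >= max_concurrent:   # ConcurrentOperations: getN() * getMean() >= max
        tripped.append("concurrent_operations")
    return tripped

# Hypothetical limits: 1 s mean latency, 1000 ops/s, 50 operations in flight.
healthy = [0.01] * 100    # 100 ops/s at 10 ms each
degraded = [0.5] * 200    # 200 ops/s at 500 ms each
print(trigger_conditions(healthy, 1.0, 1000, 50))
print(trigger_conditions(degraded, 1.0, 1000, 50))
```

In the degraded case neither latency nor throughput alone crosses its limit, but their product does, which is the kind of pile-up that precedes the cascading failure described earlier.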
- 43. Production Ready Architecture [Diagram: L.B. (load balancer)]
- 44. Client Side: Don't use ensureIndex() in application; Look out for connection bombs (--maxConnect); DO use operation timeouts; DON'T cause socket timeouts; Lower keepalives; Avoid retry bombs
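One common way to avoid retry bombs while still using operation timeouts is to cap the number of attempts, back off with jitter, and give up once an overall deadline would be exceeded. The sketch below is driver-agnostic; `call_with_retries` and every default value are hypothetical, and real code would catch only errors known to be transient rather than `Exception`.

```python
import random
import time

def call_with_retries(op, max_attempts=3, base_delay_s=0.1, max_delay_s=2.0,
                      deadline_s=5.0, sleep=time.sleep, rng=random.random):
    """Call `op`, retrying failures with capped, jittered exponential backoff."""
    start = time.monotonic()
    for attempt in range(1, max_attempts + 1):
        try:
            return op()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error instead of hammering
            # Full jitter: random delay up to the capped exponential step.
            delay = min(max_delay_s, base_delay_s * 2 ** (attempt - 1)) * rng()
            if time.monotonic() - start + delay > deadline_s:
                raise  # the next attempt would overrun the overall deadline
            sleep(delay)

calls = {"n": 0}

def flaky():
    # Stand-in for a database call that fails twice, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "ok"

print(call_with_retries(flaky, sleep=lambda seconds: None))
```

The jitter keeps a fleet of clients from retrying in lockstep after a shared failure, and the attempt cap bounds the extra load any single request can add.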
- 45. Requirements & Specs: Make a DevOps Contract; Database Access Requirements; Database Access Fulfillment Specification; Cluster Configuration; Monitoring and Alerting Specification
- 46. Monitoring: Opcounters; Memory; Page Faults; Queues; Replication Lag; Oplog Window; Background Flush Average; Disk space
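Replication lag and oplog window are worth watching together: a secondary that falls further behind than the oplog window can no longer catch up from the oplog and needs a full resync. The sketch below shows that comparison with plain epoch-second timestamps. The function names, the 50% warning fraction, and the sample values are assumptions.

```python
def oplog_window_s(first_entry_ts, last_entry_ts):
    """Seconds of history the oplog currently holds."""
    return last_entry_ts - first_entry_ts

def replication_lag_s(primary_optime, secondary_optime):
    """Seconds the secondary is behind the primary (never negative)."""
    return max(0, primary_optime - secondary_optime)

def lag_status(lag_s, window_s, warn_fraction=0.5):
    """Compare lag to the oplog window; `warn_fraction` is an assumed threshold."""
    if lag_s >= window_s:
        return "stale"   # secondary fell off the oplog and needs a resync
    if lag_s >= warn_fraction * window_s:
        return "warn"
    return "ok"

# Hypothetical: the oplog covers 24 hours.
window = oplog_window_s(1_700_000_000, 1_700_086_400)
for lag in (30, 50_000, 90_000):
    print(lag, lag_status(lag, window))
```

Alerting on lag as a fraction of the window, rather than as a fixed number of seconds, keeps the alert meaningful when write volume shrinks the window.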
- 47. Thanks! { name: Bryan Reinero, title: Developer Advocate, twitter: @blimpyacht, code: github.com/breinero, email: [email protected] }