Production MongoDB in the Cloud
DESCRIPTION
This talk was given by Bridget Kromhout and Mike Hobbs at MinneBar 8, April 6th 2013. We’ve been using MongoDB on EC2 for about a year now; our production deployment includes a 12-node cluster (4 shards x 3 replica sets) as well as several other non-sharded replica sets. (The sharded cluster used to be bigger, which, as it turns out, isn’t always better.) Join us to benefit from our lessons learned, discuss what works and what doesn’t, and marvel at oddities we’ve encountered.
TRANSCRIPT
Production MongoDB
in the Cloud
From Essentials to Corner Cases
Who are we?
Mike Hobbs & Bridget Kromhout
Social Commerce &
Brand Interest Graph Analytics
Why MongoDB?
● Scalable, high-performance, open source
● Dynamic schemas for unstructured data
● Query language close to SQL in power
● "Eventually consistent" is hard to program right
Our configuration
12-node cluster (4 shards x 3 replica sets)
Several other non-sharded replica sets
Desired webapp response time is < 10ms
Total data size: 110 GB
Total index size: 28 GB
Largest collection: 49 GB
Largest index: 8.1 GB
EC2: EBS, instance size, replication
MongoDB: right for only some data sets
Memory & iowait
Working set needs to fit in memory
● Indexes
● Frequently accessed records
Avoid swapping!!!
EBS latency in EC2 is an issue.
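The sizing rule above can be sketched as back-of-the-envelope arithmetic. This is our own illustration, not anything MongoDB reports: the "hot fraction" of the data set, the headroom factor, and the instance RAM figure are all assumptions.

```python
# Back-of-the-envelope working-set check: indexes plus frequently
# accessed records should fit in RAM with some headroom left over.
# The hot-data fraction and RAM figure below are illustrative guesses.

GB = 1024 ** 3

def working_set_fits(index_bytes, hot_data_bytes, ram_bytes, headroom=0.2):
    """True if indexes + hot data fit in RAM with `headroom` to spare."""
    return (index_bytes + hot_data_bytes) * (1 + headroom) <= ram_bytes

# Slide figures: 28 GB of indexes; guess that ~1/4 of the 110 GB data
# set is hot; assume an instance with 68 GB of RAM (e.g. EC2 m2.4xlarge).
fits = working_set_fits(28 * GB, 27.5 * GB, 68 * GB)  # True, barely
```

If the check fails, the options are the ones this talk covers: bigger instances, sharding, or shrinking indexes.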
Fragmentation
Fragmentation steals from your most precious resource by reserving memory that is not used.
Run a compaction when your storageSize significantly exceeds your data size:
mongos> db.widgets.stats()
...
"size" : 5097988,
"storageSize" : 22507520,
Padding can reduce fragmentation and I/O:
db.widgets.insert({widg_id: "72120", padding: "XXXX...XXX"})
db.widgets.update({widg_id: "72120"},
    { $unset: {padding: ""},
      $set: {desc: "Grout remover", price: "13.39", instock: true} })
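The same pre-padding trick from the application side, with the documents assembled as plain Python dicts (field names and the padding size are illustrative; with pymongo you would pass these to the driver's insert and update calls):

```python
# Pre-pad on insert so the record is allocated at roughly its final
# size, then atomically swap the padding for the real fields. This
# avoids a document move (and the fragmentation it causes) on update.

padding = "X" * 256  # size this to match the expected final document

insert_doc = {"widg_id": "72120", "padding": padding}
update_spec = {
    "$unset": {"padding": ""},
    "$set": {"desc": "Grout remover", "price": "13.39", "instock": True},
}
```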
Replica sets
"optime" : { "t" : 1365165841000 , "i" : 1 }, "optimeDate" : { "$date" : "Fri Apr 5 07:44:01 2013" },
[Diagram: replica set "test" with members test-3-1.yourdomain, test-3-2.yourdomain, test-3-3.yourdomain]
Elections
Primary always determined by an election.
2-member replSet without an arbiter: if the secondary goes offline, the primary will step down:
08:52:06 [rsMgr] can't see a majority of the set, relinquishing primary
08:52:06 [rsMgr] replSet relinquishing primary state
08:52:06 [rsMgr] replSet SECONDARY
08:52:12 [rsMgr] replSet can't see a majority, will not try to elect self
Priorities can rig elections.
Ensure availability of an odd number of voting members.
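The vote math behind the odd-number advice, as a small sketch (our own illustration of the quorum rule, not MongoDB API):

```python
# A primary needs a strict majority of voting members. Adding a second
# (or fourth) member raises the quorum without adding any tolerance for
# failures -- which is exactly the 2-member stepdown scenario above.

def majority(voters):
    return voters // 2 + 1

def tolerable_failures(voters):
    return voters - majority(voters)

# 2 voters: quorum is 2, so losing either member loses the primary.
# 3 voters (e.g. two data nodes + an arbiter): one member may fail.
```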
Manual primary changes
No "become primary now" command. Manual stepdowns with recusal timeout are best option.
test-1:PRIMARY> rs.stepDown(300)
Wed Apr 3 11:45:36 DBClientCursor::init call() failed
Wed Apr 3 11:45:36 query failed : admin.$cmd { replSetStepDown: 300.0 } to: 127.0.0.1:27017
Wed Apr 3 11:45:36 Error: error doing query: failed src/mongo/shell/collection.js:155
Wed Apr 3 11:45:36 trying reconnect to 127.0.0.1:27017
Wed Apr 3 11:45:36 reconnect 127.0.0.1:27017 ok
test-1:SECONDARY>
This triggers an election.
(Obviously, make sure your preferred candidate(s) can win.)
States: down (initializing), startup2, secondary, primary
replSet back to standalone? No.
Test server: replica set of 1, shard of 1. Removed --replSet, but the shard configuration needed a manual update:
db.shards.update({host:"testreplset/test.domain.net"}, {$set:{host:"test.domain.net"}})
updatedExisting
Values no longer returned by mongos, but visible when connected to mongod:
> db.schedule.update({_id:...}, {$set:{lock:true}}, false, true); db.runCommand("getlasterror")
{ "updatedExisting" : true, "n" : 1, "connectionId" : 73, "err" : null, "ok" : 1 }
Solution: re-adding --replSet to the mongod startup line and reverting shard configs. (Bug open with 10gen.)
Sharding
Can increase parallelization of CPU & I/O
Carefully choose a shard key (nontrivial to change)
Must run config servers & mongos
Doesn't ensure high availability
Doesn't help if you're already out of memory
256GB collection max for initial sharding
Rebalancing data across shards
Queries block while servers negotiate final hand-off.
Updating indexes after hand-off can be slow.
Best run off-peak:
mongos> use config
switched to db config
mongos> db.settings.find()
{ "_id" : "balancer", "activeWindow" : { "start" : "23:00", "stop" : "6:00" } }
Mongos & replSet primary changes
Application-level errors talking to mongos after an election:
pymongo.errors.AutoReconnect: could not connect to localhost:27020: [Errno 111] Connection refused
pymongo.errors.OperationFailure: database error: error querying server
Mongos errors talking to mongod on original primary:
Tue Apr 2 09:01:05 [conn3288] Socket say send() errno:110 Connection timed out 10.141.131.214:27017
Tue Apr 2 09:01:05 [conn3288] DBException in process: socket exception [SEND_ERROR] for 10.141.131.214:27017
Connection pool checked lazily; invalid connections can persist for days, depending on load. Can clear manually:
mongos> db.adminCommand({connPoolSync:1});
{ "ok" : 1 }
mongos>
Failure handling
Applications must handle fail-over outages:
AutoReconnect & OperationFailure in pymongo
def auto_reconnect(func, *args, **kwargs):
    """ Executes func, retrying on AutoReconnect or OperationFailure """
    for _ in range(100):
        try:
            return func(*args, **kwargs)
        except pymongo.errors.AutoReconnect:
            pass
        except pymongo.errors.OperationFailure:
            pass
        time.sleep(0.1)
    raise TimeoutError()
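The same retry pattern generalized so it can be exercised without a live mongod; `retry_on` is our name, not pymongo's, and in production you would pass `(pymongo.errors.AutoReconnect, pymongo.errors.OperationFailure)` as the exception tuple:

```python
import time

def retry_on(exceptions, func, *args, retries=100, delay=0.1, **kwargs):
    """Call func, retrying on the given transient exception types."""
    for _ in range(retries):
        try:
            return func(*args, **kwargs)
        except exceptions:
            time.sleep(delay)
    raise TimeoutError("gave up after %d attempts" % retries)

# Usage with a flaky callable that fails twice, then succeeds:
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ValueError("transient")
    return "ok"

result = retry_on((ValueError,), flaky, delay=0)  # "ok" on the 3rd try
```

Parameterizing on the exception tuple keeps the retry policy in one place instead of scattered across every database call.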
MMS (MongoDB Monitoring Service)
● free; hosted by 10gen
● need to run agent locally
● 10gen's commercial support relies on MMS
Profiling queries [1]
Finding bad queries that are actively running:
$ mongo | tee mongo.log
> db.currentOp()
...
bye
$ grep numYields mongo.log
"numYields" : 0,"numYields" : 62247,"numYields" : 0,...
# Use your favorite viewer to find the op with 62247 yields
Helpful to get the server back to a responsive state:
$ mongo
> db.killOp(10883898)
Profiling queries [2]
Using nscanned to find queries that likely aren't using indexes:
$ grep -P 'nscanned:\d\d' /var/log/mongodb.log
... or in real-time:
$ tail -f /var/log/mongodb.log | grep -P 'nscanned:\d\d'
MongoDB also provides the setProfilingLevel() command, which can log all queries to the system.profile collection.
> db.system.profile.find({nscanned:{$gte:10}})
system.profile does incur some performance overhead, though.
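The grep-based nscanned hunt above can also be scripted, e.g. to feed an alerting tool. A minimal sketch, assuming the 2.2/2.4-era mongod slow-query log line format (adjust the regex for other versions):

```python
import re

# Pull high-nscanned query lines out of mongod's log, rather than
# matching on digit count as the grep one-liner does.

NSCANNED = re.compile(r"nscanned:(\d+)")

def slow_scan_lines(lines, threshold=10):
    """Yield (nscanned, line) for log lines that scanned >= threshold docs."""
    for line in lines:
        m = NSCANNED.search(line)
        if m and int(m.group(1)) >= threshold:
            yield int(m.group(1)), line

sample = [
    "Tue Apr  2 09:00:01 [conn12] query test.widgets nscanned:3 nreturned:3 2ms",
    "Tue Apr  2 09:00:02 [conn13] query test.widgets nscanned:62247 nreturned:1 840ms",
]
hits = list(slow_scan_lines(sample))  # only the 62247-scan line
```

Unlike setProfilingLevel(), this reads the log the server is already writing, so it adds no observer overhead.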
Nagios
● plugin uses pymongo
● set up service groups
Ideas for the future
● Better reconnect handling in applications
● Lose the EBS? Ephemeral disk faster; rely on replication to keep data persistent.
● Intelligent use of mongo profiling (reduce observer effect of setProfilingLevel)
● Use more MMS alerts
● Going to 2.4.x (fast counts, hashed sharding)