Mongo Sharding: Case Study

@wfbutton Phoenix MUG Sharding: A Case Study

Upload: will-button

Post on 15-Jan-2015

DESCRIPTION

Tips, tricks and lessons learned from sharding a 200GB production database.

TRANSCRIPT

Page 1: Mongo Sharding: Case Study

@wfbutton

Phoenix MUG

Sharding: A Case Study

Page 2: Mongo Sharding: Case Study

Overview

• What is sharding
• How we knew it was time to shard
• What to shard
• Choosing a shard key
• Building servers
• Integrating sharding into a production environment
• Monitoring for success/failure
• Lessons learned
• Things you can do today

Page 3: Mongo Sharding: Case Study

About Me

• DevOps/IT/DBA for myList.com

• Extensive background in both development and ops, specifically in scalability and sustainability

Will Button | @wfbutton | google.com/+WillButton

Page 4: Mongo Sharding: Case Study

What is sharding?

Page 5: Mongo Sharding: Case Study
Page 6: Mongo Sharding: Case Study
Page 7: Mongo Sharding: Case Study

Config Servers

• Stores metadata for the cluster
• Not a replica set
• Metadata consists of:
  • collections
  • shards
  • chunks
  • mongos instances

Mongos

• Routing service for mongo shards
• Likely to run on application server
• Apps talk to mongos in a sharded environment

Page 8: Mongo Sharding: Case Study

When Should I Shard?

Page 9: Mongo Sharding: Case Study

When Should I Shard?

{
  "ts" : ISODate("2013-11-01T01:34:30.683Z"),
  "op" : "query",
  "ns" : "MyListContent.RepoThing",
  "query" : {
    "query" : {
      "provId" : "cae56942-5c9c-776c-0506-2c9f4092e107",
      "provUnifiedId" : "40233411"
    },
    "$readPreference" : { "mode" : "primary" }
  },
  "ntoreturn" : 0,
  "ntoskip" : 0,
  "nscanned" : 58,
  "keyUpdates" : 0,
  "numYield" : 16,
  "lockStats" : {
    "timeLockedMicros" : { "r" : NumberLong(1260461), "w" : NumberLong(0) },
    "timeAcquiringMicros" : { "r" : NumberLong(1275073), "w" : NumberLong(2369) }
  },
  "nreturned" : 57,
  "responseLength" : 91643,
  "millis" : 1200,
  "client" : "10.110.1.27",
  "user" : ""
}

58 records scanned

57 documents returned

1200 milliseconds

YIKES!
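The reading of that profile entry can be sketched in plain JavaScript (not a mongo shell command). The helper name `summarizeProfileEntry`, the 100 ms "slow" threshold, and the "scanned at most 2x returned" rule are assumptions made for illustration; only the field names and values come from the entry above.

```javascript
// Hypothetical helper: summarize one document shaped like a system.profile entry.
// NumberLong values from the shell are written here as plain numbers.
function summarizeProfileEntry(entry, slowMillis = 100) {
  // Documents scanned per document returned; close to 1 means the index is selective.
  const scannedPerReturned =
    entry.nreturned > 0 ? entry.nscanned / entry.nreturned : Infinity;
  // Time spent waiting to acquire the read lock, converted from microseconds.
  const readWaitMillis = entry.lockStats.timeAcquiringMicros.r / 1000;
  return {
    ns: entry.ns,
    millis: entry.millis,
    scannedPerReturned,
    readWaitMillis,
    slow: entry.millis >= slowMillis,          // assumed threshold
    wellIndexed: scannedPerReturned <= 2,      // assumed rule of thumb
  };
}

// Values copied from the profile entry in the slide.
const entry = {
  ns: "MyListContent.RepoThing",
  nscanned: 58,
  nreturned: 57,
  millis: 1200,
  lockStats: { timeAcquiringMicros: { r: 1275073, w: 2369 } },
};

const summary = summarizeProfileEntry(entry);
console.log(summary);
```

A query that comes out both `slow` and `wellIndexed` is unlikely to be fixed by another index. Here the read-lock wait is about as large as the total duration, which suggests (but does not prove) contention on the server rather than a bad query plan.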

Page 10: Mongo Sharding: Case Study

What To Shard

socialcatalog03:SECONDARY> db.system.profile.aggregate(
... { $group:
...   { _id: "$ns", count:
...     { $sum: 1 }
...   }
... }
... )
{
  "result" : [
    { "_id" : "admin.$cmd", "count" : 1 },
    { "_id" : "MyListContent.BrandPage", "count" : 97 },
    { "_id" : "MyListContent.RepoThingUpdate", "count" : 50 },
    { "_id" : "MyListContent.RepoThing", "count" : 1824 }
  ],
  "ok" : 1
}

system.profile collection is your friend!
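What the `$group` stage does can be sketched in plain JavaScript: count profile entries per namespace, then rank them so the busiest collection comes first. The function name `countByNamespace` and the three sample entries are made up for illustration; the sort step is an addition, not part of the command in the slide.

```javascript
// Count entries per namespace, mirroring
// { $group: { _id: "$ns", count: { $sum: 1 } } }, then sort by count descending.
function countByNamespace(entries) {
  const counts = new Map();
  for (const e of entries) {
    counts.set(e.ns, (counts.get(e.ns) || 0) + 1);
  }
  return [...counts.entries()]
    .map(([ns, count]) => ({ _id: ns, count }))
    .sort((a, b) => b.count - a.count);
}

// Made-up sample, far smaller than a real system.profile collection.
const sample = [
  { ns: "MyListContent.RepoThing" },
  { ns: "MyListContent.BrandPage" },
  { ns: "MyListContent.RepoThing" },
];

const ranked = countByNamespace(sample);
console.log(ranked);
```

The collection at the top of the ranking is the first candidate to look at when deciding what to shard.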

Page 11: Mongo Sharding: Case Study

Choosing A Shard Key

• Next to getting married, the most important decision you’ll ever make

Page 12: Mongo Sharding: Case Study

Choosing A Shard Key

Collection: stuff

Shard key: _id

0…………..100…………..200…………..300…………..400…………..500…………..600

Shard 1 Shard 2 Shard 3
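A minimal sketch of the range-based placement pictured above, in plain JavaScript. The chunk boundaries (200 ids per shard) are an assumption read off the picture, and `shardForKey` is a hypothetical name. The simulation shows one consequence of a range key that only ever grows: every new document lands on the last shard.

```javascript
// Assumed boundaries matching the picture: three shards, 200 ids each.
const ranges = [
  { shard: "Shard 1", min: 0, max: 200 },
  { shard: "Shard 2", min: 200, max: 400 },
  { shard: "Shard 3", min: 400, max: 600 },
];

// Range-based routing: a key belongs to the shard whose [min, max) range contains it.
function shardForKey(key) {
  const hit = ranges.find((r) => key >= r.min && key < r.max);
  return hit ? hit.shard : null;
}

// Simulate 50 inserts with ever-increasing _id values.
const writes = {};
for (let id = 550; id < 600; id++) {
  const shard = shardForKey(id);
  writes[shard] = (writes[shard] || 0) + 1;
}
console.log(writes);
```

This may be part of why the steps later in the deck use a hashed `_id` rather than the raw value, though the slides do not say so explicitly.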

Page 13: Mongo Sharding: Case Study

Adding New Servers

• Expanding production
• Using Amazon EC2
• Updating production
• Does dev match prod?

Build EC2 Image -> Clone Instances -> Update conf -> rs.initiate()

Page 14: Mongo Sharding: Case Study

Shard: Actual Steps

mongos> db.BrandPage.ensureIndex( { "_id": "hashed" } )

mongos> sh.shardCollection("MyListContent.BrandPage", { "_id": "hashed" })
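The effect of a hashed shard key can be sketched in plain JavaScript. This is an illustration only: `standInHash` uses MD5 from Node's `crypto` module as a stand-in and is not MongoDB's hashed index function, and picking a shard with a modulo is a simplification of how MongoDB assigns ranges of hash values to chunks.

```javascript
const nodeCrypto = require("crypto");

// Stand-in hash (assumption): first 4 bytes of the MD5 of the key's decimal string.
function standInHash(key) {
  return nodeCrypto.createHash("md5").update(String(key)).digest().readUInt32BE(0);
}

// Simplified placement: hash value modulo the number of shards.
function shardForHashedKey(key, shardCount) {
  return standInHash(key) % shardCount;
}

// Insert 6000 sequential ids and count how many land on each of 3 shards.
const perShard = [0, 0, 0];
for (let id = 0; id < 6000; id++) {
  perShard[shardForHashedKey(id, perShard.length)] += 1;
}
console.log(perShard);
```

Sequential ids end up spread across all shards instead of piling onto one, at the cost of range queries on `_id` no longer being confined to a single shard.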

Page 15: Mongo Sharding: Case Study

Monitoring Shard Status

Page 16: Mongo Sharding: Case Study

Monitoring Shard Status

mongos> db.BrandPage.getShardDistribution()

Shard socialcatalog03 at socialcatalog03/10.110.1.148:27018,10.110.3.215:27018,10.110.4.142:27018
 data : 1.26GiB docs : 3334394 chunks : 41
 estimated data per chunk : 31.49MiB
 estimated docs per chunk : 81326

Totals
 data : 1.26GiB docs : 3334394 chunks : 41
 Shard socialcatalog03 contains 100% data, 100% docs in cluster, avg obj size on shard : 406B

Page 17: Mongo Sharding: Case Study

mongos> db.BrandPage.getShardDistribution()

Shard rs210 at rs210/10.110.1.10:27018,10.110.1.110:27018,10.110.1.147:27018
 data : 54.48MiB docs : 122774 chunks : 7
 estimated data per chunk : 7.78MiB
 estimated docs per chunk : 17539

Shard rs220 at rs220/10.110.1.117:27018,10.110.1.149:27018,10.110.1.252:27018
 data : 54.09MiB docs : 122151 chunks : 7
 estimated data per chunk : 7.72MiB
 estimated docs per chunk : 17450

Shard rs310 at rs310/10.110.1.146:27018,10.110.1.197:27018,10.110.1.220:27018
 data : 54.65MiB docs : 123138 chunks : 7
 estimated data per chunk : 7.8MiB
 estimated docs per chunk : 17591

Shard rs320 at rs320/10.110.1.112:27018,10.110.1.150:27018,10.110.1.26:27018
 data : 54.63MiB docs : 123163 chunks : 7
 estimated data per chunk : 7.8MiB
 estimated docs per chunk : 17594

Shard socialcatalog02 at socialcatalog02/10.110.1.184:27018,10.110.1.222:27018,10.110.1.84:27018
 data : 46.54MiB docs : 105031 chunks : 6
 estimated data per chunk : 7.75MiB
 estimated docs per chunk : 17505

Shard socialcatalog03 at socialcatalog03/10.110.1.148:27018,10.110.1.16:27018,10.110.1.53:27018
 data : 99.9MiB docs : 242755 chunks : 7
 estimated data per chunk : 14.27MiB
 estimated docs per chunk : 34679

Totals
 data : 364.31MiB docs : 839012 chunks : 41
 Shard rs210 contains 14.95% data, 14.63% docs in cluster, avg obj size on shard : 465B
 Shard rs220 contains 14.84% data, 14.55% docs in cluster, avg obj size on shard : 464B
 Shard rs310 contains 15% data, 14.67% docs in cluster, avg obj size on shard : 465B
 Shard rs320 contains 14.99% data, 14.67% docs in cluster, avg obj size on shard : 465B
 Shard socialcatalog02 contains 12.77% data, 12.51% docs in cluster, avg obj size on shard : 464B
 Shard socialcatalog03 contains 27.42% data, 28.93% docs in cluster, avg obj size on shard : 431B
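The percentages in the totals can be recomputed, and an uneven shard flagged, with a short plain-JavaScript sketch. The sizes are copied from the output above; the function name `dataShares` and the tolerance of 1.5 times the even share are assumptions chosen for illustration.

```javascript
// Per-shard data sizes in MiB, copied from the getShardDistribution() output.
const dataMiB = {
  rs210: 54.48,
  rs220: 54.09,
  rs310: 54.65,
  rs320: 54.63,
  socialcatalog02: 46.54,
  socialcatalog03: 99.9,
};

// Compute each shard's share of the total and flag shards holding more than
// `tolerance` times an even share (assumed threshold).
function dataShares(sizes, tolerance = 1.5) {
  const names = Object.keys(sizes);
  const total = names.reduce((sum, n) => sum + sizes[n], 0);
  const evenShare = 1 / names.length;
  return names.map((name) => {
    const share = sizes[name] / total;
    return {
      name,
      percent: (share * 100).toFixed(2),
      overloaded: share > evenShare * tolerance,
    };
  });
}

const shares = dataShares(dataMiB);
console.log(shares);
```

With six shards an even share is about 16.7%, so only socialcatalog03, the shard that originally held all the data, is flagged. That is expected while chunks are still migrating off it.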

Page 18: Mongo Sharding: Case Study

Sharding takes time…

• But check the logs

Thu Nov 21 00:19:35.964 [Balancer] caught exception while doing balance: error checking clock skew of cluster 10.110.0.251:27019,10.110.3.87:27019,10.110.4.225:27019 :: caused by :: 13650 clock skew of the cluster 10.110.0.251:27019,10.110.3.87:27019,10.110.4.225:27019 is too far out of bounds to allow distributed locking.

Thu Nov 21 21:25:16.249 [conn84709] about to log metadata event: { _id: "aws-prod-mongo301-2013-11-21T21:25:16-528e7a3c374ed2e78b6298e4", server: "aws-prod-mongo301", clientAddr: "10.110.1.71:43357", time: new Date(1385069116248), what: "moveChunk.from", ns: "MyListContent.BrandPage", details: { min: { _id: -7394546541005003026 }, max: { _id: -6937685518831975781 }, step1 of 6: 0, note: "aborted" } }
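Checking the logs for these two symptoms can be sketched as a small plain-JavaScript scanner. The patterns are taken from the wording of the two log lines above; the function name `findBalancerProblems` is hypothetical, and the sample lines are abbreviated copies of those log lines with host lists and some fields replaced by "...".

```javascript
// Patterns derived from the two log lines shown in the slide.
const patterns = [
  { label: "balancer exception", regex: /\[Balancer\] caught exception while doing balance/ },
  { label: "clock skew", regex: /clock skew of the cluster/ },
  { label: "aborted chunk move", regex: /what: "moveChunk\.from".*note: "aborted"/ },
];

// Return one finding per (line, pattern) match, with 1-based line numbers.
function findBalancerProblems(logLines) {
  const findings = [];
  logLines.forEach((line, index) => {
    for (const { label, regex } of patterns) {
      if (regex.test(line)) findings.push({ line: index + 1, label });
    }
  });
  return findings;
}

// Abbreviated copies of the log lines above ("..." marks text left out here).
const logLines = [
  'Thu Nov 21 00:19:35.964 [Balancer] caught exception while doing balance: error checking clock skew of cluster ... :: caused by :: 13650 clock skew of the cluster ... is too far out of bounds to allow distributed locking.',
  'Thu Nov 21 21:25:16.249 [conn84709] about to log metadata event: { ... what: "moveChunk.from", ns: "MyListContent.BrandPage", details: { ... step1 of 6: 0, note: "aborted" } }',
];

const findings = findBalancerProblems(logLines);
console.log(findings);
```

A clock skew finding points at time synchronization between the config servers, which is what the NTP advice later in the deck addresses.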

Page 19: Mongo Sharding: Case Study

Tips, Tricks, Gotchas

• Always use 3 config servers
• Always use NTP
• Always use CNAMEs
• Always specify configdb servers in the same order
• Shard early, shard often
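The "same order" tip lends itself to an automated check. The sketch below, in plain JavaScript, compares the configdb string recorded for each mongos against the first one and reports any that differ. The inventory, host names and function name `inconsistentConfigdb` are all hypothetical; how the strings are collected from each server is left out.

```javascript
// Hypothetical inventory: the configdb string each mongos was started with.
// Host names are made-up CNAMEs, not hosts from the talk.
const mongosConfigdb = {
  app01: "cfg1.example.com:27019,cfg2.example.com:27019,cfg3.example.com:27019",
  app02: "cfg1.example.com:27019,cfg2.example.com:27019,cfg3.example.com:27019",
  app03: "cfg2.example.com:27019,cfg1.example.com:27019,cfg3.example.com:27019",
};

// Return the mongos hosts whose configdb string differs from the first entry's.
// A plain string comparison catches both different hosts and different ordering.
function inconsistentConfigdb(inventory) {
  const entries = Object.entries(inventory);
  if (entries.length === 0) return [];
  const reference = entries[0][1];
  return entries
    .filter(([, value]) => value !== reference)
    .map(([host]) => host);
}

const mismatched = inconsistentConfigdb(mongosConfigdb);
console.log(mismatched);
```

In this made-up inventory app03 lists the same three servers in a different order, which is exactly the case the tip warns about.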

Page 20: Mongo Sharding: Case Study

Things You Can Do Today

• Enable/analyze system.profile
• Identify long-running queries
• Review indexes, queries and performance
• Verify replica sets are in sync
• Set up alerting for replica set sync
• Replica sets are not backups
• Schedule a data review with the devs to plan sharding strategies
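Alerting on replica set sync could start from something like the plain-JavaScript sketch below. It works on an object shaped like the result of `rs.status()` (the `members`, `name`, `stateStr` and `optimeDate` fields) and reports secondaries that trail the primary by more than a threshold. The function names, the 60-second threshold, and the sample hosts and timestamps are assumptions for illustration.

```javascript
// Seconds each secondary trails the primary, from an rs.status()-shaped object.
function replicationLagSeconds(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  if (!primary) return null; // no primary, so lag cannot be computed
  return status.members
    .filter((m) => m.stateStr === "SECONDARY")
    .map((m) => ({
      name: m.name,
      // Subtracting Date objects yields milliseconds.
      lagSeconds: (primary.optimeDate - m.optimeDate) / 1000,
    }));
}

// Names of secondaries lagging more than maxLagSeconds (assumed threshold).
function laggingMembers(status, maxLagSeconds = 60) {
  const lags = replicationLagSeconds(status) || [];
  return lags.filter((l) => l.lagSeconds > maxLagSeconds).map((l) => l.name);
}

// Made-up sample status with hypothetical hosts and timestamps.
const sampleStatus = {
  members: [
    { name: "db1.example.com:27018", stateStr: "PRIMARY", optimeDate: new Date("2013-11-21T00:10:00Z") },
    { name: "db2.example.com:27018", stateStr: "SECONDARY", optimeDate: new Date("2013-11-21T00:10:00Z") },
    { name: "db3.example.com:27018", stateStr: "SECONDARY", optimeDate: new Date("2013-11-21T00:08:00Z") },
  ],
};

const lagging = laggingMembers(sampleStatus);
console.log(lagging);
```

In the sample, db3 is two minutes behind and would trigger an alert, while db2 is fully caught up.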

Page 21: Mongo Sharding: Case Study

Sharding MongoDB: A Case Study

www.two4seven.me/sharding

Say hi! @wfbutton