Shortcuts around the mistakes I've made scaling MongoDB

Theo, Chief Architect at
Wednesday, 21 September 2011

Upload: iconara

Post on 01-Nov-2014

DESCRIPTION

Presentation held at MongoUK, September 2012

TRANSCRIPT

Page 1: Shortcuts around the mistakes I've made scaling MongoDB

SHORTCUTS AROUND THE MISTAKES I’VE MADE SCALING MONGODB

Theo, Chief Architect at

Page 2: Shortcuts around the mistakes I've made scaling MongoDB

What we do

We want to revolutionize the digital advertising industry by showing that there is more to ad analytics than click-through rates.

Page 3: Shortcuts around the mistakes I've made scaling MongoDB

Ads

Page 4: Shortcuts around the mistakes I've made scaling MongoDB

Data

Page 5: Shortcuts around the mistakes I've made scaling MongoDB

Assembling sessions

[Diagram: exposure, ping and event fragments ➔ session]

Page 6: Shortcuts around the mistakes I've made scaling MongoDB

Crunching

[Diagram: many sessions ➔ 42]

Page 7: Shortcuts around the mistakes I've made scaling MongoDB

Reports

Page 8: Shortcuts around the mistakes I've made scaling MongoDB

What we do

Track ads, make pretty reports.

Page 9: Shortcuts around the mistakes I've made scaling MongoDB

That doesn’t sound so hard

We don’t know when sessions end
There’s a lot of data
It’s all done in (close to) real time

Page 13: Shortcuts around the mistakes I've made scaling MongoDB

Numbers

40 GB of data
50 million documents
per day
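A quick back-of-the-envelope calculation puts those numbers in per-document and per-second terms. It assumes decimal gigabytes (10^9 bytes) and an even spread over 24 hours; the slides state neither, so treat the results as rough averages only.

```javascript
// Figures from the slide; the unit interpretation is an assumption.
const bytesPerDay = 40e9;       // 40 GB, assuming 1 GB = 10^9 bytes
const documentsPerDay = 50e6;   // 50 million documents
const secondsPerDay = 24 * 60 * 60;

// Average document size in bytes.
const averageDocumentBytes = bytesPerDay / documentsPerDay;

// Average sustained rates, assuming load is spread evenly over the day.
const documentsPerSecond = documentsPerDay / secondsPerDay;
const bytesPerSecond = bytesPerDay / secondsPerDay;

console.log(averageDocumentBytes);              // 800
console.log(Math.round(documentsPerSecond));    // 579
console.log(Math.round(bytesPerSecond / 1e3));  // 463 (kB/s)
```

Real traffic is unlikely to be flat, so peak rates would be higher than these averages.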

Page 17: Shortcuts around the mistakes I've made scaling MongoDB

How we use MongoDB

“Virtual memory” to offload data while we wait for sessions to finish
Short-term storage (<48 hours) for batch jobs
Metrics storage

Page 21: Shortcuts around the mistakes I've made scaling MongoDB

Why we use MongoDB

Schemalessness makes things so much easier; the data we collect changes as we come up with new ideas
Sharding makes it possible to scale writes
Secondary indexes and a rich query language are great features (for the metrics store)
It’s just… nice

Page 26: Shortcuts around the mistakes I've made scaling MongoDB

Btw.

We use JRuby, it’s awesome

Page 28: Shortcuts around the mistakes I've made scaling MongoDB

A story in 7 iterations

Page 29: Shortcuts around the mistakes I've made scaling MongoDB

1st iteration: secondary indexes and updates

One document per session, update as new data comes along
Outcome: 1000% write lock

Page 31: Shortcuts around the mistakes I've made scaling MongoDB

#1 Everything is about working around the GLOBAL WRITE LOCK

Page 32: Shortcuts around the mistakes I've made scaling MongoDB

MongoDB 2.0.0

// Third argument `true` means upsert: create the document if none matches
db.coll.update({_id: "xyz"}, {$inc: {x: 1}}, true)

db.coll.update({_id: "abc"}, {$push: {x: "..."}}, true)

Page 33: Shortcuts around the mistakes I've made scaling MongoDB

MongoDB 1.8.1

db.coll.update({_id: "xyz"}, {$inc: {x: 1}}, true)

db.coll.update({_id: "abc"}, {$push: {x: "..."}}, true)

Page 34: Shortcuts around the mistakes I've made scaling MongoDB

2nd iteration: using scans for two-step assembling

Instead of updating, save each fragment, then scan over _id to assemble sessions
Outcome: not as much lock, but still not great performance. We also realised we couldn’t remove data fast enough
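The "save each fragment, then scan over _id" idea can be sketched without a database. The key format `<sessionId>:<sequence>` and the helper names below are hypothetical, and a sorted in-memory array stands in for the collection's _id index; the talk does not show the actual key layout.

```javascript
// Hypothetical key scheme: "<sessionId>:<zero-padded sequence number>".
// Sorting by _id then keeps all fragments of a session next to each other.
function fragmentId(sessionId, sequence) {
  return `${sessionId}:${String(sequence).padStart(6, "0")}`;
}

// Stand-in for a collection: fragments kept sorted by _id, as an index would.
function insertFragment(store, fragment) {
  store.push(fragment);
  store.sort((a, b) => (a._id < b._id ? -1 : a._id > b._id ? 1 : 0));
}

// Range scan over _id: everything from "<sessionId>:" up to, but not
// including, "<sessionId>;" (";" is the character after ":" in ASCII).
function assembleSession(store, sessionId) {
  const lower = `${sessionId}:`;
  const upper = `${sessionId};`;
  const fragments = store.filter((f) => f._id >= lower && f._id < upper);
  return { sessionId, events: fragments.map((f) => f.type) };
}

const store = [];
insertFragment(store, { _id: fragmentId("s2", 0), type: "exposure" });
insertFragment(store, { _id: fragmentId("s1", 1), type: "ping" });
insertFragment(store, { _id: fragmentId("s1", 0), type: "exposure" });
insertFragment(store, { _id: fragmentId("s1", 2), type: "event" });

const session = assembleSession(store, "s1");
console.log(session.events); // [ 'exposure', 'ping', 'event' ]
```

Against a real collection the same scan would presumably be a range query on _id using `$gte` and `$lt`. Every fragment becomes a plain insert, which is what avoids the repeated updates of the first iteration.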

Page 36: Shortcuts around the mistakes I've made scaling MongoDB

#2 Everything is about working around the GLOBAL WRITE LOCK

Page 37: Shortcuts around the mistakes I've made scaling MongoDB

#3 Give a lot of thought to your PRIMARY KEY

Page 38: Shortcuts around the mistakes I've made scaling MongoDB

3rd iteration: partitioning

We came up with the idea of partitioning the data by writing to a new collection every hour
Outcome: lots of complicated code, lots of bugs, but we didn’t have to care about removing data
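A minimal sketch of hourly partitioning, assuming a hypothetical naming scheme of `<prefix>_YYYYMMDDHH` in UTC (the talk does not say how the collections were named). It shows why old data stops being a problem: whole collections fall out of the retention window and can be dropped instead of removing documents one by one.

```javascript
// Hypothetical naming scheme: "<prefix>_YYYYMMDDHH", computed in UTC.
function hourlyCollectionName(prefix, date) {
  const pad = (n) => String(n).padStart(2, "0");
  return (
    `${prefix}_${date.getUTCFullYear()}` +
    pad(date.getUTCMonth() + 1) +
    pad(date.getUTCDate()) +
    pad(date.getUTCHours())
  );
}

// Names of collections that lie entirely before the retention window.
// Fixed-width names make plain string comparison match time order.
function expiredCollections(names, prefix, now, retentionHours) {
  const cutoffDate = new Date(now.getTime() - retentionHours * 3600 * 1000);
  const cutoff = hourlyCollectionName(prefix, cutoffDate);
  return names.filter((name) => name.startsWith(`${prefix}_`) && name < cutoff);
}

const now = new Date(Date.UTC(2011, 8, 21, 14, 30));
console.log(hourlyCollectionName("fragments", now)); // fragments_2011092114
```

The cost the slide mentions shows up elsewhere: every reader and writer has to know which collection a given timestamp belongs to, and a session that spans an hour boundary ends up in more than one collection.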

Page 41: Shortcuts around the mistakes I've made scaling MongoDB

#4 Make sure you can REMOVE OLD DATA

Page 42: Shortcuts around the mistakes I've made scaling MongoDB

4th iteration: sharding

To get around the global write lock and get higher write performance we moved to a sharded cluster.
Outcome: higher write performance, lots of problems, lots of ops time spent debugging

Page 45: Shortcuts around the mistakes I've made scaling MongoDB

#5 Everything is about working around the GLOBAL WRITE LOCK

Page 46: Shortcuts around the mistakes I've made scaling MongoDB

#6 SHARDING IS NOT A SILVER BULLET
and it’s buggy, if you can, avoid it

Page 48: Shortcuts around the mistakes I've made scaling MongoDB

#7 IT WILL FAIL
design for it

Page 51: Shortcuts around the mistakes I've made scaling MongoDB

5th iteration: moving things to separate clusters

We saw very different loads on the shards and realised we had databases with very different usage patterns, some of which made autosharding not work. We moved these off the cluster.
Outcome: a more balanced and stable cluster

Page 54: Shortcuts around the mistakes I've made scaling MongoDB

#8 Everything is about working around the GLOBAL WRITE LOCK

Page 55: Shortcuts around the mistakes I've made scaling MongoDB

#9 ONE DATABASE with one usage pattern PER CLUSTER

Page 56: Shortcuts around the mistakes I've made scaling MongoDB

#10 MONITOR EVERYTHING
look at your health graphs daily

Page 57: Shortcuts around the mistakes I've made scaling MongoDB

6th iteration: monster machines

We got new problems removing data and needed some room to breathe and think
Solution: upgraded the servers to High-Memory Quadruple Extra Large (with cheese).

I ♥

Page 61: Shortcuts around the mistakes I've made scaling MongoDB

#11 Don’t try to scale up
SCALE OUT

Page 62: Shortcuts around the mistakes I've made scaling MongoDB

#12 When you’re out of ideas
CALL THE EXPERTS

Page 63: Shortcuts around the mistakes I've made scaling MongoDB

7th iteration: partitioning (again) and pre-chunking

We rewrote the database layer to write to a new database each day, and we created all chunks in advance. We also decreased the size of our documents by a lot.
Outcome: no more problems removing data.
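The two ideas on this slide, one database per day and chunks created ahead of time, can be sketched as follows. The database naming scheme, the two-digit hex key prefix and all function names are assumptions made for illustration; the command documents follow the shape of MongoDB's `split` admin command but are only built here, never sent to a server.

```javascript
// Hypothetical naming scheme: one database per UTC day, "<prefix>_YYYYMMDD".
function dailyDatabaseName(prefix, date) {
  return `${prefix}_${date.toISOString().slice(0, 10).replace(/-/g, "")}`;
}

// Evenly spaced split points over two-digit hex prefixes ("00".."ff").
// Assumes the shard key starts with uniformly distributed hex characters.
function splitPoints(chunkCount) {
  const points = [];
  for (let i = 1; i < chunkCount; i++) {
    const boundary = Math.floor((i * 256) / chunkCount);
    points.push(boundary.toString(16).padStart(2, "0"));
  }
  return points;
}

// One `split` command document per boundary for the given namespace.
function splitCommands(namespace, chunkCount) {
  return splitPoints(chunkCount).map((point) => ({
    split: namespace,
    middle: { _id: point },
  }));
}

const dbName = dailyDatabaseName("sessions", new Date(Date.UTC(2011, 8, 21)));
console.log(dbName);         // sessions_20110921
console.log(splitPoints(4)); // [ '40', '80', 'c0' ]
```

Creating the chunks before the day starts means the cluster does not have to split and migrate chunks while the write load is at its peak, and removing a day of data becomes a matter of dropping that day's database.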

Page 66: Shortcuts around the mistakes I've made scaling MongoDB

#13 Smaller objects mean a smaller database, and a smaller database means LESS RAM NEEDED

Page 67: Shortcuts around the mistakes I've made scaling MongoDB

#14 Give a lot of thought to your PRIMARY KEY

Page 68: Shortcuts around the mistakes I've made scaling MongoDB

#15 Everything is about working around the GLOBAL WRITE LOCK

Page 69: Shortcuts around the mistakes I've made scaling MongoDB

#16 Everything is about working around the GLOBAL WRITE LOCK

Page 70: Shortcuts around the mistakes I've made scaling MongoDB

KTHXBAI

@iconara
architecturalatrocities.com
burtcorp.com

Page 71: Shortcuts around the mistakes I've made scaling MongoDB

Since we got time…

Page 72: Shortcuts around the mistakes I've made scaling MongoDB

Tips: Safe mode

Run every Nth insert in safe mode
This will give you warnings when bad things happen, like failovers
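A small sketch of "every Nth insert in safe mode". The `insert(doc, safe)` callback is a hypothetical stand-in for a driver call; how the flag maps onto a real driver option (for example an acknowledged write) is left to the caller and is not shown in the talk.

```javascript
// Returns a function that says, per insert, whether to use safe mode.
// Every `n`th call returns true; all other calls return false.
function everyNth(n) {
  if (!Number.isInteger(n) || n < 1) {
    throw new RangeError("n must be a positive integer");
  }
  let count = 0;
  return function shouldUseSafeMode() {
    count += 1;
    if (count === n) {
      count = 0;
      return true;
    }
    return false;
  };
}

// Hypothetical insert wrapper: `insert(doc, safe)` stands in for a driver call.
function makeInserter(insert, n) {
  const shouldUseSafeMode = everyNth(n);
  return (doc) => insert(doc, shouldUseSafeMode());
}

const calls = [];
const insertSampled = makeInserter((doc, safe) => calls.push(safe), 3);
for (let i = 0; i < 7; i++) insertSampled({ i });
console.log(calls); // [ false, false, true, false, false, true, false ]
```

Most writes stay fire-and-forget and fast, while the sampled ones wait for a server response, so an error such as a failover surfaces within roughly N inserts instead of going unnoticed.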

Page 75: Shortcuts around the mistakes I've made scaling MongoDB

Tips: Avoid bulk inserts

Very dangerous if there’s a possibility of duplicate key errors
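The slide does not spell out the failure mode. One plausible reading, assumed here, is that a bulk insert stops at the first duplicate key error so the rest of the batch is never written. The sketch below simulates that with an in-memory `Map` standing in for a collection with a unique _id index; it makes no MongoDB calls.

```javascript
// Assumption: a bulk insert stops at the first duplicate key error,
// so documents after the offending one are never written.
function bulkInsertStopOnError(collection, docs) {
  let inserted = 0;
  for (const doc of docs) {
    if (collection.has(doc._id)) {
      return { inserted, error: `duplicate key: ${doc._id}` };
    }
    collection.set(doc._id, doc);
    inserted += 1;
  }
  return { inserted, error: null };
}

// Inserting one at a time: a duplicate only affects that one document.
function insertOneByOne(collection, docs) {
  let inserted = 0;
  for (const doc of docs) {
    if (!collection.has(doc._id)) {
      collection.set(doc._id, doc);
      inserted += 1;
    }
  }
  return { inserted };
}

const batch = [{ _id: "a" }, { _id: "b" }, { _id: "c" }, { _id: "d" }];

const bulkResult = bulkInsertStopOnError(new Map([["b", { _id: "b" }]]), batch);
console.log(bulkResult); // { inserted: 1, error: 'duplicate key: b' }

const singleResult = insertOneByOne(new Map([["b", { _id: "b" }]]), batch);
console.log(singleResult); // { inserted: 3 }
```

Under that assumption, a single duplicate in a batch silently costs every document behind it, and without safe mode the client would not even see the error.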

Page 77: Shortcuts around the mistakes I've made scaling MongoDB

Tips: EC2

You have three copies of your data, do you really need EBS?
Instance store disks are included in the price and they have predictable performance.
m1.xlarge comes with 1.7 TB of storage.