Shortcuts Around the Mistakes I've Made Scaling MongoDB

Post on 01-Nov-2014


DESCRIPTION

Presentation held at MongoUK, September 2012

TRANSCRIPT

SHORTCUTS AROUND THE MISTAKES I'VE MADE SCALING MONGODB

Theo, Chief Architect

onsdag 21 september 11 (Swedish slide footer: Wednesday, 21 September 2011)

What we do
We want to revolutionize the digital advertising industry by showing that there is more to ad analytics than click-through rates.

Ads

Data

Assembling sessions
[Diagram: an exposure followed by a stream of pings and events is assembled into a session.]

Crunching
[Diagram: many sessions are crunched down into aggregate metrics (42).]

Reports

What we do: track ads, make pretty reports.

That doesn't sound so hard
- We don't know when sessions end
- There's a lot of data
- It's all done in (close to) real time

Numbers
- 40 GB of data
- 50 million documents
- per day

How we use MongoDB
- "Virtual memory" to offload data while we wait for sessions to finish
- Short-term storage (<48 hours) for batch jobs
- Metrics storage

Why we use MongoDB
- Schemalessness makes things so much easier; the data we collect changes as we come up with new ideas
- Sharding makes it possible to scale writes
- Secondary indexes and a rich query language are great features (for the metrics store)
- It's just… nice

Btw.
We use JRuby, it's awesome.

A story in 7 iterations


1st iteration: secondary indexes and updates

One document per session, updated as new data comes along.
Outcome: 1000% write lock.
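The first iteration's write pattern can be sketched like this (hypothetical field names and a tiny in-memory model of the `$inc`/`$push` semantics; the real thing was an upsert per tracking ping against one growing session document):

```javascript
// In the mongo shell this was essentially:
//   db.sessions.update(
//     {_id: sessionId},
//     {$inc: {pings: 1}, $push: {events: event}},
//     true  // upsert
//   )
// A minimal in-memory model of the same $inc/$push upsert:
function upsert(coll, id, event) {
  var doc = coll[id] || (coll[id] = {_id: id, pings: 0, events: []});
  doc.pings += 1;           // $inc
  doc.events.push(event);   // $push
  return doc;
}

var sessions = {};
upsert(sessions, "xyz", "ping");
upsert(sessions, "xyz", "click");
// sessions["xyz"] now holds {_id: "xyz", pings: 2, events: ["ping", "click"]}
```

Every ping is an in-place update of a hot document, which is exactly what a global write lock punishes.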

#1: Everything is about working around the GLOBAL WRITE LOCK

MongoDB 2.0.0

db.coll.update({_id: "xyz"}, {$inc: {x: 1}}, true)

db.coll.update({_id: "abc"}, {$push: {x: "..."}}, true)

MongoDB 1.8.1

db.coll.update({_id: "xyz"}, {$inc: {x: 1}}, true)

db.coll.update({_id: "abc"}, {$push: {x: "..."}}, true)

2nd iteration: using scans for two-step assembling

Instead of updating, save each fragment, then scan over _id to assemble sessions.
Outcome: not as much lock, but still not great performance. We also realised we couldn't remove data fast enough.
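The two-step assembly can be sketched as a sorted scan over fragments (the `"<sessionId>:<seq>"` key scheme here is a hypothetical illustration; the point is that a ranged scan over _id yields all fragments of a session contiguously):

```javascript
// Sketch: in MongoDB the scan would be db.fragments.find().sort({_id: 1});
// here we model it over an in-memory array of fragment documents.
function assemble(fragments) {
  var sessions = {};
  fragments.sort(function (a, b) { return a._id < b._id ? -1 : 1; });
  fragments.forEach(function (f) {
    var sessionId = f._id.split(":")[0];
    (sessions[sessionId] || (sessions[sessionId] = [])).push(f.event);
  });
  return sessions;
}

var out = assemble([
  {_id: "abc:2", event: "ping"},
  {_id: "abc:1", event: "exposure"},
  {_id: "xyz:1", event: "exposure"}
]);
// out.abc → ["exposure", "ping"], out.xyz → ["exposure"]
```

Inserts are append-only, so there is far less lock contention than with in-place updates.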

#2: Everything is about working around the GLOBAL WRITE LOCK

#3: Give a lot of thought to your PRIMARY KEY

3rd iteration: partitioning

We came up with the idea of partitioning the data by writing to a new collection every hour.
Outcome: lots of complicated code, lots of bugs, but we didn't have to care about removing data.
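The routing logic can be sketched as below (the naming scheme is hypothetical): each write goes to an hourly collection, so expiring old data becomes dropping whole collections (`db.getCollection(name).drop()`) instead of expensive `remove()` calls.

```javascript
// Sketch: map a timestamp to the name of the hourly collection it
// should be written to.
function hourlyCollectionName(prefix, timestamp) {
  var d = new Date(timestamp);
  function pad(n) { return (n < 10 ? "0" : "") + n; }
  return prefix + "_" +
    d.getUTCFullYear() + pad(d.getUTCMonth() + 1) + pad(d.getUTCDate()) +
    "_" + pad(d.getUTCHours());
}

hourlyCollectionName("fragments", Date.UTC(2011, 8, 21, 14, 30));
// → "fragments_20110921_14"
```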

#4: Make sure you can REMOVE OLD DATA

4th iteration: sharding

To get around the global write lock and get higher write performance we moved to a sharded cluster.
Outcome: higher write performance, lots of problems, lots of ops time spent debugging.
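For context, range-based sharding (MongoDB's model at the time) routes a write by finding the chunk whose key range contains the document's shard key; this tiny model uses made-up split points, while real routing is done by mongos against live chunk metadata:

```javascript
// Sketch: each chunk owns a half-open key range and lives on one shard.
var chunks = [
  {min: "", max: "g", shard: "shard0"},
  {min: "g", max: "p", shard: "shard1"},
  {min: "p", max: "\uffff", shard: "shard2"}
];

function shardFor(id) {
  for (var i = 0; i < chunks.length; i++) {
    if (id >= chunks[i].min && id < chunks[i].max) return chunks[i].shard;
  }
}

shardFor("abc");  // → "shard0"
shardFor("xyz");  // → "shard2"
```

This is also why the primary key matters so much: monotonically increasing keys send every write to the same last chunk.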

#5: Everything is about working around the GLOBAL WRITE LOCK

#6: SHARDING IS NOT A SILVER BULLET, and it's buggy; if you can, avoid it

#7: IT WILL FAIL, design for it

5th iteration: moving things to separate clusters

We saw very different loads on the shards and realised we had databases with very different usage patterns, some that made autosharding not work. We moved these off the cluster.
Outcome: a more balanced and stable cluster.

#8: Everything is about working around the GLOBAL WRITE LOCK

#9: ONE DATABASE with one usage pattern PER CLUSTER

#10: MONITOR EVERYTHING, look at your health graphs daily

6th iteration: monster machines

We got new problems removing data and needed some room to breathe and think.
Solution: upgraded the servers to High-Memory Quadruple Extra Large (with cheese).

#11: Don't try to scale up, SCALE OUT

#12: When you're out of ideas, CALL THE EXPERTS

7th iteration: partitioning (again) and pre-chunking

We rewrote the database layer to write to a new database each day, and we created all chunks in advance. We also decreased the size of our documents by a lot.
Outcome: no more problems removing data.
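Pre-chunking can be sketched as generating evenly spaced split points before a day's database takes its first write, so the balancer never has to split and migrate chunks under load (the hex-prefix key scheme is a hypothetical illustration; in the shell each point would be fed to `sh.splitAt(...)`):

```javascript
// Sketch: with _ids that start with a hex prefix, evenly spaced
// two-hex-digit prefixes across 00..ff make good split points.
function splitPoints(numChunks) {
  var points = [];
  for (var i = 1; i < numChunks; i++) {
    points.push(Math.floor(i * 256 / numChunks).toString(16));
  }
  return points;
}

splitPoints(4);  // → ["40", "80", "c0"]
```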

#13: Smaller objects mean a smaller database, and a smaller database means LESS RAM NEEDED
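One reason object size matters: MongoDB of this era stored every field name in every document, so even shortening keys shrinks the database. A hypothetical mapping layer (the key names here are made up):

```javascript
// Sketch: translate long, readable field names to one- or two-letter
// keys before writing, since each name is stored per document.
var SHORT_KEYS = {timestamp: "t", userAgent: "ua", events: "e"};

function shrink(doc) {
  var out = {};
  Object.keys(doc).forEach(function (k) {
    out[SHORT_KEYS[k] || k] = doc[k];
  });
  return out;
}

shrink({timestamp: 1316613600, userAgent: "Mozilla/5.0"});
// → {t: 1316613600, ua: "Mozilla/5.0"}
```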

#14: Give a lot of thought to your PRIMARY KEY

#15: Everything is about working around the GLOBAL WRITE LOCK

#16: Everything is about working around the GLOBAL WRITE LOCK

KTHXBAI

@iconara
architecturalatrocities.com
burtcorp.com

Since we got time…


Tips: Safe mode

Run every Nth insert in safe mode. This will give you warnings when bad things happen, like failovers.
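The sampling can be sketched as a wrapper around the driver's insert (the wrapper and its names are hypothetical): most writes are fire-and-forget, but every Nth one waits for acknowledgement (getLastError) so failovers and errors surface without paying a round trip on every write.

```javascript
// Sketch: wrap an insert function so every nth call sets the safe flag.
function makeSampledInserter(insert, n) {
  var count = 0;
  return function (doc) {
    count += 1;
    var safe = count % n === 0;
    return insert(doc, safe);  // e.g. coll.insert(doc, {safe: safe})
  };
}

var safeFlags = [];
var inserter = makeSampledInserter(function (doc, safe) {
  safeFlags.push(safe);
}, 3);
["a", "b", "c", "d"].forEach(function (x) { inserter({_id: x}); });
// safeFlags → [false, false, true, false]
```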

Tips: Avoid bulk inserts

Very dangerous if there's a possibility of duplicate key errors.
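The danger, as I understand the drivers of that era: a bulk insert stopped at the first duplicate key error, silently dropping every document after it (before the continueOnError flag existed). A tiny model of that behavior:

```javascript
// Sketch: insert docs in order, aborting at the first duplicate _id.
function bulkInsert(coll, docs) {
  for (var i = 0; i < docs.length; i++) {
    if (coll[docs[i]._id] !== undefined) return i;  // abort on dup key
    coll[docs[i]._id] = docs[i];
  }
  return docs.length;
}

var coll = {a: {_id: "a"}};
bulkInsert(coll, [{_id: "b"}, {_id: "a"}, {_id: "c"}]);
// "b" was inserted, "a" hit a duplicate key, and "c" was never written
```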

Tips: EC2

- You have three copies of your data, do you really need EBS?
- Instance store disks are included in the price and they have predictable performance.
- m1.xlarge comes with 1.7 TB of storage.
