scaling with mongodb

Scaling with MongoDBAaron Staple

[email protected]

Mongo Seattle

July 27, 2010

MongoDB 1.6Comes out next week!

Differences from Typical RDBMS

Memory mapped data

All data in memory (if it fits), synced to disk periodically

No joins

Reads have greater data locality

No joins between servers

No transactions

Improves performance of various operations

No transactions between servers

Topics

Single server read scaling

Single server write scaling

Scaling reads with a master/slave cluster

Scaling reads with replica sets

Scaling reads and writes with sharding

Denormalize

{ userid: 100,

books: [

{ title: „James and the Giant Peach‟,

author: „Roald Dahl‟ },

{ title: „Charlotte‟s Web‟,

author: „E B White‟ },

{ title: „A Wrinkle in Time‟,

author: „Madeleine L‟Engle‟ }

]

}

Use Indices

Find by value

db.users.find( { userid: 100 } )

Find by range of values

db.users.find( { age: { $gte: 20, $lte: 40 } } )

db.users.find( { hobbies: { $in: [ „biking‟, „running‟, „swimming‟ ] } )

Find with a sort spec

db.users.find().sort( { signup_ts: -1 } )

db.users.find( { hobbies: „snorkeling‟ } ).sort( { signup_ts: -1 } )

Index on { hobbies: 1, signup_ts: -1 }

Use Indices

Writes with a query component

db.users.remove( { userid: 100 } )

Other operations

count

distinct

group

map/reduce

anything with a query spec

Use Indices

Look for slow operations

Mongod log

Profiling

Examine how your indexes are used

db.users.find( { age: 90, hobbies: „snowboarding‟ }

).explain()

{ age: 1 }

{ hobbies: 1 }

Index numbers rather than strings

Leverage RAM

Indexes perform best when they fit in RAM

db.users.stats()

Index sizes

db.serverStatus()

Index hit rate in RAM

Check paging

vmstat

Restrict Fields

db.users.find( { userid: 100 }, { hobbies: 1 } )

Just returns hobbies

No less work for mongo, but less network traffic and

less work for the app server to parse result

Topics






Use Modifiers

Update in place

db.users.update( { userid: 100 }, { $inc: { views: 1 } } )

db.users.update( { userid: 100 }, { $set: { pet: „dog‟ } } )

performs pretty well too

For very complex modifiers, consider cost of performing

operation on database versus app server (generally easier to

add an app server)

Balance against atomicity requirements

Even without modifiers, consistency in object size can help

Drop Indices

Avoid redundant indices

{ userid: 1 }

{ userid: -1 }

{ userid: 1, signup_ts: -1 }

db.users.update( { userid: 100 }, { $inc: { views: 1 } } )

don‟t index views

db.user15555.drop()

not db.user15555.remove( {} )

Fire and forget

Unsafe “asynchronous” writes

No confirmation from mongo that write succeeded

Reduce latency at app server

Writes queued in mongod server‟s network buffer

Use Capped Collections

Fixed size collection

When space runs out, new documents replace the oldest

documents

Simple allocation model means writes are fast

No _id index by default

db.createCollection( „log‟, {capped:true, size:30000} );

Wordnik Configuration

1000 requests of various types / second

5 billion documents (1.2TB)

Single 2x4 core server 32gb ram, FC SAN non virtualized

NOTE: Virtualized storage tends to perform poorly, for

example if you are on EC2 you should run several EBS

volumes striped

Topics






Master/Slave

Easy to set up

mongod --master

mongod --slave --source <host>

App server maintains two connections

Writes go to master

Reads come from slave

Slave will generally be a bit behind master

Can sync writes to slave(s) using getlasterror „w‟ parameter

Master/Slave

MASTER

SLAVE 1 SLAVE 2

APP SERVER 1 APP SERVER 2

Monotonic Read Consistency

MASTER

SLAVE 1 SLAVE 2

APP SERVER 1 APP SERVER 2

Sourceforge uses this configuration, with 5 read slaves,

to power most content for all projects

Master/Slave

A master experiences some additional read load per

additional read slave

A slave experiences the same write load as the master

Consider --only option to reduce write load on slave

Delayed slave

Diagnostics

use local; db.printReplicationInfo()

use local; db.printSlaveReplicationInfo()

Topics






Replica Sets

Cluster of N servers

Only one node is „primary‟ at a time

This is equivalent to master

The node where writes go

Primary is elected by concensus

Automatic failover

Automatic recovery of failed nodes

Replica Sets - Writes

A write is only „committed‟ once it has been replicated to a majority of nodes in the set

Before this happens, reads to the set may or may not see the write

On failover, data which is not „committed‟ may be dropped (but not necessarily)

If dropped, it will be rolled back from all servers which wrote it

For improved durability, use getLastError/w

Other criteria – block writes when nodes go down or slaves get too far behind

Or, to reduce latency, reduce getLastError/w

Replica Sets - Nodes

Nodes monitor each other‟s heartbeats

If primary can‟t see a majority of nodes, it relinquishes

primary status

If a majority of nodes notice there is no primary, they

elect a primary using criteria

Node priority

Node data‟s freshness


Member 1

Member 3

Member 2


Member 1

SECONDARY

Member 3

PRIMARY

Member 2

SECONDARY

{a:1}

{b:2}

{c:3}

{a:1}

{a:1}

{b:2}


Member 1

SECONDARY

Member 3

DOWN

Member 2

PRIMARY

{a:1}

{b:2}

{c:3}

{a:1}

{a:1}

{b:2}


Member 1

SECONDARY

Member 3

RECOVERING

Member 2

PRIMARY

{a:1}

{b:2}

{c:3}

{a:1}

{b:2}

{a:1}

{b:2}


Member 1

SECONDARY

Member 2

PRIMARY

Member 3

SECONDARY

{a:1}

{b:2}

{a:1}

{b:2}

{a:1}

{b:2}

Replica Sets – Node Types

Standard – can be primary or secondary

Passive – will be secondary but never primary

Arbiter – will vote on primary, but won‟t replicate data

SlaveOk

db.getMongo().setSlaveOk();

Syntax varies by driver

Writes to master, reads to slave

Slave will be picked arbitrarily

Topics






Sharding Architecture

Shard

A master/slave cluster

Or a replica set

Manages a well defined range of shard keys

Shard

Distribute data across machines

Reduce data per machine

Better able to fit in RAM

Distribute write load across shards

Distribute read load across shards, and across nodes

within shards

Shard Key

{ user_id: 1 }

{ lastname: 1, firstname: 1 }

{ tag: 1, timestamp: -1 }

{ _id: 1 }

This is the default

Collection Min Max location

users {name:‟Miller‟} {name:‟Nessman‟} shard 2

users {name:‟Nessman‟} {name:‟Ogden‟} Shard 4

…

Mongos

Routes data to/from shards

db.users.find( { user_id: 5000 } )

db.users.find( { user_id: { $gt: 4000, $lt: 6000 } } )

db.users.find( { hometown: „Seattle‟ } )

db.users.find( { hometown: „Seattle‟ } ).sort( { user_id:

1 } )

Secondary Index

db.users.find( { hometown: „Seattle‟ } ).sort( {

lastname: 1 } )

SlaveOk

Works for a replica set acting as a shard the same as for

a standard replica set

Writes work similarly

db.users.save( { user_id: 5000, … } )

Shard key must be supplied

db.users.update( { user_id: 5000 }, { $inc: { views: 1 } }

)

db.users.remove( { user_id: { $lt: 1000 } } )

db.users.remove( { signup_ts: { $lt: oneYearAgo } }

Writes across shards

Asynchronous writes (fire and forget)

Writes sent to all shards sequentially, executed per shard in

parallel

Synchronous writes (confirmation)

Send writes sequentially, as above

Call getLastError on shards sequentially

Mongos limits shards which must be touched

Data partitioning limits data each node must touch (for

example, it may be more likely to fit in RAM)

Increasing Shard Key

What if I keep inserting data with increasing values for

the shard key?

All new data will go to last shard initially

We have special purpose code to handle this case, but it

can still be less performant than a more uniformally

distributed key

Example: auto generated mongo ObjectId

Adding a Shard

Monitor your performance

If you need more disk bandwidth for writes, add a shard

Monitor your RAM usage – vmstat

If you are paging too much, add a shard

Balancing

Mongo automatically adjusts the key ranges per shard to

balance data size between shards

Other metrics will be possible in future – disk ops, cpu

Currently move just one “chunk” at a time

Keeps overhead of balancing slow

Sharding models

Database not sharded

Collections within database are sharded

Documents within collection are sharded

If remove a shard, any unsharded data on it must be

migrated manually (for now).

Give it a Try!

Download from mongodb.org

Sharding and replica sets production ready in 1.6, which

is scheduled for release next week

For now use 1.5 (unstable) to try sharding and replica

sets

scaling with mongodb

Technology