scaling with mongodb

47
Scaling with MongoDB Aaron Staple [email protected] Mongo Seattle July 27, 2010

Upload: mongodb

Post on 05-Jul-2015

3.664 views

Category:

Technology


1 download

DESCRIPTION

Aaron Staple's presentation at Mongo Seattle

TRANSCRIPT

Page 1: Scaling with MongoDB

Scaling with MongoDBAaron Staple

[email protected]

Mongo Seattle

July 27, 2010

Page 2: Scaling with MongoDB

MongoDB 1.6Comes out next week!

Page 3: Scaling with MongoDB

Differences from Typical RDBMS

Memory mapped data

All data in memory (if it fits), synced to disk periodically

No joins

Reads have greater data locality

No joins between servers

No transactions

Improves performance of various operations

No transactions between servers

Page 4: Scaling with MongoDB

Topics

Single server read scaling

Single server write scaling

Scaling reads with a master/slave cluster

Scaling reads with replica sets

Scaling reads and writes with sharding

Page 5: Scaling with MongoDB

Denormalize

{ userid: 100,

books: [

{ title: „James and the Giant Peach‟,

author: „Roald Dahl‟ },

{ title: „Charlotte‟s Web‟,

author: „E B White‟ },

{ title: „A Wrinkle in Time‟,

author: „Madeleine L‟Engle‟ }

]

}

Page 6: Scaling with MongoDB

Use Indices

Find by value

db.users.find( { userid: 100 } )

Find by range of values

db.users.find( { age: { $gte: 20, $lte: 40 } } )

db.users.find( { hobbies: { $in: [ „biking‟, „running‟, „swimming‟ ] } )

Find with a sort spec

db.users.find().sort( { signup_ts: -1 } )

db.users.find( { hobbies: „snorkeling‟ } ).sort( { signup_ts: -1 } )

Index on { hobbies: 1, signup_ts: -1 }

Page 7: Scaling with MongoDB

Use Indices

Writes with a query component

db.users.remove( { userid: 100 } )

Other operations

count

distinct

group

map/reduce

anything with a query spec

Page 8: Scaling with MongoDB

Use Indices

Look for slow operations

Mongod log

Profiling

Examine how your indexes are used

db.users.find( { age: 90, hobbies: „snowboarding‟ }

).explain()

{ age: 1 }

{ hobbies: 1 }

Index numbers rather than strings

Page 9: Scaling with MongoDB

Leverage RAM

Indexes perform best when they fit in RAM

db.users.stats()

Index sizes

db.serverStatus()

Index hit rate in RAM

Check paging

vmstat

Page 10: Scaling with MongoDB

Restrict Fields

db.users.find( { userid: 100 }, { hobbies: 1 } )

Just returns hobbies

No less work for mongo, but less network traffic and

less work for the app server to parse result

Page 11: Scaling with MongoDB

Topics

Single server read scaling

Single server write scaling

Scaling reads with a master/slave cluster

Scaling reads with replica sets

Scaling reads and writes with sharding

Page 12: Scaling with MongoDB

Use Modifiers

Update in place

db.users.update( { userid: 100 }, { $inc: { views: 1 } } )

db.users.update( { userid: 100 }, { $set: { pet: „dog‟ } } )

performs pretty well too

For very complex modifiers, consider cost of performing

operation on database versus app server (generally easier to

add an app server)

Balance against atomicity requirements

Even without modifiers, consistency in object size can help

Page 13: Scaling with MongoDB

Drop Indices

Avoid redundant indices

{ userid: 1 }

{ userid: -1 }

{ userid: 1, signup_ts: -1 }

db.users.update( { userid: 100 }, { $inc: { views: 1 } } )

don‟t index views

db.user15555.drop()

not db.user15555.remove( {} )

Page 14: Scaling with MongoDB

Fire and forget

Unsafe “asynchronous” writes

No confirmation from mongo that write succeeded

Reduce latency at app server

Writes queued in mongod server‟s network buffer

Page 15: Scaling with MongoDB

Use Capped Collections

Fixed size collection

When space runs out, new documents replace the oldest

documents

Simple allocation model means writes are fast

No _id index by default

db.createCollection( „log‟, {capped:true, size:30000} );

Page 16: Scaling with MongoDB

Wordnik Configuration

1000 requests of various types / second

5 billion documents (1.2TB)

Single 2x4 core server 32gb ram, FC SAN non virtualized

NOTE: Virtualized storage tends to perform poorly, for

example if you are on EC2 you should run several EBS

volumes striped

Page 17: Scaling with MongoDB

Topics

Single server read scaling

Single server write scaling

Scaling reads with a master/slave cluster

Scaling reads with replica sets

Scaling reads and writes with sharding

Page 18: Scaling with MongoDB

Master/Slave

Easy to set up

mongod --master

mongod --slave --source <host>

App server maintains two connections

Writes go to master

Reads come from slave

Slave will generally be a bit behind master

Can sync writes to slave(s) using getlasterror „w‟ parameter

Page 19: Scaling with MongoDB

Master/Slave

MASTER

SLAVE 1 SLAVE 2

APP SERVER 1 APP SERVER 2

Page 20: Scaling with MongoDB

Monotonic Read Consistency

MASTER

SLAVE 1 SLAVE 2

APP SERVER 1 APP SERVER 2

Sourceforge uses this configuration, with 5 read slaves,

to power most content for all projects

Page 21: Scaling with MongoDB

Master/Slave

A master experiences some additional read load per

additional read slave

A slave experiences the same write load as the master

Consider --only option to reduce write load on slave

Delayed slave

Diagnostics

use local; db.printReplicationInfo()

use local; db.printSlaveReplicationInfo()

Page 22: Scaling with MongoDB

Topics

Single server read scaling

Single server write scaling

Scaling reads with a master/slave cluster

Scaling reads with replica sets

Scaling reads and writes with sharding

Page 23: Scaling with MongoDB

Replica Sets

Cluster of N servers

Only one node is „primary‟ at a time

This is equivalent to master

The node where writes go

Primary is elected by concensus

Automatic failover

Automatic recovery of failed nodes

Page 24: Scaling with MongoDB

Replica Sets - Writes

A write is only „committed‟ once it has been replicated to a majority of nodes in the set

Before this happens, reads to the set may or may not see the write

On failover, data which is not „committed‟ may be dropped (but not necessarily)

If dropped, it will be rolled back from all servers which wrote it

For improved durability, use getLastError/w

Other criteria – block writes when nodes go down or slaves get too far behind

Or, to reduce latency, reduce getLastError/w

Page 25: Scaling with MongoDB

Replica Sets - Nodes

Nodes monitor each other‟s heartbeats

If primary can‟t see a majority of nodes, it relinquishes

primary status

If a majority of nodes notice there is no primary, they

elect a primary using criteria

Node priority

Node data‟s freshness

Page 26: Scaling with MongoDB

Replica Sets - Nodes

Member 1

Member 3

Member 2

Page 27: Scaling with MongoDB

Replica Sets - Nodes

Member 1

SECONDARY

Member 3

PRIMARY

Member 2

SECONDARY

{a:1}

{b:2}

{c:3}

{a:1}

{a:1}

{b:2}

Page 28: Scaling with MongoDB

Replica Sets - Nodes

Member 1

SECONDARY

Member 3

DOWN

Member 2

PRIMARY

{a:1}

{b:2}

{c:3}

{a:1}

{a:1}

{b:2}

Page 29: Scaling with MongoDB

Replica Sets - Nodes

Member 1

SECONDARY

Member 3

RECOVERING

Member 2

PRIMARY

{a:1}

{b:2}

{c:3}

{a:1}

{b:2}

{a:1}

{b:2}

Page 30: Scaling with MongoDB

Replica Sets - Nodes

Member 1

SECONDARY

Member 2

PRIMARY

Member 3

SECONDARY

{a:1}

{b:2}

{a:1}

{b:2}

{a:1}

{b:2}

Page 31: Scaling with MongoDB

Replica Sets – Node Types

Standard – can be primary or secondary

Passive – will be secondary but never primary

Arbiter – will vote on primary, but won‟t replicate data

Page 32: Scaling with MongoDB

SlaveOk

db.getMongo().setSlaveOk();

Syntax varies by driver

Writes to master, reads to slave

Slave will be picked arbitrarily

Page 33: Scaling with MongoDB

Topics

Single server read scaling

Single server write scaling

Scaling reads with a master/slave cluster

Scaling reads with replica sets

Scaling reads and writes with sharding

Page 34: Scaling with MongoDB

Sharding Architecture

Page 35: Scaling with MongoDB

Shard

A master/slave cluster

Or a replica set

Manages a well defined range of shard keys

Page 36: Scaling with MongoDB

Shard

Distribute data across machines

Reduce data per machine

Better able to fit in RAM

Distribute write load across shards

Distribute read load across shards, and across nodes

within shards

Page 37: Scaling with MongoDB

Shard Key

{ user_id: 1 }

{ lastname: 1, firstname: 1 }

{ tag: 1, timestamp: -1 }

{ _id: 1 }

This is the default

Collection Min Max location

users {name:‟Miller‟} {name:‟Nessman‟} shard 2

users {name:‟Nessman‟} {name:‟Ogden‟} Shard 4

Page 38: Scaling with MongoDB

Mongos

Routes data to/from shards

db.users.find( { user_id: 5000 } )

db.users.find( { user_id: { $gt: 4000, $lt: 6000 } } )

db.users.find( { hometown: „Seattle‟ } )

db.users.find( { hometown: „Seattle‟ } ).sort( { user_id:

1 } )

Page 39: Scaling with MongoDB

Secondary Index

db.users.find( { hometown: „Seattle‟ } ).sort( {

lastname: 1 } )

Page 40: Scaling with MongoDB

SlaveOk

Works for a replica set acting as a shard the same as for

a standard replica set

Page 41: Scaling with MongoDB

Writes work similarly

db.users.save( { user_id: 5000, … } )

Shard key must be supplied

db.users.update( { user_id: 5000 }, { $inc: { views: 1 } }

)

db.users.remove( { user_id: { $lt: 1000 } } )

db.users.remove( { signup_ts: { $lt: oneYearAgo } }

Page 42: Scaling with MongoDB

Writes across shards

Asynchronous writes (fire and forget)

Writes sent to all shards sequentially, executed per shard in

parallel

Synchronous writes (confirmation)

Send writes sequentially, as above

Call getLastError on shards sequentially

Mongos limits shards which must be touched

Data partitioning limits data each node must touch (for

example, it may be more likely to fit in RAM)

Page 43: Scaling with MongoDB

Increasing Shard Key

What if I keep inserting data with increasing values for

the shard key?

All new data will go to last shard initially

We have special purpose code to handle this case, but it

can still be less performant than a more uniformally

distributed key

Example: auto generated mongo ObjectId

Page 44: Scaling with MongoDB

Adding a Shard

Monitor your performance

If you need more disk bandwidth for writes, add a shard

Monitor your RAM usage – vmstat

If you are paging too much, add a shard

Page 45: Scaling with MongoDB

Balancing

Mongo automatically adjusts the key ranges per shard to

balance data size between shards

Other metrics will be possible in future – disk ops, cpu

Currently move just one “chunk” at a time

Keeps overhead of balancing slow

Page 46: Scaling with MongoDB

Sharding models

Database not sharded

Collections within database are sharded

Documents within collection are sharded

If remove a shard, any unsharded data on it must be

migrated manually (for now).

Page 47: Scaling with MongoDB

Give it a Try!

Download from mongodb.org

Sharding and replica sets production ready in 1.6, which

is scheduled for release next week

For now use 1.5 (unstable) to try sharding and replica

sets