MongoDB World 2016: Smart Strategies for Resilient Applications

Smart Strategies for Resilient MongoDB Applications A. Jesse Jiryu Davis Staff Engineer @jessejiryudavis


Post on 14-Apr-2017


TRANSCRIPT

Smart Strategies for Resilient MongoDB Applications

A. Jesse Jiryu Davis, Staff Engineer

@jessejiryudavis

Ian

Update

updateOne({ _id: '2016-06-28'}, {$inc: {counter: 1}}, upsert=True)

What Can Go Wrong?

MongoDB

updateOne

Network blip

MongoDB

updateOne

{ok: 0, errmsg: "not master"}

Primary failover

MongoDB

updateOne

Network down

MongoDB

updateOne

{ok: 0, errmsg: "not authorized"}

Command error

???

?

Network blip
Primary failover
Network down
Command error

What If We Had Transactions?

COMMIT

Network blip

MongoDBSQL

Smart Strategy

What Do Drivers Do?

Network blip: set state "unknown", throw error
Primary failover: same
Network down: same
Command error: just throw the error

what state?

Server Discovery And Monitoring Spec

Also known as "SDAM"

Server Discovery And Monitoring

Server 1: type "Primary"
Server 2: type "Secondary"
Server 3: type "Secondary"

Server 1: type "Unknown"
Server 2: type "Secondary"
Server 3: type "Secondary"

Network blip

Server Discovery And Monitoring

Server 1: type "Primary"
Server 2: type "Secondary"
Server 3: type "Secondary"

Server 1: type "Unknown"
Server 2: type "Secondary"
Server 3: type "Secondary"

Drivers check twice per second for up to 30 seconds

Server Discovery And Monitoring

Server 1: type "Primary"
Server 2: type "Secondary"
Server 3: type "Secondary"

Server 1: type "Unknown"
Server 2: type "Secondary"
Server 3: type "Secondary"

Primary failover

Server Discovery And Monitoring

Server 1: type "Unknown"
Server 2: type "Secondary"
Server 3: type "Secondary"

Drivers check twice per second for up to 30 seconds

Server 1: type "Secondary"
Server 2: type "Primary"
Server 3: type "Secondary"

Server 1: type "Secondary"
Server 2: type "Secondary"
Server 3: type "Secondary"
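The "check twice per second for up to 30 seconds" behavior can be pictured as a simple polling loop. The sketch below is only a model of that idea, not driver code: `wait_for_primary` and `get_server_types` are hypothetical names, and the clock and sleep functions are injectable so the example runs instantly.

```python
import time


def wait_for_primary(get_server_types, timeout=30.0, interval=0.5,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll a topology view until some server reports type "Primary".

    get_server_types is a hypothetical callable returning a dict such as
    {"Server 1": "Unknown", "Server 2": "Secondary"}.
    """
    deadline = clock() + timeout
    while True:
        for name, server_type in get_server_types().items():
            if server_type == "Primary":
                return name
        if clock() >= deadline:
            raise TimeoutError("no primary found within %.0f seconds" % timeout)
        sleep(interval)  # 0.5 s between checks, i.e. twice per second


# Simulated failover: Server 2 is elected by the third check.
views = iter([
    {"Server 1": "Unknown", "Server 2": "Secondary", "Server 3": "Secondary"},
    {"Server 1": "Secondary", "Server 2": "Secondary", "Server 3": "Secondary"},
    {"Server 1": "Secondary", "Server 2": "Primary", "Server 3": "Secondary"},
])
elected = wait_for_primary(lambda: next(views), sleep=lambda seconds: None)
print(elected)  # Server 2
```

If no primary appears before the deadline, the loop gives up and raises, which is the point at which an application would see an error from its operation.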

Update

updateOne({ _id: '2016-06-28'}, {$inc: {counter: 1}}, upsert=True)

Bad Retry Strategy: Don't retry.

Network blip: may undercount
Primary failover: undercounts
Network down: correct
Command error: correct

Bad Retry Strategy: Retry five times.

Network blip: first retry succeeds, may overcount
Primary failover: first retry succeeds
Network down: wastes time
Command error: wastes time
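The "may overcount" outcome can be reproduced with a small in-memory simulation. This is a sketch under stated assumptions: `FlakyCounter` stands in for a server that applies the `$inc` but loses the reply, and `NetworkError` is a hypothetical stand-in for whatever network exception a driver raises.

```python
class NetworkError(Exception):
    """Stand-in for a driver's network error (hypothetical name)."""


class FlakyCounter:
    """In-memory stand-in for the server: applies $inc, then may lose the reply."""

    def __init__(self, drop_replies):
        self.counter = 0
        self.drop_replies = drop_replies  # how many replies to lose

    def increment(self):
        self.counter += 1  # the write is applied on the server...
        if self.drop_replies > 0:
            self.drop_replies -= 1
            raise NetworkError("reply lost")  # ...but the client never hears back


def increment_with_retries(server, retries):
    """Naive strategy: resend the same non-idempotent increment on each error."""
    for attempt in range(retries + 1):
        try:
            server.increment()
            return
        except NetworkError:
            if attempt == retries:
                raise


server = FlakyCounter(drop_replies=2)
increment_with_retries(server, retries=5)
print(server.counter)  # 3: one logical increment was counted three times
```

The client cannot tell a lost request from a lost reply, so every blind retry of an `$inc` risks applying it again.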

Bad Retry Strategy: Retry once, except command error.

Network blip: first retry succeeds, may overcount
Primary failover: first retry succeeds
Network down: correct
Command error: correct

final hurdle

Network blip: first retry succeeds, won't overcount
Primary failover: first retry succeeds
Network down: correct
Command error: correct

Retry once, except command error.

Make all ops idempotent.
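A minimal sketch of "retry once, except command error" as a reusable helper follows. The exception names are hypothetical stand-ins defined locally so the example runs on its own; with a real driver you would catch its network and failover exception types instead and let command errors propagate.

```python
class NetworkError(Exception):
    """Stand-in for network blips and failover errors (hypothetical name)."""


class CommandError(Exception):
    """Stand-in for a server command error such as "not authorized"."""


def retry_once(operation):
    """Run operation; on a network error, try exactly one more time.

    Command errors are not caught, so they are thrown immediately.
    A second network error also propagates to the caller.
    """
    try:
        return operation()
    except NetworkError:
        return operation()


calls = []


def flaky_find():
    calls.append("find")
    if len(calls) == 1:
        raise NetworkError("blip")  # first attempt fails
    return {"_id": "2016-06-28", "counter": 1}


result = retry_once(flaky_find)
print(result, len(calls))  # {'_id': '2016-06-28', 'counter': 1} 2
```

This is only safe when the wrapped operation is idempotent, which is why the strategy pairs the single retry with making every operation idempotent.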

Make All Operations Idempotent

???

?

Find
Insert
Delete
Update (the hard one)

Make All Operations Idempotent

Query Is Idempotent

try:
    doc = findOne()
except network err:
    doc = findOne()

Idempotent Insert

doc = {_id: ObjectId(), ...}
try:
    insertOne(doc)
except network err:
    try:
        insertOne(doc)
    except DuplicateKeyError:
        pass  # first try worked
    # any other error on the retry: throw
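The insert pattern above can be exercised end to end against an in-memory stand-in. In this sketch `FakeCollection`, `NetworkError`, and the local `DuplicateKeyError` are hypothetical test doubles, and a UUID string stands in for `ObjectId()`; the key point is that the `_id` is chosen on the client before the first attempt.

```python
import uuid


class NetworkError(Exception):
    """Stand-in for a driver's network error (hypothetical name)."""


class DuplicateKeyError(Exception):
    """Stand-in for the driver's duplicate-key error."""


class FakeCollection:
    """In-memory collection that can lose the reply to the first insert."""

    def __init__(self, lose_first_reply=False):
        self.docs = {}
        self.lose_next_reply = lose_first_reply

    def insert_one(self, doc):
        if doc["_id"] in self.docs:
            raise DuplicateKeyError(doc["_id"])
        self.docs[doc["_id"]] = dict(doc)  # the write is applied
        if self.lose_next_reply:
            self.lose_next_reply = False
            raise NetworkError("reply lost")


def insert_idempotently(collection, doc):
    # Generate the _id client-side so both attempts insert the same document.
    doc.setdefault("_id", uuid.uuid4().hex)
    try:
        collection.insert_one(doc)
    except NetworkError:
        try:
            collection.insert_one(doc)
        except DuplicateKeyError:
            pass  # the first attempt reached the server after all
    return doc["_id"]


collection = FakeCollection(lose_first_reply=True)
new_id = insert_idempotently(collection, {"sunny": True})
print(len(collection.docs))  # 1
```

A duplicate-key error on the retry is treated as success here only because the `_id` was freshly generated for this insert, so the only document that could already hold it is our own.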

Idempotent Delete

try:
    deleteOne({key: uniqueValue})
except network err:
    deleteOne({key: uniqueValue})

Idempotent Delete

try:
    deleteMany({...})
except network err:
    deleteMany({...})

Idempotent Update

updateOne({ _id: '2016-06-28'}, {$set:{sunny: true}}, upsert=True)

Idempotent Update

try:
    updateOne({ _id: '2016-06-28'},
              {$set: {sunny: true}},
              upsert=True)
except network err:
    try again, if that fails throw

not idempotent.

NON-Idempotent Update

the hard one.

updateOne({ _id: '2016-06-28'}, {$inc: {counter: 1}}, upsert=True)

Idempotent Update

1. Add unique token:

{ _id: '2016-06-28', counter: N, pending: [ ObjectId("...") ]}

2. Remove token and increment counter:

{ _id: '2016-06-28', counter: N + 1, pending: [ ]}

oid = ObjectId()
try:
    updateOne({_id: '2016-06-28'},
              {$addToSet: {pending: oid}},
              upsert=True)
except network err:
    try again, then throw

try:
    updateOne({_id: '2016-06-28', pending: oid},
              {$pull: {pending: oid}, $inc: {counter: 1}},
              upsert=False)
except network err:
    try again, then throw
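To see why the two steps tolerate a retry each, here is a small simulation. It is a sketch with hypothetical names: `FakeCounterDoc` models a single document whose two methods behave like the atomic `$addToSet` and `$pull` plus `$inc` updates, and a UUID string stands in for the `ObjectId` token.

```python
import uuid


class NetworkError(Exception):
    """Stand-in for a driver's network error (hypothetical name)."""


class FakeCounterDoc:
    """Models one document {counter, pending}; each method is one atomic update."""

    def __init__(self, lose_reply_on_calls=()):
        self.counter = 0
        self.pending = []
        self.calls = 0
        self.lose = set(lose_reply_on_calls)  # call numbers whose reply is lost

    def _reply(self):
        self.calls += 1
        if self.calls in self.lose:
            raise NetworkError("reply lost")

    def add_to_set(self, token):  # like $addToSet: {pending: token}
        if token not in self.pending:
            self.pending.append(token)
        self._reply()

    def pull_and_inc(self, token):  # filter {pending: token}, then $pull and $inc
        if token in self.pending:
            self.pending.remove(token)
            self.counter += 1
        self._reply()


def retry_once(operation, *args):
    try:
        return operation(*args)
    except NetworkError:
        return operation(*args)


def increment_idempotently(doc):
    token = uuid.uuid4().hex
    retry_once(doc.add_to_set, token)    # step 1: safe to repeat
    retry_once(doc.pull_and_inc, token)  # step 2: a repeat matches nothing


# Lose the reply to the first attempt of each step (calls 1 and 3).
doc = FakeCounterDoc(lose_reply_on_calls={1, 3})
increment_idempotently(doc)
print(doc.counter, doc.pending)  # 1 []
```

The retry of step 2 is harmless because the filter requires the token to still be in `pending`; once the first attempt has pulled it, the second attempt matches no document.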

pipeline = [
    { $match: {'pending.0': {$exists: true}}},
    { $project: {
        counter: { $add: [ '$counter', {$size: '$pending'} ]}}}
]

for doc in collection.aggregate(pipeline):
    collection.updateOne(
        {_id: doc._id},
        { $set: {counter: doc.counter},
          $unset: {pending: true} })
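The cleanup pass can be mirrored on plain Python dictionaries to check the arithmetic. This is a sketch, not the author's code: `reconcile` is a hypothetical helper, and treating a missing `counter` as 0 is an assumption made here for the example.

```python
def reconcile(docs):
    """Fold leftover pending tokens into the counter, in place.

    Mirrors the pipeline and update above: match documents whose pending
    array is non-empty, add its size to counter, then unset pending.
    """
    for doc in docs:
        pending = doc.get("pending") or []
        if pending:  # like $match on 'pending.0' existing
            # Assumption: a missing counter is treated as 0.
            doc["counter"] = doc.get("counter", 0) + len(pending)
            del doc["pending"]  # like $unset: {pending: true}
    return docs


docs = [
    {"_id": "2016-06-28", "counter": 5, "pending": ["token-a", "token-b"]},
    {"_id": "2016-06-29", "counter": 2, "pending": []},
]
reconcile(docs)
print(docs[0])  # {'_id': '2016-06-28', 'counter': 7}
```

Each leftover token represents an increment whose second step never completed, so adding the size of `pending` brings the counter up to date.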

Ian

Testing!?

bit.ly/black-pipe

More info:

bit.ly/resilient-applications