Concurrency Patterns with MongoDB
Yann Cluchey
CTO @ Cogenta
• Real-time retail intelligence
• Gather products and prices from web
• MongoDB in production
• Millions of updates per day, 3K/s peak
• Data in SQL, Mongo, ElasticSearch
Concurrency Patterns: Why?
• MongoDB: atomic updates, no transactions
• Need to ensure consistency & correctness
• What are my options with Mongo?
• Shortcuts
• Different approaches
Concurrency Control Strategies
• Pessimistic
• Suited for frequent conflicts
• http://bit.ly/two-phase-commits
• Optimistic
• Efficient when conflicts are rare
• http://bit.ly/isolate-sequence
• Multi-version
• All versions stored, client resolves conflict
• e.g. CouchDB
Optimistic Concurrency Control (OCC)
• No locks
• Prevent dirty writes
• Uses timestamp or a revision number
• Client checks & replays transaction
Example
Original
{ _id: 23, comment: "The quick brown fox…" }
Edit 1
{ _id: 23,
comment: "The quick brown fox prefers SQL" }
Edit 2
{ _id: 23,
comment: "The quick brown fox prefers MongoDB" }
Example
Edit 1
db.comments.update({ _id: 23 }, { _id: 23,
comment: "The quick brown fox prefers SQL" })
Edit 2
db.comments.update({ _id: 23 }, { _id: 23,
comment: "The quick brown fox prefers MongoDB" })
Outcome: the later write silently overwrites the earlier one, so one update is lost and the surviving one might be wrong
OCC Example
Original
{ _id: 23, rev: 1,
comment: "The quick brown fox…" }
Update a specific revision (edit 1)
db.comments.update(
{ _id: 23, rev: 1 },
{ _id: 23, rev: 2,
comment: "The quick brown fox prefers SQL" })
OCC Example
Edit 2
db.comments.update(
{ _id: 23, rev: 1 },
{ _id: 23, rev: 2,
comment: "The quick brown fox prefers MongoDB" })
…fails: the document is already at rev 2, so the query matches nothing
{ updatedExisting: false, n: 0,
err: null, ok: 1 }
• Caveat: only works if every client follows the revision convention
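The check-and-replay step from the OCC slides can be sketched in plain JavaScript. This is an in-memory stand-in, not the MongoDB driver: `updateIfRevision` and `editWithReplay` are hypothetical helpers, and the `Map` only imitates a collection so the control flow can be run anywhere.

```javascript
// In-memory stand-in for the comments collection (hypothetical, not MongoDB).
const comments = new Map([
  [23, { _id: 23, rev: 1, comment: "The quick brown fox…" }],
]);

// Imitates update({ _id, rev }, replacement): the write only succeeds
// if the stored revision still matches the one the client read.
function updateIfRevision(store, id, expectedRev, replacement) {
  const current = store.get(id);
  if (!current || current.rev !== expectedRev) {
    return { updatedExisting: false, n: 0 };
  }
  store.set(id, replacement);
  return { updatedExisting: true, n: 1 };
}

// Read, apply the edit, write against the revision that was read,
// and replay from a fresh read when the write reports a conflict.
function editWithReplay(store, id, applyEdit, maxAttempts = 5) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const doc = store.get(id);
    if (!doc) throw new Error("document not found");
    const next = { ...applyEdit(doc), _id: doc._id, rev: doc.rev + 1 };
    if (updateIfRevision(store, id, doc.rev, next).updatedExisting) return next;
  }
  throw new Error("too many conflicts");
}

// Both clients read rev 1. Edit 1 wins, edit 2 is rejected.
const edit1 = updateIfRevision(comments, 23, 1,
  { _id: 23, rev: 2, comment: "The quick brown fox prefers SQL" });
const edit2 = updateIfRevision(comments, 23, 1,
  { _id: 23, rev: 2, comment: "The quick brown fox prefers MongoDB" });

// The second client replays its change on top of the current revision.
const replayed = editWithReplay(comments, 23,
  (doc) => ({ ...doc, comment: doc.comment.replace("SQL", "MongoDB") }));
```

Whether a blind replay is acceptable depends on the edit. For opaque text the client may need to show the conflict to the user instead.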
Update Operators in Mongo
• Avoid full document replacement by using operators
• Powerful operators such as $inc, $set, $push
• Many operators can be grouped into single atomic update
• More efficient (data over wire, parsing, etc.)
• Use as much as possible
• http://bit.ly/update-operators
Still Need OCC?
A hit counter
{ _id: 1, hits: 5040 }
Edit 1
db.stats.update({ _id: 1 },
{ $set: { hits: 5045 } })
Edit 2
db.stats.update({ _id: 1 },
{ $set: { hits: 5055 } })
Still Need OCC?
Edit 1
db.stats.update({ _id: 1 },
{ $inc: { hits: 5 } })
Edit 2
db.stats.update({ _id: 1 },
{ $inc: { hits: 10 } })
• Sequence of updates might vary
• Outcome always the same
• But what if sequence is important?
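The difference between the two hit-counter slides can be checked with a small sketch. `applySet` and `applyInc` are hypothetical in-memory imitations of `$set` and `$inc` on a single field, not MongoDB calls.

```javascript
// Hypothetical in-memory versions of $set and $inc on one numeric field.
const applySet = (doc, field, value) => ({ ...doc, [field]: value });
const applyInc = (doc, field, delta) => ({ ...doc, [field]: doc[field] + delta });

const start = { _id: 1, hits: 5040 };

// $set: each client computed its total from the same stale read of 5040,
// so the final value depends on which write lands last.
const setOrderA = applySet(applySet(start, "hits", 5045), "hits", 5055);
const setOrderB = applySet(applySet(start, "hits", 5055), "hits", 5045);

// $inc: each client sends only its delta, so the order does not matter.
const incOrderA = applyInc(applyInc(start, "hits", 5), "hits", 10);
const incOrderB = applyInc(applyInc(start, "hits", 10), "hits", 5);

console.log(setOrderA.hits, setOrderB.hits); // 5055 5045
console.log(incOrderA.hits, incOrderB.hits); // 5055 5055
```

With `$set`, one of the two increments is lost in either order. With `$inc`, both are always counted.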
Still Need OCC?
• Operators can offset need for concurrency control
• Support for complex atomic manipulation
• Depends on use case
• You’ll need it for
• Opaque changes (e.g. text)
• Complex update logic in app domain
(e.g. changing a value affects some calculated fields)
• Sequence is important and can’t be inferred
Update Commands
• Update
• Specify query to match one or more documents
• Use { multi: true } to update multiple documents
• Must call Find() separately if you want a copy of the doc
• FindAndModify
• Update single document only
• Find + Update in single hit (atomic)
• Returns the doc before or after update
• Whole doc or subset
• Upsert (update or insert)
• Important feature. Works with OCC..?
Consistent Update Example
• Have a customer document
• Want to set the LastOrderValue and return the previous value
db.customers.findAndModify({
query: { _id: 16, rev: 45 },
update: {
$set: { lastOrderValue: 699 },
$inc: { rev: 1 }
},
new: false
})
Consistent Update Example
• Customer has since been updated, or doesn’t exist
• Client should replay
null
• Intended version of customer successfully updated
• Original version is returned
{ _id: 16, rev: 45, lastOrderValue: 145 }
• Useful if the client has only partial information and needs the full document
• A separate Find() could introduce inconsistency
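The two outcomes above (null, or the pre-update document) suggest a client loop along these lines. It is a sketch over an in-memory `Map`: `findAndModifyByRev` is a hypothetical imitation of `findAndModify` with `new: false`, and `setLastOrderValue` is an assumed helper name.

```javascript
// Hypothetical in-memory imitation of findAndModify({ query: { _id, rev },
// update: { $set: changes, $inc: { rev: 1 } }, new: false }).
function findAndModifyByRev(store, id, rev, changes) {
  const current = store.get(id);
  if (!current || current.rev !== rev) return null; // query matched nothing
  store.set(id, { ...current, ...changes, rev: current.rev + 1 });
  return current; // the document as it was before the update
}

// Set lastOrderValue and return the previous value, replaying on null.
function setLastOrderValue(store, id, knownRev, value, maxAttempts = 5) {
  let rev = knownRev;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const previous = findAndModifyByRev(store, id, rev, { lastOrderValue: value });
    if (previous !== null) return previous.lastOrderValue;
    const latest = store.get(id); // re-read to learn the current revision
    if (!latest) throw new Error("customer does not exist");
    rev = latest.rev;
  }
  throw new Error("too many conflicts");
}

const customers = new Map([[16, { _id: 16, rev: 45, lastOrderValue: 145 }]]);

const first = setLastOrderValue(customers, 16, 45, 699);  // matches rev 45
const second = setLastOrderValue(customers, 16, 45, 800); // stale rev, replays

console.log(first, second, customers.get(16).rev); // 145 699 47
```

Because the read of the old value and the write happen in one call, the returned value is exactly the one that was overwritten.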
Independent Update with Upsert
• Keep stats about customers
• Want to increment NumOrders and return new total
• Customer document might not be there
• Independent operation still needs protection
db.customerStats.findAndModify({
query: { _id: 16 },
update: {
$inc: { numOrders: 1, rev: 1 },
$setOnInsert: { name: "Yann" }
},
new: true,
upsert: true
})
Independent Update with Upsert
• First run, document is created
{ _id: 16, numOrders: 1, rev: 1, name: "Yann" }
• Second run, document is updated
{ _id: 16, numOrders: 2, rev: 2, name: "Yann" }
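The two runs can be reproduced with an in-memory sketch of the upsert. `upsertIncrement` is a hypothetical helper that imitates `$inc` combined with `$setOnInsert` under `upsert: true, new: true`. It is not the MongoDB implementation.

```javascript
// Hypothetical in-memory imitation of an upsert with $inc and $setOnInsert.
function upsertIncrement(store, id, inc, setOnInsert) {
  const existing = store.get(id);
  // $setOnInsert fields are applied only when the document is being created.
  const doc = existing ? { ...existing } : { _id: id, ...setOnInsert };
  // $inc treats a missing field as 0 before adding the delta.
  for (const [field, delta] of Object.entries(inc)) {
    doc[field] = (doc[field] || 0) + delta;
  }
  store.set(id, doc);
  return { ...doc }; // new: true, so return the post-update document
}

const customerStats = new Map();

const firstRun = upsertIncrement(customerStats, 16,
  { numOrders: 1, rev: 1 }, { name: "Yann" });
const secondRun = upsertIncrement(customerStats, 16,
  { numOrders: 1, rev: 1 }, { name: "Someone else" });

console.log(firstRun.numOrders, firstRun.rev, firstRun.name);    // 1 1 Yann
console.log(secondRun.numOrders, secondRun.rev, secondRun.name); // 2 2 Yann
```

The second call passes a different name on purpose: since the document already exists, the insert-only fields are ignored.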
Subdocuments
• Common scenario
• e.g. Customer and Orders in single document
• Clients like having everything
• Powerful operators for matching and updating subdocuments
• $elemMatch, $, $addToSet, $push
• Alternatives to "fat" documents:
• Client-side joins
• Aggregation
• MapReduce
Concurrency Control and Subdocuments
• Patterns described here still work, but might be impractical
• Docs are large
• More collisions
• Solve with scale?
Subdocument Example
• Customer document contains orders
• Want to independently update orders
• Correct order #471 value to £260
{
_id: 16,
rev: 20,
name: "Yann",
orders: {
"471": { id: 471, value: 250, rev: 4 }
}
}
Subdocument Example
db.customers.findAndModify({
query: { "orders.471.rev": { $lte: 4 } },
update: {
$set: { "orders.471.value": 260 },
$inc: { rev: 1, "orders.471.rev": 1 },
$setOnInsert: {
name: "Yann",
"orders.471.id": 471 }
},
new: true,
upsert: true
})
Subdocument Example
• First run, order updated successfully
• Could create if not exists
{
_id: 16,
rev: 21,
name: "Yann",
orders: {
"471": { id: 471, value: 260, rev: 5 }
}
}
Subdocument Example
• Second conflicting run
• Query didn’t match revision, duplicate document created
{
_id: ObjectId("533bf88a50dbb55a8a9b9128"),
rev: 1,
name: "Yann",
orders: {
"471": { id: 471, value: 260, rev: 1 }
}
}
Subdocument Example
• Solve with unique index (good idea anyway)
db.customers.ensureIndex(
{ "orders.id" : 1 },
{
"name" : "orderids",
"unique" : true
})
Subdocument Example
The client can handle the findAndModify result accordingly:
• Successful update
{ updatedExisting: true }
• New document created
{ updatedExisting: false, n: 1 }
• Conflict, need to replay
{ errmsg: "exception: E11000 duplicate key error index: db.customers.$orderids dup key…" }
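A client-side dispatch over these three outcomes might look like the sketch below. `classifyOutcome` is a hypothetical helper, and the object shapes are taken from the slide. A real driver may surface the duplicate-key case as a thrown error rather than a returned object.

```javascript
// Hypothetical classifier for the three result shapes shown above.
function classifyOutcome(result) {
  if (result.errmsg) {
    // E11000 is the duplicate-key error code quoted on the slide.
    return result.errmsg.includes("E11000") ? "conflict" : "error";
  }
  if (result.updatedExisting) return "updated";
  if (result.n === 1) return "created";
  return "unknown";
}

const outcomes = [
  classifyOutcome({ updatedExisting: true }),
  classifyOutcome({ updatedExisting: false, n: 1 }),
  classifyOutcome({ errmsg: "exception: E11000 duplicate key error index" }),
];

console.log(outcomes.join(", ")); // updated, created, conflict
```

Only the "conflict" branch calls for a re-read and replay. The other two mean the write was applied.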
Final Words
• Don’t forget deletes
• Gotchas about subdocument structure
orders: [ { id: 471 }, … ]
orders: { "471": { }, … }
orders: { "471": { id: 471 }, … }
• Coming in 2.6.0 stable
$setOnInsert: { _id: .. }
• Sharding..?