Queue in the Cloud with MongoDB
DESCRIPTION
Leveraging MongoDB as queue infrastructure. This talk describes a low-overhead approach to building a queue-as-a-service on top of MongoDB, covering durability, long-running processes, and distributed processing.
TRANSCRIPT
QUEUE IN THE CLOUD WITH MONGODB
MONGODB LA 2013
NURI HALPERIN
QUEUE
USAGE
Ordered execution
Buffering consumer/producer
Work distribution
GOALS OF PROJECT
Leverage Mongo
• Reduce ops overhead by reusing infrastructure
• Map queue semantics to Mongo’s strengths
Reliable
• Durable – supports long-running processes
• Resilient to machine failure
• Narrows the window of failure / data loss
Centralized queue, distributed clients:
• Multiple producers
• Multiple consumers
ITERATION 0
Capped collection – not the perfect choice
• A tailable cursor seems attractive, but…
• Needs external synchronization to avoid double-consume
• Secondary indexes and updates are an anti-pattern on capped collections
Relaxing FIFO is OK
• No guarantee that the first item popped is the first one finished
• The multi-client benefit is negated if clients must synchronize on execution order
• The race condition on queue insertion has the same effect
Conclusion: Project doesn’t use capped collection and relaxes FIFO.
PARANOID BY DESIGN
Network dies
Process dies
DB dies
Machine dies
Poison letter
Dead letter
ITERATION 1
db.q4foo.save({v:{f:1}})
db.q4foo.findAndModify({query: {}, sort: {_id:1}, remove: true})
Hot: quick and simple
Not: a dead client loses the item in transit, with no trace.
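The shell calls above pop destructively: findAndModify atomically removes the oldest document and hands it to the consumer. A minimal in-memory sketch of these semantics (plain Node.js, not the MongoDB API) shows why a dead client loses the item:

```javascript
// In-memory sketch of Iteration 1 semantics (illustrative, not the MongoDB driver).
// push: insert at the tail; pop: atomically remove the head.
class DestructivePopQueue {
  constructor() { this.items = []; }
  push(v) { this.items.push({ v }); }          // db.q4foo.save({v: ...})
  pop() { return this.items.shift() || null; } // findAndModify(..., remove: true)
}

const q = new DestructivePopQueue();
q.push({ f: 1 });
const item = q.pop();
// If the consumer crashes here, the item is gone: the queue keeps no copy
// and no record that it was ever handed out ("dead in transit, no trace").
console.log(item.v.f); // 1
console.log(q.pop());  // null -- nothing left to retry
```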
ARE WE THERE YET?
Network dies
Process dies
DB dies
Machine dies
Poison letter
Dead letter
QUEUE SEMANTICS

Local / Memory     Distributed
Push               Put
Pop                Get << visibility >>
<< exception >>    Release << retry >>
                   Delete
                   << exception >>
ITERATION 2
db.q4foo.save({v: {f: 1}, dq: null})
db.q4foo.findAndModify({query: {dq: null}, sort: {_id: 1}, update: {$set: {dq: later(60)}}})
If processing succeeded => delete the item.
Hot: If client dies, item remains in queue. Data not lost.
Not: the index on _id becomes less useful at high volume, since the query filters on dq.
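The dq ("dequeued until") field gives each consumed item a visibility deadline instead of deleting it; only a successful consumer deletes the document. A hedged in-memory sketch of the same logic (plain Node.js, not the MongoDB API; later() mirrors the helper on the slide):

```javascript
// In-memory sketch of Iteration 2 semantics: consume sets a visibility
// deadline (dq) instead of removing the item; delete happens only on success.
function later(seconds) { return Date.now() / 1000 + seconds; }

class VisibilityQueue {
  constructor() { this.items = []; this.nextId = 1; }
  put(v) { this.items.push({ _id: this.nextId++, v, dq: null }); }

  // findAndModify({query: {dq: null}, sort: {_id: 1}, update: {$set: {dq: later(60)}}})
  get() {
    const item = this.items.find(i => i.dq === null); // items already in _id order
    if (item) item.dq = later(60);
    return item || null;
  }

  // On success, the consumer deletes the item by _id.
  delete(id) { this.items = this.items.filter(i => i._id !== id); }
}

const q = new VisibilityQueue();
q.put({ f: 1 });
const item = q.get();
console.log(q.get()); // null -- item is invisible while dq is set
// If the consumer dies here, the document is still in the collection:
// data is not lost, and it can be recovered (Iteration 3 adds release).
q.delete(item._id);
```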
ARE WE THERE YET?
Network dies
Process dies
DB dies
Machine dies
Poison letter
Dead letter
ITERATION 3
db.q4foo.save({v: {f: 1}, dq: null, pc: 0})
db.q4foo.findAndModify({query: { dq: null, pc:{$lt:3}}, sort: {_id:1}, update:{$set:{dq:later(60)},$inc:{pc:1}}}) // consume
db.q4foo.findAndModify({query: {_id:"..."}, update:{$set:{dq: null}}}) // release
Hot: an item can be retried automatically (via pc) after being released. An exhausted item remains in the queue.
Not: no longer strict FIFO.
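The pop count (pc) caps how many times an item is handed out, and release() returns a failed item to the queue by clearing dq. A hedged in-memory sketch of these semantics (plain Node.js, not the MongoDB API):

```javascript
// In-memory sketch of Iteration 3 semantics: get() increments a pop count (pc)
// and only matches items under the retry limit; release() clears dq.
class RetryQueue {
  constructor(maxRetries = 3) { this.items = []; this.nextId = 1; this.max = maxRetries; }
  put(v) { this.items.push({ _id: this.nextId++, v, dq: null, pc: 0 }); }

  // findAndModify({query: {dq: null, pc: {$lt: 3}}, sort: {_id: 1},
  //                update: {$set: {dq: later(60)}, $inc: {pc: 1}}})
  get() {
    const item = this.items.find(i => i.dq === null && i.pc < this.max);
    if (item) { item.dq = Date.now() / 1000 + 60; item.pc += 1; }
    return item || null;
  }

  // findAndModify({query: {_id: ...}, update: {$set: {dq: null}}})
  release(id) {
    const item = this.items.find(i => i._id === id);
    if (item) item.dq = null;
  }
}

const q = new RetryQueue();
q.put({ f: 1 });
for (let attempt = 0; attempt < 3; attempt++) {
  const item = q.get();  // consume...
  q.release(item._id);   // ...and simulate a failure each time
}
console.log(q.get());        // null -- retries exhausted (pc reached 3)
console.log(q.items.length); // 1 -- exhausted item remains in the queue
```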
ARE WE THERE? YES.
Network dies
Process dies
DB dies
Machine dies
Poison letter
Dead letter
ITERATION 4
Ensure your queue writes use applicable durability
• db.q4foo.save() + getLastError(…)
• db.q4foo.findAndModify() + getLastError(…)
Use replica sets for durability only – they add no capacity or speed gain.
OTHER THOUGHTS
Create admin jobs to monitor queues:
• Growth
• Retries exhausted
Consider TTL risks (e.g., a client failing before calling Release())
Consider idempotent operations when possible
Design clients to back off polling
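Since consumers poll with findAndModify, an idle queue would otherwise generate constant load. One common shape for backing off is exponential delay with a cap; this sketch is illustrative (the delay values and the nextDelayMs helper are hypothetical, not from the talk):

```javascript
// Illustrative exponential back-off for a polling consumer (values are
// hypothetical): double the delay on each empty poll, up to a cap, and
// reset to zero as soon as an item arrives so the consumer polls eagerly.
function nextDelayMs(current, { initial = 100, max = 30000 } = {}) {
  return current === 0 ? initial : Math.min(current * 2, max);
}

let delay = 0;
const seen = [];
for (let emptyPolls = 0; emptyPolls < 10; emptyPolls++) {
  delay = nextDelayMs(delay); // queue was empty: wait longer next time
  seen.push(delay);
}
console.log(seen.join(", "));
// 100, 200, 400, 800, 1600, 3200, 6400, 12800, 25600, 30000
delay = 0; // an item arrived: reset so the next poll happens quickly
```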
Separate queue vs. extra “topic” field
Consider dedicated DB for write-lock scope
Capped vs. regular collection – capped collections can now have an _id index and support in-place updates.