Building Real-Time Systems on MongoDB Using the Oplog at Stripe

Post on 06-Jul-2015

DESCRIPTION

MongoDB's oplog is possibly its most underrated feature. The oplog is vital as the basis on which replication is built, but its value doesn't stop there. Unlike the MySQL binlog, which is poorly documented and not directly exposed to MySQL clients, the oplog is a well-documented, structured format for changes that is queryable through the same mechanisms as your data. This allows many types of powerful, application-driven streaming or transformation. At Stripe, we've used the MongoDB oplog to create PostgreSQL, HBase, and Elasticsearch mirrors of our data. We've built a simple real-time trigger mechanism for detecting new data. And we've even used it to recover data. In this talk, we'll show you how we use the MongoDB oplog, and how you can build powerful reactive streaming data applications on top of it.

If you'd like to see the presentation with presenter's notes, I've published my Google Docs presentation at https://docs.google.com/presentation/d/19NcoFI9BG7PwLoBV7zvidjs2VLgQWeVVcUd7Xc7NoV0/pub

Originally given at MongoDB World 2014 in New York.

TRANSCRIPT

MongoDB and the Oplog

EVAN BRODER @ebroder

AGENDA

- INTRO TO THE OPLOG
- EXAMPLE APPLICATIONS

INTRO TO THE OPLOG

PRIMARY

SECONDARIES

APPLICATION

APPLICATION

save {_id: 1, a: 2}

THINGS I’VE DONE:

- save {_id: 1, a: 2}

APPLICATION

update where {a: 2}, {$set: {a: 3}}

THINGS I’VE DONE:

- save {_id: 1, a: 2}
- update {_id: 1}, {$set: {a: 3}}

THINGS I’VE DONE:

- save {_id: 1, a: 2}
- update {_id: 1}, {$set: {a: 3}}
- insert…
- delete…
- delete…
- save…
- update…
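The running list above is essentially what the oplog records, and note that the application's "update where {a: 2}" was logged as an update addressed by _id. As a rough sketch of why such a list is enough to rebuild state, the plain-Ruby snippet below replays oplog-shaped entries onto an in-memory Hash. The field names (op, ns, o, o2) are the ones used by the code later in this talk; apply_op, replay, and the test.things namespace are hypothetical names for illustration, and real oplog entries carry more fields (such as ts) than shown here.

```ruby
# Hypothetical in-memory "secondary": applies one oplog-shaped entry to
# a Hash of namespace => { _id => document }.
def apply_op(collections, op)
  docs = (collections[op['ns']] ||= {})
  case op['op']
  when 'i' # insert: 'o' is the full document
    docs[op['o']['_id']] = op['o'].dup
  when 'u' # update: 'o2' identifies the document, 'o' describes the change
    id = op['o2']['_id']
    if op['o'].key?('$set')
      docs[id] = (docs[id] || {'_id' => id}).merge(op['o']['$set'])
    else
      docs[id] = op['o'].merge('_id' => id)
    end
  when 'd' # delete: 'o' carries the _id
    docs.delete(op['o']['_id'])
  end
  collections
end

# Replays a list of entries from an empty state and returns the result.
def replay(oplog)
  oplog.each_with_object({}) { |op, state| apply_op(state, op) }
end

# The two operations from the slides, written as oplog-shaped entries.
OPLOG = [
  {'op' => 'i', 'ns' => 'test.things', 'o' => {'_id' => 1, 'a' => 2}},
  {'op' => 'u', 'ns' => 'test.things', 'o2' => {'_id' => 1},
   'o' => {'$set' => {'a' => 3}}},
]

p replay(OPLOG)
```

Replaying the same entries twice in this sketch ends in the same state, which is the property that makes a log of by-_id operations safe to re-apply.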

THINGS I’VE DONE:

save…

TRIGGERS

GOAL: EVENT PROCESSING

GOAL: DETECT INSERTIONS

WARNING: THIS CODE IS NOT PRODUCTION-READY

oplog = mongo_connection['local']['oplog.rs']
ns = 'eventdb.events'
oplog.find({'op' => 'i', 'ns' => ns}) do |cursor|
  cursor.each do |op|
    puts op['o']['_id']
  end
end

oplog.find({'op' => 'i', 'ns' => ns}) do |cursor|
  cursor.add_option(Mongo::Constants::OP_QUERY_TAILABLE)
  cursor.add_option(Mongo::Constants::OP_QUERY_AWAIT_DATA)
  loop do
    cursor.each do |op|
      puts op['o']['_id']
    end
  end
end

start_entry = oplog.find_one({},
                             {:sort => {'$natural' => -1}})
start = start_entry['ts']
oplog.find({'ts' => {'$gt' => start},
            'op' => 'i',
            'ns' => ns}) do |cursor|
  cursor.add_option(Mongo::Constants::OP_QUERY_TAILABLE)
  cursor.add_option(Mongo::Constants::OP_QUERY_AWAIT_DATA)
  loop do
    cursor.each do |op|
      puts op['o']['_id']
    end
  end
end

start_entry = oplog.find_one({},
                             {:sort => {'$natural' => -1}})
start = start_entry['ts']
oplog.find({'ts' => {'$gt' => start},
            'op' => 'i',
            'ns' => ns}) do |cursor|
  cursor.add_option(Mongo::Constants::OP_QUERY_OPLOG_REPLAY)
  cursor.add_option(Mongo::Constants::OP_QUERY_TAILABLE)
  cursor.add_option(Mongo::Constants::OP_QUERY_AWAIT_DATA)
  loop do
    cursor.each do |op|
      puts op['o']['_id']
    end
  end
end
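The 'ts' => {'$gt' => start} clause is what lets a tailer begin at a known position instead of at the start of the log. Below is a minimal plain-Ruby sketch of that comparison, assuming an oplog timestamp can be modeled as a (seconds, increment) pair ordered by seconds first. Ts, inserts_after, and the sample entries are hypothetical and exist only for illustration; no MongoDB connection is involved.

```ruby
# Hypothetical stand-in for an oplog timestamp: ordered by seconds,
# then by increment within the same second.
Ts = Struct.new(:seconds, :increment) do
  include Comparable

  def <=>(other)
    [seconds, increment] <=> [other.seconds, other.increment]
  end
end

# Mirrors the filter {'ts' => {'$gt' => start}, 'op' => 'i', 'ns' => ns}
# over an in-memory array of oplog-shaped entries.
def inserts_after(oplog, start, ns)
  oplog.select do |op|
    op['ts'] > start && op['op'] == 'i' && op['ns'] == ns
  end
end

# Made-up sample data.
SAMPLE_OPLOG = [
  {'ts' => Ts.new(100, 1), 'op' => 'i', 'ns' => 'eventdb.events', 'o' => {'_id' => 'a'}},
  {'ts' => Ts.new(100, 2), 'op' => 'i', 'ns' => 'eventdb.events', 'o' => {'_id' => 'b'}},
  {'ts' => Ts.new(101, 1), 'op' => 'u', 'ns' => 'eventdb.events',
   'o2' => {'_id' => 'a'}, 'o' => {'$set' => {'x' => 1}}},
  {'ts' => Ts.new(102, 1), 'op' => 'i', 'ns' => 'otherdb.things', 'o' => {'_id' => 'c'}},
  {'ts' => Ts.new(103, 1), 'op' => 'i', 'ns' => 'eventdb.events', 'o' => {'_id' => 'd'}},
]

# Start from the first entry's timestamp and list the later inserts.
START = SAMPLE_OPLOG.first['ts']
inserts_after(SAMPLE_OPLOG, START, 'eventdb.events').each do |op|
  puts op['o']['_id']
end
```

A tailer that remembers the last ts it processed could pass that value as the start on its next run, so a restart would not reprocess entries it has already handled.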

DATA TRANSFORMATIONS

GOAL: MONGODB TO POSTGRESQL

start_entry = oplog.find_one({},
                             {:sort => {'$natural' => -1}})
start = start_entry['ts']
oplog.find({'ts' => {'$gt' => start}}) do |cursor|
  cursor.add_option(Mongo::Constants::OP_QUERY_OPLOG_REPLAY)
  cursor.add_option(Mongo::Constants::OP_QUERY_TAILABLE)
  cursor.add_option(Mongo::Constants::OP_QUERY_AWAIT_DATA)
  loop do
    cursor.each do |op|
      puts op['o']['_id']
    end
  end
end

cursor.each do |op|
  puts op['o']['_id']
end

cursor.each do |op|
  case op['op']
  when 'i'
    puts op['o']['_id']
  else
    # ¯\_(ツ)_/¯
  end
end

cursor.each do |op|
  case op['op']
  when 'i'
    query = "INSERT INTO #{op['ns']} (" +
      op['o'].keys.join(', ') +
      ') VALUES (' +
      op['o'].values.map(&:inspect).join(', ') + ')'
  else
    # ¯\_(ツ)_/¯
  end
end

cursor.each do |op|
  case op['op']
  when 'i'
    query = "INSERT INTO #{op['ns']} (" +
      op['o'].keys.join(', ') +
      ') VALUES (' +
      op['o'].values.map(&:inspect).join(', ') + ')'
  when 'd'
    query = "DELETE FROM #{op['ns']} WHERE _id=" +
      op['o']['_id'].inspect
  else
    # ¯\_(ツ)_/¯
  end
end

    query = "DELETE FROM #{op['ns']} WHERE _id=" +
      op['o']['_id'].inspect
  when 'u'
    updates = op['o']['$set'] ? op['o']['$set'] : op['o']
    # Comma-separate the assignments so multi-field updates are valid SQL
    query = "UPDATE #{op['ns']} SET " +
      updates.map { |k, v| "#{k}=#{v.inspect}" }.join(', ')
    query += " WHERE _id="
    query += op['o2']['_id'].inspect
  else
    # ¯\_(ツ)_/¯
  end
end

cursor.each do |op|
  # NOTE: .inspect is Ruby quoting, not SQL quoting; slide-quality code only
  case op['op']
  when 'i'
    query = "INSERT INTO #{op['ns']} (" +
      op['o'].keys.join(', ') + ') VALUES (' +
      op['o'].values.map(&:inspect).join(', ') + ')'
  when 'd'
    query = "DELETE FROM #{op['ns']} WHERE _id=" +
      op['o']['_id'].inspect
  when 'u'
    updates = op['o']['$set'] ? op['o']['$set'] : op['o']
    # Comma-separate the assignments so multi-field updates are valid SQL
    query = "UPDATE #{op['ns']} SET " +
      updates.map { |k, v| "#{k}=#{v.inspect}" }.join(', ')
    query += " WHERE _id=" + op['o2']['_id'].inspect
  else
    # ¯\_(ツ)_/¯
  end
end
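The translation above interpolates values into the SQL text with .inspect, which is fine for a slide but fragile in practice. One possible alternative, sketched below in plain Ruby, is to produce a statement with placeholders plus a separate parameter list and let the database driver bind the values. This is an illustration under stated assumptions, not how MoSQL does it: to_sql and quote_ident are hypothetical helpers, the $1-style placeholders assume a PostgreSQL-style driver, and naming the table after the collection part of the namespace is an assumption.

```ruby
# Quote an SQL identifier by wrapping it in double quotes and doubling
# any embedded double quotes.
def quote_ident(name)
  '"' + name.to_s.gsub('"', '""') + '"'
end

# Hypothetical helper: turn one oplog-shaped entry into [sql, params].
# Returns nil for operation types it does not handle.
def to_sql(op)
  # Assumption: the table is named after the collection in "db.collection".
  table = quote_ident(op['ns'].split('.', 2).last)
  id_col = quote_ident('_id')
  case op['op']
  when 'i'
    cols = op['o'].keys.map { |c| quote_ident(c) }.join(', ')
    marks = (1..op['o'].size).map { |i| "$#{i}" }.join(', ')
    ["INSERT INTO #{table} (#{cols}) VALUES (#{marks})", op['o'].values]
  when 'd'
    ["DELETE FROM #{table} WHERE #{id_col} = $1", [op['o']['_id']]]
  when 'u'
    # Either a $set modifier or a full replacement document, as in the slides.
    updates = op['o']['$set'] || op['o']
    sets = updates.keys.each_with_index.map do |k, i|
      "#{quote_ident(k)} = $#{i + 1}"
    end
    sql = "UPDATE #{table} SET #{sets.join(', ')} " \
          "WHERE #{id_col} = $#{updates.size + 1}"
    [sql, updates.values + [op['o2']['_id']]]
  end
end

sql, params = to_sql('op' => 'u', 'ns' => 'eventdb.events',
                     'o2' => {'_id' => 1}, 'o' => {'$set' => {'a' => 3}})
puts sql
p params
```

Keeping values out of the SQL string means a document field containing quotes cannot change the shape of the statement.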

github.com/stripe/mosql

github.com/stripe/zerowing

DISASTER RECOVERY

task = collection.find_one({'finished' => nil})
# do something with task…
collection.update({'_id' => task['_id']},
                  {'$set' => {'finished' => Time.now.to_i}})

loop do
  collection.remove(
    {'finished' => {'$lt' => Time.now.to_i - 30}})
  sleep(10)
end

evan@caron:~$ mongo
MongoDB shell version: 2.4.10
connecting to: test
normal:PRIMARY> null < (Date.now() / 1000) - 30
true
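The shell session above shows JavaScript treating null as less than a positive number. One way to keep a cleanup rule from ever matching unfinished tasks is to require an actual numeric finish time before comparing. The snippet below is a plain-Ruby sketch of such a guard over in-memory hashes; expired? and its max_age parameter are hypothetical, and this is not presented as the fix used in the talk.

```ruby
# Hypothetical guard: a task counts as expired only if it has a numeric
# 'finished' time that is more than max_age seconds before now.
def expired?(task, now, max_age = 30)
  finished = task['finished']
  finished.is_a?(Numeric) && finished < now - max_age
end

now = Time.now.to_i
tasks = [
  {'_id' => 1, 'finished' => nil},       # still running: must be kept
  {'_id' => 2, 'finished' => now - 120}, # finished long ago: may be removed
  {'_id' => 3, 'finished' => now - 5},   # finished recently: kept for now
]

tasks.each do |task|
  puts "#{task['_id']}: #{expired?(task, now) ? 'remove' : 'keep'}"
end
```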

THINGS I’VE DONE:

- insert
- delete…

THINGS I’VE DONE:

> db.getReplicationInfo()

{
    "logSizeMB" : 48964.3541015625,
    "usedMB" : 46116.4,
    "timeDiff" : 316550,
    "timeDiffHours" : 87.93,
    "tFirst" : "Thu Apr 11 2013 07:24:29 GMT+0000 (UTC)",
    "tLast" : "Sun Apr 14 2013 23:20:19 GMT+0000 (UTC)",
    "now" : "Sat May 24 2014 07:52:35 GMT+0000 (UTC)"
}
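In this output, timeDiff is the number of seconds between tFirst and tLast, so it bounds how far back a replay from the oplog can reach. A small sketch of the arithmetic, using the numbers from the output above (oplog_window_hours and oplog_fill_ratio are hypothetical helper names):

```ruby
# Values copied from the db.getReplicationInfo() output above.
INFO = {
  'logSizeMB' => 48964.3541015625,
  'usedMB'    => 46116.4,
  'timeDiff'  => 316550,
}

# Span covered by the oplog, in hours.
def oplog_window_hours(info)
  (info['timeDiff'] / 3600.0).round(2)
end

# Fraction of the allocated oplog space currently in use.
def oplog_fill_ratio(info)
  info['usedMB'] / info['logSizeMB']
end

puts oplog_window_hours(INFO) # prints 87.93, matching timeDiffHours
puts oplog_fill_ratio(INFO).round(3)
```

Under these numbers, a recovery that replays the oplog has a window of a little under four days to work with.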

new_oplog.find({'ts' => {'$gt' => start}}) do |cursor|
  cursor.add_option(Mongo::Constants::OP_QUERY_OPLOG_REPLAY)
  cursor.each do |op|
    if op['op'] == 'd' && op['ns'] == 'monsterdb.tasks'
      old_task = old_tasks.find_one({'_id' => op['o']['_id']})
      if old_task['finished'] == nil
        # found one!
        # save old_task to a file, and we'll re-queue it later
      end
    end

    old_connection['admin'].command({'applyOps' => [op]})
  end
end
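The recovery loop above replays each oplog entry onto an older copy of the data and, just before a delete is applied, checks whether the task being deleted was still unfinished. The plain-Ruby sketch below simulates that idea on in-memory data so the control flow can be followed without a database. It is an approximation under stated assumptions: replay_and_rescue is a hypothetical name, the snapshot is a Hash of _id => document for a single collection, and "unfinished" is taken to mean a nil finished field.

```ruby
# Replays oplog-shaped entries for one namespace onto `snapshot`
# (a Hash of _id => document), returning the documents that were still
# unfinished at the moment their delete was replayed.
def replay_and_rescue(snapshot, oplog, ns)
  rescued = []
  oplog.each do |op|
    next unless op['ns'] == ns
    case op['op']
    when 'i'
      snapshot[op['o']['_id']] = op['o'].dup
    when 'u'
      id = op['o2']['_id']
      current = snapshot[id] || {'_id' => id}
      snapshot[id] =
        if op['o']['$set']
          current.merge(op['o']['$set'])
        else
          op['o'].merge('_id' => id)
        end
    when 'd'
      # Look at the pre-delete state, then apply the delete.
      old = snapshot.delete(op['o']['_id'])
      rescued << old if old && old['finished'].nil?
    end
  end
  rescued
end

# Made-up example: task 1 finishes before it is deleted, task 2 does not.
SNAPSHOT = {
  1 => {'_id' => 1, 'finished' => nil},
  2 => {'_id' => 2, 'finished' => nil},
}
NEW_OPLOG = [
  {'op' => 'u', 'ns' => 'monsterdb.tasks', 'o2' => {'_id' => 1},
   'o' => {'$set' => {'finished' => 1_400_000_000}}},
  {'op' => 'd', 'ns' => 'monsterdb.tasks', 'o' => {'_id' => 1}},
  {'op' => 'd', 'ns' => 'monsterdb.tasks', 'o' => {'_id' => 2}},
]

p replay_and_rescue(SNAPSHOT.dup, NEW_OPLOG, 'monsterdb.tasks')
```

Because the check happens against the replayed state rather than the original snapshot, a task that legitimately finished before its delete is not flagged.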

new_oplog.find({'ts' => {'$gt' => start}}) do |cursor|
  cursor.add_option(Mongo::Constants::OP_QUERY_OPLOG_REPLAY)
  cursor.each do |op|
    if op['op'] == 'd' && op['ns'] == 'monsterdb.tasks'
      old_task = old_tasks.find_one({'_id' => op['o']['_id']})
      if old_task['finished'] == false
        # found one!
        # save old_task to a file, and we'll re-queue it later
      end
    end

    old_connection['admin'].command({'applyOps' => [op]})
  end
end

THINGS I’VE DONE:

save…

QUESTIONS?
