The Rough Guide to MongoDB Simeon Simeonov @simeons


Post on 28-Oct-2014


DESCRIPTION

Simeon Simeonov, Founder & CTO of Swoop, shares how Swoop uses Mongo behind the scenes for their high-performance core data processing and analytics. The presentation goes over tips and tricks such as zero-overhead hierarchical relationships with MongoDB, high-performance MongoDB atomic update buffering, content-addressed storage using cryptographic hashing and more. Presented to the Boston MongoDB User Group.

TRANSCRIPT

Page 1: The Rough Guide to MongoDB

The Rough Guide to MongoDB

Simeon Simeonov @simeons

Page 2: The Rough Guide to MongoDB

Founding. Funding.

Growing. Startups.

Page 3: The Rough Guide to MongoDB

Why MongoDB?

Page 4: The Rough Guide to MongoDB

I am @simeons

Page 5: The Rough Guide to MongoDB

recruit amazing people

solve hard problems

ship

make users happy

repeat

Page 6: The Rough Guide to MongoDB
Page 7: The Rough Guide to MongoDB

Why MongoDB?

Again, please

Page 8: The Rough Guide to MongoDB

SQL is slow (for our business)

Page 9: The Rough Guide to MongoDB

SQL is slow (for our developer workflow)

Page 10: The Rough Guide to MongoDB

SQL is slow (for our analytics system)

Page 11: The Rough Guide to MongoDB

So what’s Swoop?

Page 12: The Rough Guide to MongoDB
Page 13: The Rough Guide to MongoDB

Display Advertising Makes the Web Suck

User-focused optimization
Tens of millions of users

1000+% better than average
200+% better than Google

Swoop Fixes That

Page 14: The Rough Guide to MongoDB

Mobile SDKs: iOS & Android
Web SDK: RequireJS & jQuery
Components: AngularJS
NLP, etc.: Python
Targeting: High-Perf Java
Analytics: Ruby 2.0
Internal Apps: Ruby 2.0 / Rails 3
Pub Portal: Ruby 2.0 / Rails 3
Ad Portal: Ruby 2.0 / Rails 4

Page 15: The Rough Guide to MongoDB

MongoDB: the Good

Fast

Flexible

JavaScript

Page 16: The Rough Guide to MongoDB

MongoDB: the Bad

Not Quite Enterprise-Grade

Not Quite Enterprise-Grade

Not Cheap to Run Well

Page 17: The Rough Guide to MongoDB

I will write more robust code

Page 18: The Rough Guide to MongoDB

I will design a better map-reduce

Page 19: The Rough Guide to MongoDB

RAM + locks == $$$

Page 20: The Rough Guide to MongoDB
Page 21: The Rough Guide to MongoDB

Five Steps to Happiness

Sharding

Native Relationships

Atomic Update Buffering

Content-Addressed Storage

Shell Tricks

Page 22: The Rough Guide to MongoDB
Page 23: The Rough Guide to MongoDB
Page 24: The Rough Guide to MongoDB

// Google AdWords object model
Account
  Campaign
    AdGroup   // this joins ads & keywords
      Ad
      Keyword

// For example
AdGroup has an Account
AdGroup has a Campaign
AdGroup has many Ads
AdGroup has many Keywords

Slam dunk for SQL

Page 25: The Rough Guide to MongoDB

// Let’s play a bit
Account Campaign AdGroup Ad Keyword

Page 26: The Rough Guide to MongoDB

// Let’s play some more
Account Campaign AdGroup Ad Keyword

Page 27: The Rough Guide to MongoDB

// There is just one bit left
Account Campaign AdGroup
1 Ad
0 Keyword

Page 28: The Rough Guide to MongoDB

// build a hierarchical ID
accountID campaignID adGroupID ((0 keywordID) | (1 adID))

// a binary ID!
10100100001100000000101001100110101010010100
< accountID >< campaignID >< …

// Encode it in base 16, 32 or 64
{ "_id" : "a4300a66a94d20f1", … }

Page 29: The Rough Guide to MongoDB

// Example

The 5th ad
of the 3rd ad group
of the 7th campaign
of the 255th account

could have the _id 0x00ff000700031005

The _id for the 10th keyword of the same ad group would be 0x00ff00070003000a
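The two example _ids can be reproduced with a small sketch. It assumes four 16-bit fields and that ads occupy the 0x1000 to 0x1fff range of the last field, as the example values suggest. The helper names (hierarchicalId, adId, keywordId, toHex) are hypothetical, and BigInt is used because plain JavaScript numbers lose precision above 2^53.

```javascript
// Hypothetical helpers, not from the talk: pack four 16-bit fields into one 64-bit ID.
const AD_FLAG = 0x1000n; // assumption: ads live in 0x1000-0x1fff of the last field

function hierarchicalId(account, campaign, adGroup, leaf) {
  const fields = [account, campaign, adGroup, leaf].map(BigInt);
  for (const f of fields) {
    if (f < 0n || f > 0xffffn) throw new RangeError("field out of 16-bit range");
  }
  // Shift each level into its own 16-bit slot, most significant first.
  return (fields[0] << 48n) | (fields[1] << 32n) | (fields[2] << 16n) | fields[3];
}

function adId(account, campaign, adGroup, n) {
  if (n < 0 || n > 0xfff) throw new RangeError("ad number out of range");
  return hierarchicalId(account, campaign, adGroup, AD_FLAG | BigInt(n));
}

function keywordId(account, campaign, adGroup, n) {
  if (n < 0 || n > 0xfff) throw new RangeError("keyword number out of range");
  return hierarchicalId(account, campaign, adGroup, BigInt(n));
}

// Fixed-width hex rendering, matching the notation used on the slide.
function toHex(id) {
  return "0x" + id.toString(16).padStart(16, "0");
}

console.log(toHex(adId(255, 7, 3, 5)));       // 0x00ff000700031005
console.log(toHex(keywordId(255, 7, 3, 10))); // 0x00ff00070003000a
```

How the resulting value is stored (a 64-bit integer type or a fixed-width string such as the base-16 form) is left open here.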

Page 30: The Rough Guide to MongoDB

// Neat: the ad’s and keyword’s _ids contain the
// IDs of all of their ancestors in the hierarchy.

keywordId = 0x00ff00070003000a

adGroupId  = keywordId & 0xffffffffffff0000
campaignId = keywordId & 0xffffffff00000000
accountId  = keywordId & 0xffff000000000000

// has-a relationship is a simple lookup
account = db.accounts.findOne({_id: accountId})
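One caveat when trying the masks above in real JavaScript: the & operator converts plain numbers to 32-bit integers, so 64-bit masks need another representation. A minimal sketch using BigInt follows; the function name ancestorIds is hypothetical.

```javascript
// Hypothetical helper: derive every ancestor ID from a leaf (ad or keyword) ID.
// BigInt keeps all 64 bits; plain numbers would be truncated to 32 bits by &.
function ancestorIds(leafId) {
  return {
    adGroupId: leafId & 0xffffffffffff0000n,
    campaignId: leafId & 0xffffffff00000000n,
    accountId: leafId & 0xffff000000000000n,
  };
}

const hex = (id) => "0x" + id.toString(16).padStart(16, "0");

const ids = ancestorIds(0x00ff00070003000an);
console.log(hex(ids.adGroupId));  // 0x00ff000700030000
console.log(hex(ids.campaignId)); // 0x00ff000700000000
console.log(hex(ids.accountId));  // 0x00ff000000000000
```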

Page 31: The Rough Guide to MongoDB

// Neater: has-many relationships are just
// range queries on the _id index.

adGroupId  = keywordId & 0xffffffffffff0000
startOfAds = adGroupId + 0x1000
endOfAds   = adGroupId + 0x1fff

adsForKeyword = db.ads.find({ _id: {$gte: startOfAds, $lte: endOfAds} })

// Technically, that was a join via the ad group.
// Who said Mongo can’t do joins???
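The range computation can be tried without a database. The sketch below builds the same $gte/$lte bounds and applies them to an in-memory array that stands in for the ads collection. The names adRangeFor and findByIdRange and the sample documents are made up for illustration.

```javascript
const AD_GROUP_MASK = 0xffffffffffff0000n;

// Build the _id bounds covering every ad in the same ad group as siblingId.
function adRangeFor(siblingId) {
  const adGroupId = siblingId & AD_GROUP_MASK;
  return { $gte: adGroupId + 0x1000n, $lte: adGroupId + 0x1fffn };
}

// In-memory stand-in for db.ads.find({ _id: range }).
function findByIdRange(docs, range) {
  return docs.filter((d) => d._id >= range.$gte && d._id <= range.$lte);
}

// Made-up sample data: two ads in ad group 3, one in ad group 4.
const ads = [
  { _id: 0x00ff000700031005n, name: "ad 5, ad group 3" },
  { _id: 0x00ff000700031006n, name: "ad 6, ad group 3" },
  { _id: 0x00ff000700041001n, name: "ad 1, ad group 4" },
];

const range = adRangeFor(0x00ff00070003000an); // the 10th keyword of ad group 3
const adsForKeyword = findByIdRange(ads, range);
console.log(adsForKeyword.map((d) => d.name)); // the two ads of ad group 3
```

If the _ids are stored as fixed-width hex strings of a single case, string ordering matches numeric ordering, so the same range query works on string bounds.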

Page 32: The Rough Guide to MongoDB
Page 33: The Rough Guide to MongoDB
Page 34: The Rough Guide to MongoDB
Page 35: The Rough Guide to MongoDB
Page 36: The Rough Guide to MongoDB

> db.reports.findOne()
{
  "_id" : …,
  "period" : "hour",
  "shard" : 0,                            // 16 MB doc limit protection
  "topic" : "ce-1",
  "ts" : ISODate("2012-06-12T05:00:00Z"),
  "variations" : {
    "2" : {                               // variationID (dimension set)
      "hint" : {
        "present" : 311,                  // hint.present is a metric
        "clicks" : 1
      }
    },
    "4" : {
      "hint" : {
        "present" : 331
      }
    }
  }
}
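The buffering slides themselves did not survive in this transcript, so the following is only a guess at the general idea behind atomic update buffering: accumulate metric increments in memory, then flush them as one $inc update per report document. The IncrementBuffer class and its method names are hypothetical. $inc with dotted field paths is standard MongoDB update syntax.

```javascript
// Hypothetical sketch: collapse many small increments into one $inc per document.
class IncrementBuffer {
  constructor() {
    this.pending = new Map(); // serialized report key -> { dottedPath: amount }
  }

  // key identifies the target report document (plain JSON values only here).
  add(key, variationId, metricPath, amount = 1) {
    const k = JSON.stringify(key);
    if (!this.pending.has(k)) this.pending.set(k, {});
    const inc = this.pending.get(k);
    const path = `variations.${variationId}.${metricPath}`;
    inc[path] = (inc[path] || 0) + amount;
  }

  // Returns one { filter, update } pair per buffered document and empties the buffer.
  flush() {
    const ops = [];
    for (const [k, inc] of this.pending) {
      ops.push({ filter: JSON.parse(k), update: { $inc: inc } });
    }
    this.pending.clear();
    return ops;
  }
}

const buffer = new IncrementBuffer();
const key = { period: "hour", topic: "ce-1", ts: "2012-06-12T05:00:00Z", shard: 0 };
buffer.add(key, "2", "hint.present");
buffer.add(key, "2", "hint.present");
buffer.add(key, "2", "hint.clicks");
buffer.add(key, "4", "hint.present");
console.log(JSON.stringify(buffer.flush(), null, 2));
```

Each flushed pair could then be sent as a single upsert, for example db.reports.update(filter, update, { upsert: true }) in the shell. In this sketch the timestamp is kept as a string so the key survives the JSON round trip; a real key would more likely carry a Date.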

Page 37: The Rough Guide to MongoDB

Content Addressed Storage

Lazy join abstraction

Very space efficient

Extremely (pre-)cacheable

Join only happens during reporting

Page 38: The Rough Guide to MongoDB

// Step 1: take a set of dimensions worth tracking
data = {
  "domain_id" : "SW-28077508-16444",
  "hint" : "Find an organic alternative",
  "theme" : "red"
}

// Step 2: compute a signature (digest) of the data, e.g., MD5
sig = "000069569F4835D16E69DF704187AC2F"

// Step 3: if new sig, increment a counter
counter = 264034

// Step 4: create a document in the content-addressed
// store collection for these
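The four steps can be sketched in Node.js with an in-memory Map standing in for the store collection. The slides do not say how the data is serialized before hashing; sorted-key JSON of a flat object is an assumption here, so the digests will not necessarily match the one shown above. ContentStore, canonicalJson and signature are hypothetical names.

```javascript
const crypto = require("crypto");

// Assumption: serialize flat objects with sorted keys so that equal
// dimension sets always produce the same bytes, whatever the key order.
function canonicalJson(obj) {
  return JSON.stringify(Object.keys(obj).sort().map((k) => [k, obj[k]]));
}

// Step 2: digest the canonical form (MD5, upper-case hex as on the slide).
function signature(data) {
  return crypto.createHash("md5").update(canonicalJson(data)).digest("hex").toUpperCase();
}

// In-memory stand-in for the content-addressed store collection.
class ContentStore {
  constructor() {
    this.docs = new Map(); // sig -> document
    this.counter = 0;
  }

  // Steps 3 and 4: on a new signature, bump the counter and create the document.
  put(data) {
    const sig = signature(data);
    let doc = this.docs.get(sig);
    if (!doc) {
      this.counter += 1;
      doc = { _id: sig, data, meta_data: { id: this.counter }, created_at: new Date() };
      this.docs.set(sig, doc);
    }
    return doc;
  }
}

const store = new ContentStore();
const first = store.put({ domain_id: "SW-28077508-16444", hint: "Find an organic alternative", theme: "red" });
const again = store.put({ theme: "red", hint: "Find an organic alternative", domain_id: "SW-28077508-16444" });
console.log(first.meta_data.id === again.meta_data.id); // true: same content, same variationID
```

Against a real collection, the counter increment and the insert would need to be made safe under concurrent writers; this sketch ignores that.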

Page 39: The Rough Guide to MongoDB

> db.cas.findOne()
{
  "_id" : "000069569F4835D16E69DF704187AC2F",  // MD5 hash
  "data" : {                                   // data that was digested to the hash above
    "domain_id" : "SW-28077508-16444",
    "hint" : "Find an organic alternative",
    "theme" : "red"
  },
  "meta_data" : {
    "id" : 264034                              // variationID
  },
  "created_at" : ISODate("2013-02-04T12:05:34.752Z")
}

// Elsewhere, in the reports collection…

"variations" : {
  "264034" : {     // lazy join: this key is the variationID above
    // metrics here
  },
  …
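At reporting time the lazy join amounts to resolving each variationID key back to its dimensions. A minimal in-memory sketch follows, assuming the store documents have been loaded (or cached) into a Map keyed by meta_data.id. The function names and the sample report are illustrative only.

```javascript
// Index store documents by variationID so lookups are O(1) and cacheable.
function indexByVariationId(casDocs) {
  return new Map(casDocs.map((doc) => [doc.meta_data.id, doc.data]));
}

// Resolve every variation key in a report to its dimension set.
function joinReport(report, dimensionsById) {
  return Object.entries(report.variations).map(([variationId, metrics]) => ({
    variationId,
    dimensions: dimensionsById.get(Number(variationId)) || null, // null if unknown
    metrics,
  }));
}

const casDocs = [
  {
    _id: "000069569F4835D16E69DF704187AC2F",
    data: { domain_id: "SW-28077508-16444", hint: "Find an organic alternative", theme: "red" },
    meta_data: { id: 264034 },
  },
];

// Made-up metrics for illustration.
const report = { variations: { "264034": { hint: { present: 311, clicks: 1 } } } };

const rows = joinReport(report, indexByVariationId(casDocs));
console.log(JSON.stringify(rows, null, 2));
```

Because store documents never change once written, the Map can be built once and reused across reports.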

Page 40: The Rough Guide to MongoDB
Page 41: The Rough Guide to MongoDB

// Use underscore.js in the shell
// See http://underscorejs.org/
function underscore() {
  load("/mongo_hacks/underscore.js");
}

Page 42: The Rough Guide to MongoDB

// Loads underscore.js on the MongoDB server
// Note: db.eval() was deprecated in MongoDB 3.0 and removed in 4.2.
function server_underscore(force) {
  force = force || false;
  if (force || typeof(underscoreLoaded) === 'undefined') {
    db.eval(cat("/mongo_hacks/underscore.js"));
    underscoreLoaded = true;
  }
}

Page 43: The Rough Guide to MongoDB

// Callstack printing on exception -- wraps a function
function dbg(f) {
  try {
    f();
  } catch (e) {
    print("\n**** Exception: " + e.toString());
    print("\n");
    print(e.stack);
    print("\n");
    if (arguments.length > 1) {
      printjson(arguments);
      print("\n");
    }
    throw e;
  }
}

Page 44: The Rough Guide to MongoDB

function minutesAgo(minutes, d) {
  d = d || new Date();
  return new Date(d.valueOf() - minutes * 60 * 1000);
}

function hoursAgo(hours, d) {
  d = d || new Date();
  return minutesAgo(60 * hours, d);
}

function daysAgo(days, d) {
  d = d || new Date();
  return hoursAgo(24 * days, d);
}
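A typical use for these helpers is building a time-window filter. The sketch below repeats the helpers in compact form so it runs on its own, and uses a fixed reference time so the result is reproducible. The filter fields mirror the sample report document, but the query itself is an illustration and not taken from the talk.

```javascript
// The helpers from the slide, condensed so this example is self-contained.
function minutesAgo(minutes, d) {
  d = d || new Date();
  return new Date(d.valueOf() - minutes * 60 * 1000);
}
function hoursAgo(hours, d) { return minutesAgo(60 * hours, d); }
function daysAgo(days, d) { return hoursAgo(24 * days, d); }

// Fixed reference time, so the output does not depend on when this runs.
const now = new Date("2012-06-13T05:00:00Z");

// Illustrative filter: hourly reports for one topic over the previous 24 hours.
const filter = {
  period: "hour",
  topic: "ce-1",
  ts: { $gte: daysAgo(1, now), $lt: now },
};

console.log(filter.ts.$gte.toISOString()); // 2012-06-12T05:00:00.000Z
```

Note that daysAgo subtracts exactly 24 hours per day, which is not always the same local wall-clock time across a daylight-saving change.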

Page 45: The Rough Guide to MongoDB

// Don’t write in the shell.
// Use your fav editor, save & type t() in mongo
function t() {
  load("/mongo_hacks/bag_of_tricks.js");
}

Page 46: The Rough Guide to MongoDB
Page 47: The Rough Guide to MongoDB

@simeons
