the rough guide to mongodb

Post on 28-Oct-2014

Category: Technology

DESCRIPTION

Simeon Simeonov, Founder & CTO of Swoop, shares how Swoop uses Mongo behind the scenes for their high-performance core data processing and analytics. The presentation goes over tips and tricks such as zero-overhead hierarchical relationships with MongoDB, high-performance MongoDB atomic update buffering, content-addressed storage using cryptographic hashing and more. Presented to the Boston MongoDB User Group.

TRANSCRIPT

The Rough Guide to MongoDB

Simeon Simeonov, @simeons

Founding. Funding.

Growing. Startups.

Why MongoDB?

I am @simeons

recruit amazing people

solve hard problems

ship

make users happy

repeat

Why MongoDB?

Again, please

SQL is slow (for our business)

SQL is slow (for our developer workflow)

SQL is slow (for our analytics system)

So what’s Swoop?

Display Advertising
Makes the Web Suck

User-focused optimization
Tens of millions of users

1000+% better than average
200+% better than Google

Swoop Fixes That

Mobile SDKs: iOS & Android

Web SDK: RequireJS & jQuery

Components: AngularJS

NLP, etc.: Python

Targeting: High-Perf Java

Analytics: Ruby 2.0

Internal Apps: Ruby 2.0 / Rails 3

Pub Portal: Ruby 2.0 / Rails 3

Ad Portal: Ruby 2.0 / Rails 4

MongoDB: the Good

Fast

Flexible

JavaScript

MongoDB: the Bad

Not Quite Enterprise-Grade

Not Cheap to Run Well

I will write more robust code

I will design a better map-reduce

RAM + locks == $$$

Five Steps to Happiness

Sharding

Native Relationships

Atomic Update Buffering

Content-Addressed Storage

Shell Tricks

// Google AdWords object model
Account
  Campaign
    AdGroup // this joins ads & keywords
      Ad
      Keyword

// For example
AdGroup has an Account
AdGroup has a Campaign
AdGroup has many Ads
AdGroup has many Keywords

Slam dunk for SQL

// Let’s play a bit
Account Campaign AdGroup Ad Keyword

// Let’s play some more
Account Campaign AdGroup Ad Keyword

// There is just one bit left
Account Campaign AdGroup 1 Ad 0 Keyword

// build a hierarchical ID
accountID campaignID adGroupID ((0 keywordID) | (1 adID))

// a binary ID!
10100100001100000000101001100110101010010100
< accountID >< campaignID >< …

// Encode it in base 16, 32 or 64
{"_id" : "a4300a66a94d20f1", … }

// Example

The 5th ad of the 3rd ad group of the 7th campaign of the 255th account

could have the _id 0x00ff000700031005

The _id for the 10th keyword of the same ad group would be 0x00ff00070003000a
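
The two example _ids can be reproduced with a small helper. This is only a sketch: the field widths (16 bits each for account, campaign and ad group, one hex digit for the ad/keyword type, three hex digits for the leaf) are inferred from the examples above and are an assumption, and `hex` and `buildId` are hypothetical names, not part of the talk.

```javascript
// Zero-padded hex encoding of a non-negative integer (hypothetical helper).
function hex(n, digits) {
  if (n < 0 || n >= Math.pow(16, digits)) {
    throw new RangeError(n + " does not fit in " + digits + " hex digits");
  }
  var s = n.toString(16);
  while (s.length < digits) {
    s = "0" + s;
  }
  return s;
}

// Concatenates the ancestors' IDs, a type digit (1 = ad, 0 = keyword)
// and the leaf ID into one 16-digit hex _id.
// The 4/4/4/1/3 digit layout is an assumption inferred from the examples.
function buildId(account, campaign, adGroup, isAd, leaf) {
  return hex(account, 4) + hex(campaign, 4) + hex(adGroup, 4) +
         (isAd ? "1" : "0") + hex(leaf, 3);
}
```

For instance, `buildId(255, 7, 3, true, 5)` yields the ad _id `00ff000700031005` and `buildId(255, 7, 3, false, 10)` yields the keyword _id `00ff00070003000a`.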

// Neat: the ad’s and keyword’s _ids contain the
// IDs of all of their ancestors in the hierarchy.

keywordId = 0x00ff00070003000a

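// Pseudocode: JavaScript's & truncates its operands to 32 bits,
// so real code needs 64-bit-safe masking or string prefixes.
adGroupId = keywordId & 0xffffffffffff0000
campaignId = keywordId & 0xffffffff00000000
accountId = keywordId & 0xffff000000000000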

// has-a relationship is a simple lookup
account = db.accounts.findOne({_id: accountId})
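
The masks above are 64 bits wide, which plain JavaScript numbers cannot represent exactly, and the `&` operator works on 32-bit integers. One way around this, assuming the _id is stored as a 16-digit lowercase hex string (as in the base-16 example earlier), is to derive the ancestors by prefix. `ancestorIds` is a hypothetical helper, and the 4/8/12 digit boundaries are the same assumption as in the example _ids.

```javascript
// Derives ancestor _ids from a 16-digit hex _id by keeping a prefix
// and zero-filling the rest, which mirrors the bit masks in the slide.
function ancestorIds(id) {
  if (!/^[0-9a-f]{16}$/.test(id)) {
    throw new Error("expected a 16-digit lowercase hex _id");
  }
  return {
    accountId: id.slice(0, 4) + "000000000000",
    campaignId: id.slice(0, 8) + "00000000",
    adGroupId: id.slice(0, 12) + "0000"
  };
}

// In the shell, the has-a lookup would then be something like:
//   db.accounts.findOne({_id: ancestorIds(keywordId).accountId})
```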

// Neater: has-many relationships are just
// range queries on the _id index.

adGroupId = keywordId & 0xffffffffffff0000
startOfAds = adGroupId + 0x1000
endOfAds = adGroupId + 0x1fff

adsForKeyword = db.ads.find({ _id: {$gte: startOfAds, $lte: endOfAds}})
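
The same range can be built for hex-string _ids. This sketch assumes fixed-width lowercase hex strings, for which string comparison gives the same order as numeric comparison; `adRange` is a hypothetical helper name.

```javascript
// Builds the _id range covering every ad in the ad group that
// the given ad or keyword _id belongs to.
function adRange(idInGroup) {
  var prefix = idInGroup.slice(0, 12); // account + campaign + ad group
  return { $gte: prefix + "1000", $lte: prefix + "1fff" };
}

// In the shell, the has-many query would then be something like:
//   db.ads.find({_id: adRange(keywordId)})
```

Because the range is a contiguous slice of the default _id index, no secondary index is needed for the relationship.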

// Technically, that was a join via the ad group.
// Who said Mongo can’t do joins???

> db.reports.findOne()
{
  "_id" : …,
  "period" : "hour",
  "shard" : 0, // 16MB doc limit protection
  "topic" : "ce-1",
  "ts" : ISODate("2012-06-12T05:00:00Z"),
  "variations" : {
    "2" : { // variationID (dimension set)
      "hint" : {
        "present" : 311, // hint.present is a metric
        "clicks" : 1
      }
    },
    "4" : {
      "hint" : {
        "present" : 331
      }
    }
  }
}
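
The transcript does not preserve how counters reach a document like this one. A plausible sketch of atomic update buffering is to accumulate increments in memory and flush them as a single `$inc` on dotted field paths. `UpdateBuffer` and its methods are hypothetical names, and the flush policy is an assumption, not the talk's implementation.

```javascript
// Accumulates metric increments in memory, keyed by dotted field path.
function UpdateBuffer() {
  this.counts = {};
}

// Records n occurrences (default 1) of a metric for a variation.
UpdateBuffer.prototype.add = function (variationId, metricPath, n) {
  var key = "variations." + variationId + "." + metricPath;
  this.counts[key] = (this.counts[key] || 0) + (n === undefined ? 1 : n);
};

// Returns one $inc update for everything buffered, or null if empty,
// and resets the buffer.
UpdateBuffer.prototype.flush = function () {
  if (Object.keys(this.counts).length === 0) {
    return null;
  }
  var update = { $inc: this.counts };
  this.counts = {};
  return update;
};

// In the shell, a flush might be applied with an upsert, for example:
//   var u = buf.flush();
//   if (u) db.reports.update({period: "hour", topic: "ce-1", shard: 0, ts: hour}, u, {upsert: true});
```

Many in-memory increments then cost one write, and each `$inc` is applied atomically within the document.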

Content-Addressed Storage

Lazy join abstraction

Very space efficient

Extremely (pre-)cacheable

Join only happens during reporting

// Step 1: take a set of dimensions worth tracking
data = {
  "domain_id" : "SW-28077508-16444",
  "hint" : "Find an organic alternative",
  "theme" : "red"
}

// Step 2: compute a signature (a hash digest), e.g., MD5
sig = "000069569F4835D16E69DF704187AC2F"

// Step 3: if new sig, increment a counter
counter = 264034

// Step 4: create a document in the content-
// addressed store collection for these dimensions

> db.cas.findOne()
{
  "_id" : "000069569F4835D16E69DF704187AC2F", // MD5 hash
  "data" : { // data that was digested to the hash above
    "domain_id" : "SW-28077508-16444",
    "hint" : "Find an organic alternative",
    "theme" : "red"
  },
  "meta_data" : {
    "id" : 264034 // variationID
  },
  "created_at" : ISODate("2013-02-04T12:05:34.752Z")
}
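
Steps 1 to 4 can be sketched in Node.js as follows. The talk does not say how the dimensions were serialized before hashing, so the sorted-key serialization here is an assumption and will not necessarily reproduce the signature shown above. `canonicalize`, `signature` and `casDocument` are hypothetical names, and the sketch only handles flat objects.

```javascript
const crypto = require("crypto");

// Serializes a flat object with its keys sorted, so that equal
// dimension sets always produce the same string (an assumption).
function canonicalize(data) {
  var pairs = Object.keys(data).sort().map(function (key) {
    return [key, data[key]];
  });
  return JSON.stringify(pairs);
}

// Step 2: MD5 digest of the canonical form, as uppercase hex.
function signature(data) {
  return crypto.createHash("md5")
    .update(canonicalize(data), "utf8")
    .digest("hex")
    .toUpperCase();
}

// Step 4: the document to store; variationId comes from the counter in step 3.
function casDocument(data, variationId, now) {
  return {
    _id: signature(data),
    data: data,
    meta_data: { id: variationId },
    created_at: now
  };
}
```

Because the _id is derived from the content, inserting the same dimension set twice hits the same document, which is what makes the store space efficient and cacheable.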

// Elsewhere, in the reports collection…

"variations" : {
  "264034" : {
    // metrics here
  },
  …

lazy join
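
At reporting time the variation IDs in a report can be resolved back to their dimensions. The sketch below does this in memory for documents shaped like the reports and cas examples above; `lazyJoin` is a hypothetical name, and fetching the cas documents (for example with a query on `meta_data.id`) is left out.

```javascript
// Pairs each variation's metrics in a report with the dimension
// data stored in the content-addressed store.
function lazyJoin(report, casDocs) {
  var dataById = {};
  casDocs.forEach(function (doc) {
    dataById[String(doc.meta_data.id)] = doc.data;
  });
  return Object.keys(report.variations).map(function (variationId) {
    return {
      variationId: variationId,
      dimensions: dataById[variationId] || null, // null if not in the store
      metrics: report.variations[variationId]
    };
  });
}
```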

// Use underscore.js in the shell
// See http://underscorejs.org/
function underscore() {
  load("/mongo_hacks/underscore.js");
}

// Loads underscore.js on the MongoDB server
function server_underscore(force) {
  force = force || false;
  if (force || typeof(underscoreLoaded) === 'undefined') {
    db.eval(cat("/mongo_hacks/underscore.js"));
    underscoreLoaded = true;
  }
}

// Callstack printing on exception -- wraps a function
function dbg(f) {
  try {
    f();
  } catch (e) {
    print("\n**** Exception: " + e.toString());
    print("\n");
    print(e.stack);
    print("\n");
    if (arguments.length > 1) {
      printjson(arguments);
      print("\n");
    }
    throw e;
  }
}

function minutesAgo(minutes, d) {
  d = d || new Date();
  return new Date(d.valueOf() - minutes * 60 * 1000);
}

function hoursAgo(hours, d) {
  d = d || new Date();
  return minutesAgo(60 * hours, d);
}

function daysAgo(days, d) {
  d = d || new Date();
  return hoursAgo(24 * days, d);
}

// Don’t write in the shell.
// Use your fav editor, save & type t() in mongo
function t() {
  load("/mongo_hacks/bag_of_tricks.js");
}

@simeons

sim@swoop.com
