mongodb berlin schema design

Schema Design: Basic schema modeling in MongoDB

Alvin Richards

Technical Director, EMEA
alvin@10gen.com

@jonnyeight

Topics

Schema design is easy!
• Data as Objects in code

Common patterns
• Single table inheritance
• One-to-Many & Many-to-Many
• Buckets
• Trees
• Queues
• Inventory

So today’s example will use...

Terminology

RDBMS            MongoDB
Table            Collection
Row(s)           JSON Document
Index            Index
Join             Embedding & Linking
Partition        Shard
Partition Key    Shard Key

Schema Design: Relational Database

Schema Design: MongoDB

embedding

linking

Design Session

Design documents that simply map to your application

> post = {author: "Hergé",
          date: ISODate("2011-09-18T09:56:06.298Z"),
          text: "Destination Moon",
          tags: ["comic", "adventure"]}

> db.posts.save(post)

> db.posts.find()

{ _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author: "Hergé",
  date: ISODate("2011-09-18T09:56:06.298Z"),
  text: "Destination Moon",
  tags: [ "comic", "adventure" ] }

Notes:
• ID must be unique, but can be anything you'd like
• MongoDB will generate a default ID if one is not supplied

Find the document

Secondary index for “author”

// 1 means ascending, -1 means descending

> db.posts.ensureIndex({author: 1})

> db.posts.find({author: 'Hergé'})

{ _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
  date: ISODate("2011-09-18T09:56:06.298Z"),
  author: "Hergé",
  ... }

Add an index, find via index

Examine the query plan

> db.posts.find({author: "Hergé"}).explain()
{
  "cursor" : "BtreeCursor author_1",
  "nscanned" : 1,
  "nscannedObjects" : 1,
  "n" : 1,
  "millis" : 5,
  "indexBounds" : {
    "author" : [ [ "Hergé", "Hergé" ] ]
  }
}

Query operators

Conditional operators: $ne, $in, $nin, $mod, $all, $size, $exists, $type, ..., $lt, $lte, $gt, $gte

// find posts with any tags
> db.posts.find({tags: {$exists: true}})

Query operators

Regular expressions:

// posts where author starts with h
> db.posts.find({author: /^h/i })

Counting:

// number of posts written by Hergé
> db.posts.find({author: "Hergé"}).count()

Extending the Schema

> new_comment = {author: "Kyle", date: new Date(), text: "great book"}

> db.posts.update(
    {text: "Destination Moon"},
    {"$push": {comments: new_comment},
     "$inc": {comments_count: 1}})

> db.posts.find({_id: ObjectId("4c4ba5c0672c685e5e8aabf3")})

{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author : "Hergé",
  date : ISODate("2011-09-18T09:56:06.298Z"),
  text : "Destination Moon",
  tags : [ "comic", "adventure" ],
  comments : [
    { author : "Kyle",
      date : ISODate("2011-09-19T09:56:06.298Z"),
      text : "great book" }
  ],
  comments_count: 1 }
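To make the effect of the update document concrete, here is a minimal in-memory sketch in plain JavaScript. The helper name applyUpdate is hypothetical and is not part of MongoDB or any driver; it only mimics how $push appends to an array (creating it if missing) and how $inc adds to a counter (treating a missing field as 0).

```javascript
// Hypothetical in-memory helper; MongoDB applies these operators server-side.
function applyUpdate(doc, update) {
  const result = { ...doc };
  for (const [field, value] of Object.entries(update.$push ?? {})) {
    // $push creates the array when the field does not exist yet
    result[field] = [...(result[field] ?? []), value];
  }
  for (const [field, amount] of Object.entries(update.$inc ?? {})) {
    // $inc treats a missing field as 0
    result[field] = (result[field] ?? 0) + amount;
  }
  return result;
}

const post = { author: "Hergé", text: "Destination Moon" };
const updated = applyUpdate(post, {
  $push: { comments: { author: "Kyle", text: "great book" } },
  $inc: { comments_count: 1 },
});
console.log(updated.comments.length, updated.comments_count); // 1 1
```

Note that the original post needed no schema change: the comments array and the counter appear on first use.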

// create index on nested documents:
> db.posts.ensureIndex({"comments.author": 1})

> db.posts.find({"comments.author": "Kyle"})

// find last 5 posts:
> db.posts.find().sort({date: -1}).limit(5)

// most commented post:
> db.posts.find().sort({comments_count: -1}).limit(1)

When sorting, check if you need an index

Use MongoDB with your language

10gen Supported Drivers
• Ruby, Python, Perl, PHP, Javascript
• Java, C/C++, C#, Scala
• Erlang, Haskell

Object Data Mappers
• Morphia - Java
• Mongoid, MongoMapper - Ruby
• MongoEngine - Python

Community Drivers
• F#, Smalltalk, Clojure, Go, Groovy

Using your schema - using Java Driver

// Get a connection to the database, then a collection handle
// ("posts" matches the collection used in the shell examples)
DBCollection coll = new Mongo().getDB("blogs").getCollection("posts");

// Create the Object
Map<String, Object> obj = new HashMap<String, Object>();
obj.put("author", "Hergé");
obj.put("text", "Destination Moon");
obj.put("date", new Date());

// Insert the object into MongoDB
coll.insert(new BasicDBObject(obj));

Using your schema - using Morphia mapper

// Use Morphia annotations
@Entity
class Blog {
  @Id String author;
  @Indexed Date date;
  String text;
}

Using your schema - using Morphia

// Create the data store
Datastore ds = new Morphia().createDatastore(new Mongo(), "blogs");

// Create the Object
Blog entry = new Blog("Hergé", new Date(), "Destination Moon");

// Insert object into MongoDB
ds.save(entry);

Common Patterns

Inheritance

shapes table

id  type    area  radius  length  width
1   circle  3.14  1
2   square  4             2
3   rect    10            5       2

Single Table Inheritance - RDBMS

Single Table Inheritance - MongoDB

> db.shapes.find()
{ _id: "1", type: "circle", area: 3.14, radius: 1}
{ _id: "2", type: "square", area: 4, length: 2}
{ _id: "3", type: "rect", area: 10, length: 5, width: 2}

missing values not stored!

// find shapes where radius > 0
> db.shapes.find({radius: {$gt: 0}})

// create index
> db.shapes.ensureIndex({radius: 1}, {sparse: true})

index only values present!
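The difference between a regular and a sparse index can be sketched in plain JavaScript. This is an illustration only: buildIndex is a hypothetical helper, and a Map stands in for the B-tree. A non-sparse index records documents that lack the field under null, while a sparse index leaves them out entirely.

```javascript
// Hypothetical helper: maps each indexed value to the _ids of matching documents.
function buildIndex(docs, field, { sparse = false } = {}) {
  const index = new Map();
  for (const doc of docs) {
    const present = Object.prototype.hasOwnProperty.call(doc, field);
    if (!present && sparse) continue; // sparse: skip documents lacking the field
    const key = present ? doc[field] : null; // non-sparse: missing field is indexed as null
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(doc._id);
  }
  return index;
}

const shapes = [
  { _id: "1", type: "circle", area: 3.14, radius: 1 },
  { _id: "2", type: "square", area: 4, length: 2 },
  { _id: "3", type: "rect", area: 10, length: 5, width: 2 },
];

console.log(buildIndex(shapes, "radius", { sparse: true }).size); // 1
console.log(buildIndex(shapes, "radius").size); // 2
```

With only one of three shapes carrying a radius, the sparse index holds a single entry, which is the saving the slide points at.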

One to Many

One to Many relationships can specify
• degree of association between objects
• containment
• life-cycle

One to Many

- Embedded Array
- $slice operator to return subset of comments
- some queries harder, e.g. find latest comments across all blogs

blogs:
{ author : "Hergé",
  date : ISODate("2011-09-18T09:56:06.298Z"),
  comments : [
    { author : "Kyle",
      date : ISODate("2011-09-19T09:56:06.298Z"),
      text : "great book" }
  ] }

One to Many

- Normalized (2 collections)
- most flexible
- more queries

blogs:
{ _id: 1000,
  author: "Hergé",
  date: ISODate("2011-09-18T09:56:06.298Z"),
  comments: [ {comment : 1} ] }

comments:
{ _id : 1,
  blog: 1000,
  author : "Kyle",
  date : ISODate("2011-09-19T09:56:06.298Z") }

// findOne returns the document itself; find would return a cursor
> blog = db.blogs.findOne({text: "Destination Moon"});
> db.comments.find({blog: blog._id});

Linking versus Embedding

• When should I embed?
• When should I link?

Activity Stream - Embedded

// users - one doc per user with all tweets
{ _id: "alvin",
  email: "alvin@10gen.com",
  tweets: [
    { user: "bob",
      tweet: "20111209-1231",
      text: "Best Tweet Ever!" }
  ] }

Activity Stream - Linking

// users - one doc per user
{ _id: "alvin", email: "alvin@10gen.com" }

// tweets - one doc per user per tweet
{ user: "bob", tweet: "20111209-1231", text: "Best Tweet Ever!" }

Embedding

• Great for read performance

• One seek to load entire object

• One roundtrip to database

• Writes can be slow if adding to objects all the time

• Should you embed tweets?

Activity Stream - Buckets

// tweets : one doc per user per day

{ _id: "alvin-20111209",
  email: "alvin@10gen.com",
  tweets: [
    { user: "Bob",
      tweet: "20111209-1231",
      text: "Best Tweet Ever!" },
    { author: "Joe",
      date: "May 27 2011",
      text: "Stuck in traffic (again)" }
  ] }

Adding a Tweet

tweet = { user: "Bob", tweet: "20111209-1231", text: "Best Tweet Ever!" }

db.tweets.update( { _id : "alvin-20111209" }, { $push : { tweets : tweet } } );

Deleting a Tweet

db.tweets.update( { _id: "alvin-20111209" }, { $pull: { tweets: { tweet: "20111209-1231" } } })
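The bucket pattern hinges on deriving the bucket _id from the user and the day. A minimal sketch in plain JavaScript follows, under two assumptions the slides do not state: day boundaries are taken in UTC, and a Map stands in for the tweets collection. The helper names bucketId and addTweet are hypothetical.

```javascript
// Hypothetical helper: builds the "<user>-<YYYYMMDD>" bucket key (UTC day assumed).
function bucketId(user, date) {
  const y = date.getUTCFullYear();
  const m = String(date.getUTCMonth() + 1).padStart(2, "0");
  const d = String(date.getUTCDate()).padStart(2, "0");
  return `${user}-${y}${m}${d}`;
}

// Hypothetical helper: appends a tweet to its bucket, creating the bucket on first use.
function addTweet(buckets, user, tweet, date) {
  const id = bucketId(user, date);
  if (!buckets.has(id)) buckets.set(id, { _id: id, tweets: [] });
  buckets.get(id).tweets.push(tweet);
  return id;
}

const buckets = new Map();
const when = new Date(Date.UTC(2011, 11, 9, 12, 31)); // 9 December 2011, 12:31 UTC
addTweet(buckets, "alvin", { user: "Bob", tweet: "20111209-1231", text: "Best Tweet Ever!" }, when);
console.log([...buckets.keys()]); // [ 'alvin-20111209' ]
```

Each document then grows by at most one day of tweets, which bounds its size while keeping a day's activity readable in one fetch.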

Many - Many

Example:
- Product can be in many categories
- Category can have many products

products:
{ _id: 10, name: "Destination Moon", category_ids: [ 20, 30 ] }

Many - Many

products:
{ _id: 10, name: "Destination Moon", category_ids: [ 20, 30 ] }

categories:
{ _id: 20, name: "adventure", product_ids: [ 10, 11, 12 ] }

categories:
{ _id: 21, name: "movie", product_ids: [ 10 ] }

// All categories for a given product
> db.categories.find({product_ids: 10})

Many - Many

products:
{ _id: 10, name: "Destination Moon", category_ids: [ 20, 30 ] }

categories:
{ _id: 20, name: "adventure" }

Alternative

// All products for a given category
> db.products.find({category_ids: 20})

// All categories for a given product
> product = db.products.findOne({_id : some_id})
> db.categories.find({_id : {$in : product.category_ids}})
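The two lookups in the alternative design can be sketched over plain arrays. This is an in-memory illustration, not driver code; the second product and the second category are hypothetical sample data added so that both directions return something, and the function names are made up for the sketch.

```javascript
const products = [
  { _id: 10, name: "Destination Moon", category_ids: [20, 30] },
  { _id: 11, name: "Explorers on the Moon", category_ids: [20] }, // hypothetical sample
];
const categories = [
  { _id: 20, name: "adventure" },
  { _id: 30, name: "comic" }, // hypothetical sample
];

// Mirrors db.products.find({category_ids: 20}): match any array element.
function productsInCategory(categoryId) {
  return products.filter((p) => p.category_ids.includes(categoryId));
}

// Mirrors the two-step lookup: fetch the product, then $in on its category_ids.
function categoriesOfProduct(productId) {
  const product = products.find((p) => p._id === productId);
  if (!product) return [];
  return categories.filter((c) => product.category_ids.includes(c._id));
}

console.log(productsInCategory(20).map((p) => p._id)); // [ 10, 11 ]
console.log(categoriesOfProduct(10).map((c) => c.name)); // [ 'adventure', 'comic' ]
```

Only the product side stores the relationship, so a category change touches one document, at the cost of a second query when going from product to categories.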

Hierarchical information

Full Tree in Document

{ comments: [
    { author: "Kyle", text: "...",
      replies: [
        { author: "James", text: "...", replies: [] }
      ] }
  ] }

Pros: Single Document, Performance, Intuitive

Cons: Hard to search, Partial Results, 16MB limit

Array of Ancestors

- Store all Ancestors of a node

{ _id: "a" }
{ _id: "b", thread: [ "a" ], replyTo: "a" }
{ _id: "c", thread: [ "a", "b" ], replyTo: "b" }
{ _id: "d", thread: [ "a", "b" ], replyTo: "b" }
{ _id: "e", thread: [ "a" ], replyTo: "a" }
{ _id: "f", thread: [ "a", "e" ], replyTo: "e" }

// find all threads where "b" is in

> db.msg_tree.find({thread: "b"})

Array of Ancestors

// find replies to "e"

> db.msg_tree.find({replyTo: "e"})

// find history of "f"
> threads = db.msg_tree.findOne( {_id: "f"} ).thread
> db.msg_tree.find( { _id: { $in : threads } } )
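The queries above rely on every node carrying its full ancestor list, so the list has to be built when a reply is created. A minimal sketch of that step in plain JavaScript follows; makeReply is a hypothetical helper, not something the slides define.

```javascript
// Hypothetical helper: a reply inherits the parent's ancestors plus the parent itself.
function makeReply(id, parent) {
  return {
    _id: id,
    thread: [...(parent.thread ?? []), parent._id],
    replyTo: parent._id,
  };
}

const a = { _id: "a" }; // root message, no ancestors
const b = makeReply("b", a);
const c = makeReply("c", b);
console.log(b.thread, c.thread); // [ 'a' ] [ 'a', 'b' ]
```

The result matches the sample documents: "b" has thread ["a"] and "c" has thread ["a", "b"].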

Trees as Paths

Store hierarchy as a path expression
- Separate each node by a delimiter, e.g. "/"
- Use text search to find parts of a tree

{ comments: [
    { author: "Kyle", text: "initial post", path: "" },
    { author: "Jim", text: "jim's comment", path: "jim" },
    { author: "Kyle", text: "Kyle's reply to Jim", path: "jim/kyle" }
  ] }

// Find the conversations Jim was part of
// (path lives inside the comments array, so query it with dot notation)
> db.posts.find({"comments.path": /^jim/})
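Building the path and the prefix pattern can be sketched in plain JavaScript. The helper names are hypothetical. One deliberate difference from the slide: the pattern below also requires a delimiter or end of string after the prefix, which is an assumption added here so that "jim" does not match a sibling such as "jimmy".

```javascript
// Hypothetical helper: append a node name to its parent's path.
function childPath(parentPath, name) {
  return parentPath === "" ? name : `${parentPath}/${name}`;
}

// Escape characters that have special meaning in a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Anchored prefix match for a node and everything below it.
function subtreePattern(path) {
  return new RegExp(`^${escapeRegExp(path)}(/|$)`);
}

const jim = childPath("", "jim");
const reply = childPath(jim, "kyle");
console.log(jim, reply); // jim jim/kyle
console.log(subtreePattern("jim").test(reply)); // true
console.log(subtreePattern("jim").test("jimmy")); // false
```

Because the pattern is anchored at the start of the string, it is the kind of regular expression that can make use of an index on the path field.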

Queues

• Need to maintain order and state
• Ensure that updates are atomic

db.jobs.save( { inprogress: false, priority: 1, ... });

// find highest priority job and mark as in-progress
job = db.jobs.findAndModify({
  query: {inprogress: false},
  sort: {priority: -1},
  update: {$set: {inprogress: true, started: new Date()}},
  new: true})

{ inprogress: true, priority: 1, started: ISODate("2011-09-18T09:56:06.298Z") ... }

updated
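The select-then-mark step that findAndModify performs can be sketched in plain JavaScript over an array. claimNextJob is a hypothetical helper. The sketch is single-threaded, so it cannot interleave with another claim; against a real database that guarantee has to come from findAndModify itself, which is why the slide uses it rather than a separate find and update.

```javascript
// Hypothetical in-memory stand-in for the findAndModify call on the slide.
function claimNextJob(jobs, now = new Date()) {
  // query: {inprogress: false}
  const candidates = jobs.filter((j) => j.inprogress === false);
  if (candidates.length === 0) return null;
  // sort: {priority: -1} means the highest priority comes first
  const job = candidates.reduce((best, j) => (j.priority > best.priority ? j : best));
  // update: {$set: {inprogress: true, started: now}}
  job.inprogress = true;
  job.started = now;
  return job; // new: true returns the document after the update
}

const jobs = [
  { _id: 1, inprogress: false, priority: 1 },
  { _id: 2, inprogress: false, priority: 5 },
];
console.log(claimNextJob(jobs)._id); // 2
console.log(claimNextJob(jobs)._id); // 1
console.log(claimNextJob(jobs)); // null
```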

Inventory

• User has a number of "votes" they can use
• A finite stock that you can "sell"
• A resource that can be "provisioned"

Inventory

// Number of votes and who user voted for
{ _id: "alvin", votes: 42, voted_for: [] }

// Subtract a vote and add the blog voted for
db.user.update(
  { _id: "alvin", votes: { $gt: 0 }, voted_for: { $ne: "Destination Moon" } },
  { "$push": { voted_for: "Destination Moon" }, "$inc": { votes: -1 } })
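The point of the update above is that the guard (votes left, not yet voted for this blog) sits in the query part, so the change is applied only when the guard matches. A minimal in-memory sketch of that logic in plain JavaScript follows; castVote is a hypothetical helper, and the boolean it returns stands in for whether the update matched a document.

```javascript
// Hypothetical helper mirroring the guarded update on the slide.
function castVote(user, blog) {
  // query part: votes > 0 and blog not already in voted_for
  const matches = user.votes > 0 && !user.voted_for.includes(blog);
  if (!matches) return false; // nothing matched, nothing changes
  // update part: $push the blog and $inc votes by -1
  user.voted_for.push(blog);
  user.votes -= 1;
  return true;
}

const alvin = { _id: "alvin", votes: 1, voted_for: [] };
console.log(castVote(alvin, "Destination Moon")); // true
console.log(castVote(alvin, "Destination Moon")); // false
console.log(alvin.votes); // 0
```

The counter never goes negative and a blog is never recorded twice, without a separate read-then-write step in the application.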

Summary

Schema design is different in MongoDB

Basic data design principles stay the same

Focus on how the application manipulates data

Rapidly evolve schema to meet your requirements

Enjoy your new freedom, use it wisely :-)

@mongodb

conferences, appearances, and meetups
http://www.10gen.com/events

http://bit.ly/mongo> Facebook | Twitter | LinkedIn

http://linkd.in/joinmongo

download at mongodb.org

alvin@10gen.com
