mongodb berlin schema design
Post on 21-Apr-2015
Schema Design
Basic schema modeling in MongoDB
Alvin Richards
Technical Director, EMEA
alvin@10gen.com
@jonnyeight
Topics
Schema design is easy!
• Data as Objects in code
Common patterns
• Single table inheritance
• One-to-Many & Many-to-Many
• Buckets
• Trees
• Queues
• Inventory
So today’s example will use...
Terminology
RDBMS MongoDB
Table Collection
Row(s) JSON Document
Index Index
Join Embedding & Linking
Partition Shard
Partition Key Shard Key
Schema Design: Relational Database
Schema Design: MongoDB
embedding
linking
Design Session
Design documents that simply map to your application
> post = {author: "Hergé",
          date: ISODate("2011-09-18T09:56:06.298Z"),
          text: "Destination Moon",
          tags: ["comic", "adventure"]}
> db.posts.save(post)
> db.posts.find()
{ _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author: "Hergé",
  date: ISODate("2011-09-18T09:56:06.298Z"),
  text: "Destination Moon",
  tags: [ "comic", "adventure" ] }
Notes:
• ID must be unique, but can be anything you’d like
• MongoDB will generate a default ID if one is not supplied
Find the document
Secondary index for “author”
// 1 means ascending, -1 means descending
> db.posts.ensureIndex({author: 1})
> db.posts.find({author: 'Hergé'})
{ _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
  date: ISODate("2011-09-18T09:56:06.298Z"),
  author: "Hergé", ... }
Add an index, find via index
Examine the query plan
> db.posts.find({author: "Hergé"}).explain()
{ "cursor" : "BtreeCursor author_1",
  "nscanned" : 1,
  "nscannedObjects" : 1,
  "n" : 1,
  "millis" : 5,
  "indexBounds" : { "author" : [ [ "Hergé", "Hergé" ] ] } }
Query operators
Conditional operators: $ne, $in, $nin, $mod, $all, $size, $exists, $type, $lt, $lte, $gt, $gte, ...
// find posts with any tags
> db.posts.find({tags: {$exists: true}})
Regular expressions:
// posts where author starts with h
> db.posts.find({author: /^h/i})
Counting:
// number of posts written by Hergé
> db.posts.find({author: "Hergé"}).count()
Extending the Schema
> new_comment = {author: "Kyle", date: new Date(), text: "great book"}
> db.posts.update(
    {text: "Destination Moon"},
    {"$push": {comments: new_comment},
     "$inc": {comments_count: 1}})
> db.posts.find({_id: ObjectId("4c4ba5c0672c685e5e8aabf3")})
{ _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author: "Hergé",
  date: ISODate("2011-09-18T09:56:06.298Z"),
  text: "Destination Moon",
  tags: [ "comic", "adventure" ],
  comments: [
    { author: "Kyle",
      date: ISODate("2011-09-19T09:56:06.298Z"),
      text: "great book" }
  ],
  comments_count: 1 }
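The update above combines `$push` and `$inc` in a single document update. As a rough illustration of their effect, here is a minimal in-memory sketch in plain JavaScript. The helper `applyUpdate` is hypothetical (it is not a MongoDB API) and handles only these two operators.

```javascript
// Hypothetical helper: mimics the effect of $push and $inc on one document.
// In-memory illustration only, not the MongoDB driver or shell API.
function applyUpdate(doc, update) {
  for (const [field, value] of Object.entries(update.$push || {})) {
    if (doc[field] === undefined) doc[field] = []; // $push creates a missing array
    doc[field].push(value);
  }
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    doc[field] = (doc[field] || 0) + amount; // $inc treats a missing field as 0
  }
  return doc;
}

const post = { author: "Hergé", text: "Destination Moon", tags: ["comic", "adventure"] };
const newComment = { author: "Kyle", date: new Date(), text: "great book" };

// Same shape as the shell update: append the comment and bump the counter together.
applyUpdate(post, { $push: { comments: newComment }, $inc: { comments_count: 1 } });
console.log(post.comments.length, post.comments_count);
```

Keeping `comments_count` next to the array means the "most commented post" query can sort on a plain field instead of computing array lengths.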
Extending the Schema
// create index on nested documents:
> db.posts.ensureIndex({"comments.author": 1})
> db.posts.find({"comments.author": "Kyle"})
// find last 5 posts:
> db.posts.find().sort({date: -1}).limit(5)
// most commented post:
> db.posts.find().sort({comments_count: -1}).limit(1)
When sorting, check if you need an index
Use MongoDB with your language
10gen Supported Drivers
• Ruby, Python, Perl, PHP, Javascript
• Java, C/C++, C#, Scala
• Erlang, Haskell
Object Data Mappers
• Morphia - Java
• Mongoid, MongoMapper - Ruby
• MongoEngine - Python
Community Drivers
• F#, Smalltalk, Clojure, Go, Groovy
Using your schema - using Java Driver
// Get the "posts" collection from the "blogs" database
// (the collection name is assumed here, to match the shell examples)
DBCollection coll = new Mongo().getDB("blogs").getCollection("posts");
// Create the Object
Map<String, Object> obj = new HashMap<String, Object>();
obj.put("author", "Hergé");
obj.put("text", "Destination Moon");
obj.put("date", new Date());
// Insert the object into MongoDB
coll.insert(new BasicDBObject(obj));
Using your schema - using Morphia mapper
// Use Morphia annotations
@Entity
class Blog {
  @Id String author;
  @Indexed Date date;
  String text;
}
Using your schema - using Morphia
// Create the data store
Datastore ds = new Morphia().createDatastore();
// Create the Object
Blog entry = new Blog("Hergé", new Date(), "Destination Moon");
// Insert object into MongoDB
ds.save(entry);
Common Patterns
Inheritance
Single Table Inheritance - RDBMS
shapes table
id  type    area  radius  length  width
1   circle  3.14  1
2   square  4             2
3   rect    10            5       2
Single Table Inheritance - MongoDB
> db.shapes.find()
{ _id: "1", type: "circle", area: 3.14, radius: 1 }
{ _id: "2", type: "square", area: 4, length: 2 }
{ _id: "3", type: "rect", area: 10, length: 5, width: 2 }
missing values not stored!
// find shapes where radius > 0
> db.shapes.find({radius: {$gt: 0}})
// create index
> db.shapes.ensureIndex({radius: 1}, {sparse: true})
index only values present!
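To make the two callouts concrete, here is a small in-memory sketch in plain JavaScript. It is only an analogy: `buildSparseIndex` is a hypothetical helper, and a real sparse index is maintained by the server as a B-tree, not as a sorted array.

```javascript
// The three shape documents from the slide, as plain objects.
const shapes = [
  { _id: "1", type: "circle", area: 3.14, radius: 1 },
  { _id: "2", type: "square", area: 4, length: 2 },
  { _id: "3", type: "rect", area: 10, length: 5, width: 2 },
];

// Analogue of {radius: {$gt: 0}}: a document without "radius" never matches.
const withRadius = shapes.filter((s) => typeof s.radius === "number" && s.radius > 0);

// Hypothetical toy "sparse index": one entry per document that has the field.
function buildSparseIndex(docs, field) {
  return docs
    .filter((d) => field in d)
    .map((d) => ({ key: d[field], _id: d._id }))
    .sort((a, b) => a.key - b.key);
}

const radiusIndex = buildSparseIndex(shapes, "radius");
console.log(withRadius.map((s) => s._id), radiusIndex.length);
```

Only the circle appears in the toy index, which is the point of `sparse: true`: squares and rectangles add no index entries for a field they do not have.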
One to Many
One to Many relationships can specify
• degree of association between objects
• containment
• life-cycle
One to Many
- Embedded Array
  - $slice operator to return subset of comments
  - some queries harder, e.g. find latest comments across all blogs
blogs: { author: "Hergé",
         date: ISODate("2011-09-18T09:56:06.298Z"),
         comments: [
           { author: "Kyle",
             date: ISODate("2011-09-19T09:56:06.298Z"),
             text: "great book" }
         ] }
One to Many
- Normalized (2 collections)
  - most flexible
  - more queries
blogs: { _id: 1000,
         author: "Hergé",
         date: ISODate("2011-09-18T09:56:06.298Z"),
         comments: [ {comment: 1} ] }
comments: { _id: 1,
            blog: 1000,
            author: "Kyle",
            date: ISODate("2011-09-19T09:56:06.298Z") }
> blog = db.blogs.findOne({text: "Destination Moon"});
> db.comments.find({blog: blog._id});
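The "more queries" cost of the normalized layout is the two-step lookup shown above: one query for the blog, a second for its comments. A minimal in-memory sketch of that application-side join, using made-up sample data (the `text` fields and the second comment are assumptions added for illustration):

```javascript
// Two "collections" as plain arrays; sample data is illustrative only.
const blogs = [{ _id: 1000, author: "Hergé", text: "Destination Moon" }];
const comments = [
  { _id: 1, blog: 1000, author: "Kyle", text: "great book" },
  { _id: 2, blog: 2000, author: "Jim", text: "comment on another blog" },
];

// Query 1: fetch the blog document (analogue of findOne).
const blog = blogs.find((b) => b.text === "Destination Moon");

// Query 2: fetch the comments that link back to it by _id.
const blogComments = comments.filter((c) => c.blog === blog._id);
console.log(blogComments.map((c) => c.author));
```

In exchange for the extra round trip, comments can be queried on their own, for example the latest comments across all blogs.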
Linking versus Embedding
• When should I embed?
• When should I link?
Activity Stream - Embedded
// users - one doc per user with all tweets
{ _id: "alvin",
  email: "alvin@10gen.com",
  tweets: [
    { user: "bob", tweet: "20111209-1231", text: "Best Tweet Ever!" }
  ] }
Activity Stream - Linking
// users - one doc per user
{ _id: "alvin", email: "alvin@10gen.com" }
// tweets - one doc per user per tweet
{ user: "bob", tweet: "20111209-1231", text: "Best Tweet Ever!" }
Embedding
• Great for read performance
• One seek to load entire object
• One roundtrip to database
• Writes can be slow if adding to objects all the time
• Should you embed tweets?
Activity Stream - Buckets
// tweets : one doc per user per day
{ _id: "alvin-20111209",
  email: "alvin@10gen.com",
  tweets: [
    { user: "Bob", tweet: "20111209-1231", text: "Best Tweet Ever!" },
    { author: "Joe", date: "May 27 2011", text: "Stuck in traffic (again)" }
  ] }
Adding a Tweet
tweet = { user: "Bob", tweet: "20111209-1231", text: "Best Tweet Ever!" }
db.tweets.update(
  { _id: "alvin-20111209" },
  { $push: { tweets: tweet } });
Deleting a Tweet
db.tweets.update(
  { _id: "alvin-20111209" },
  { $pull: { tweets: { tweet: "20111209-1231" } } })
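The bucket pattern depends on deriving the bucket `_id` from the user and the day, then using `$push` and `$pull` against that one document. A minimal in-memory sketch follows. The helpers `bucketId`, `addTweet` and `deleteTweet` are hypothetical, the day is taken in UTC, and creating a missing bucket on first write is an assumption (the slides do not show how a bucket is first created).

```javascript
// Hypothetical helper: bucket _id is "<user>-<YYYYMMDD>", using the UTC day.
function bucketId(user, date) {
  const day = date.toISOString().slice(0, 10).replace(/-/g, "");
  return `${user}-${day}`;
}

const buckets = new Map(); // stands in for the tweets collection, keyed by _id

function addTweet(user, date, tweet) {
  const id = bucketId(user, date);
  // Assumption: create the bucket on first write (similar in spirit to an upsert).
  if (!buckets.has(id)) buckets.set(id, { _id: id, tweets: [] });
  buckets.get(id).tweets.push(tweet); // analogue of $push
}

function deleteTweet(user, date, tweetId) {
  const bucket = buckets.get(bucketId(user, date));
  // Analogue of $pull: drop every array element matching the condition.
  if (bucket) bucket.tweets = bucket.tweets.filter((t) => t.tweet !== tweetId);
}

const day = new Date(Date.UTC(2011, 11, 9, 12, 31)); // 9 December 2011, UTC
addTweet("alvin", day, { user: "Bob", tweet: "20111209-1231", text: "Best Tweet Ever!" });
addTweet("alvin", day, { user: "Joe", tweet: "20111209-1305", text: "Stuck in traffic (again)" });
deleteTweet("alvin", day, "20111209-1231");
console.log(bucketId("alvin", day), buckets.get(bucketId("alvin", day)).tweets.length);
```

One document per user per day bounds how large any single document can grow, while a day's activity still loads in one read.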
Many - Many
Example:
- Product can be in many categories
- Category can have many products
products: { _id: 10, name: "Destination Moon", category_ids: [ 20, 21 ] }
categories: { _id: 20, name: "adventure", product_ids: [ 10, 11, 12 ] }
categories: { _id: 21, name: "movie", product_ids: [ 10 ] }
// All categories for a given product
> db.categories.find({product_ids: 10})
Many - Many
Alternative
products: { _id: 10, name: "Destination Moon", category_ids: [ 20, 30 ] }
categories: { _id: 20, name: "adventure" }
// All products for a given category
> db.products.find({category_ids: 20})
// All categories for a given product
> product = db.products.findOne({_id: some_id})
> db.categories.find({_id: {$in: product.category_ids}})
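In the alternative layout the link is stored on one side only, so one direction is a single query and the other takes two steps. A minimal in-memory sketch of both directions, with made-up sample data beyond the slide's product 10 and category 20:

```javascript
// Links are stored only on products; sample data is illustrative.
const products = [
  { _id: 10, name: "Destination Moon", category_ids: [20, 30] },
  { _id: 11, name: "Explorers on the Moon", category_ids: [20] },
];
const categories = [
  { _id: 20, name: "adventure" },
  { _id: 30, name: "comic" },
  { _id: 40, name: "cooking" },
];

// Analogue of db.products.find({category_ids: 20}): match inside the array.
const productsIn = (categoryId) =>
  products.filter((p) => p.category_ids.includes(categoryId));

// Analogue of findOne followed by {_id: {$in: product.category_ids}}.
function categoriesOf(productId) {
  const product = products.find((p) => p._id === productId);
  return product ? categories.filter((c) => product.category_ids.includes(c._id)) : [];
}

console.log(productsIn(20).map((p) => p.name), categoriesOf(10).map((c) => c.name));
```

Storing the link once avoids having to keep `category_ids` and `product_ids` in sync across two collections.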
Trees
Hierarchical information
Trees
Full Tree in Document
{ comments: [
    { author: "Kyle", text: "...",
      replies: [
        { author: "James", text: "...", replies: [] }
      ] }
  ] }
Pros: Single Document, Performance, Intuitive
Cons: Hard to search, Partial Results, 16MB limit
Array of Ancestors
- Store all Ancestors of a node
{ _id: "a" }
{ _id: "b", thread: [ "a" ], replyTo: "a" }
{ _id: "c", thread: [ "a", "b" ], replyTo: "b" }
{ _id: "d", thread: [ "a", "b" ], replyTo: "b" }
{ _id: "e", thread: [ "a" ], replyTo: "a" }
{ _id: "f", thread: [ "a", "e" ], replyTo: "e" }
(Diagram: A is the root; B and E reply to A; C and D reply to B; F replies to E)
// find all threads where "b" is in
> db.msg_tree.find({thread: "b"})
// find replies to "e"
> db.msg_tree.find({replyTo: "e"})
// find history of "f"
> threads = db.msg_tree.findOne({_id: "f"}).thread
> db.msg_tree.find({_id: {$in: threads}})
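The ancestor arrays have to be built when a message is inserted: a reply copies its parent's `thread` and appends the parent's `_id`. A minimal in-memory sketch of that step and of the three queries. The helper names (`addMessage`, `descendantsOf`, `repliesTo`, `historyOf`) are hypothetical.

```javascript
const msgTree = []; // stands in for the msg_tree collection

// Hypothetical helper: a reply inherits the parent's ancestors plus the parent.
function addMessage(id, replyTo) {
  const parent = msgTree.find((m) => m._id === replyTo);
  const doc = parent
    ? { _id: id, thread: [...(parent.thread || []), parent._id], replyTo }
    : { _id: id }; // root message: no ancestors
  msgTree.push(doc);
  return doc;
}

addMessage("a");
addMessage("b", "a");
addMessage("c", "b");
addMessage("d", "b");
addMessage("e", "a");
addMessage("f", "e");

// Analogue of find({thread: "b"}): every message with "b" among its ancestors.
const descendantsOf = (id) =>
  msgTree.filter((m) => (m.thread || []).includes(id)).map((m) => m._id);

// Analogue of find({replyTo: "e"}): direct replies only.
const repliesTo = (id) => msgTree.filter((m) => m.replyTo === id).map((m) => m._id);

// Analogue of findOne({_id: "f"}).thread followed by an $in query.
function historyOf(id) {
  const node = msgTree.find((m) => m._id === id);
  const ancestors = (node && node.thread) || [];
  return msgTree.filter((m) => ancestors.includes(m._id)).map((m) => m._id);
}

console.log(descendantsOf("b"), repliesTo("e"), historyOf("f"));
```

The cost of this pattern sits on the write path: moving a subtree would mean rewriting `thread` for every descendant.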
Trees as Paths
Store hierarchy as a path expression
- Separate each node by a delimiter, e.g. “/”
- Use a regular expression (prefix match) to find parts of a tree
{ comments: [
    { author: "Kyle", text: "initial post", path: "" },
    { author: "Jim", text: "jim’s comment", path: "jim" },
    { author: "Kyle", text: "Kyle’s reply to Jim", path: "jim/kyle" }
  ] }
// Find the conversations Jim was part of
> db.posts.find({"comments.path": /^jim/})
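One caveat with a bare prefix pattern such as `/^jim/` is that it would also match a different root like "jimmy". A minimal in-memory sketch of a delimiter-aware subtree check (the helper `inSubtree` is hypothetical, and "/" is assumed as the delimiter):

```javascript
const post = {
  comments: [
    { author: "Kyle", text: "initial post", path: "" },
    { author: "Jim", text: "jim's comment", path: "jim" },
    { author: "Kyle", text: "Kyle's reply to Jim", path: "jim/kyle" },
  ],
};

// Hypothetical helper: a path is in the subtree if it equals the root
// or continues it after a "/" delimiter.
const inSubtree = (path, root) => path === root || path.startsWith(root + "/");

const jimThread = post.comments.filter((c) => inSubtree(c.path, "jim"));
console.log(jimThread.map((c) => c.path));
```

The same idea can be written as an anchored regular expression, for example `/^jim(\/|$)/`, which keeps the pattern a prefix match.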
Queue
• Need to maintain order and state
• Ensure that updates are atomic
db.jobs.save({ inprogress: false, priority: 1, ... });
// find highest priority job and mark as in-progress
job = db.jobs.findAndModify({
  query: {inprogress: false},
  sort: {priority: -1},
  update: {$set: {inprogress: true, started: new Date()}},
  new: true})
// returned job: "inprogress" updated, "started" added
{ inprogress: true,
  priority: 1,
  started: ISODate("2011-09-18T09:56:06.298Z")
  ... }
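The queue works because selecting and marking the job happen as one step, so two workers cannot claim the same job. A minimal single-threaded sketch of that claim step in plain JavaScript. `claimNextJob` is a hypothetical stand-in for `findAndModify`, and the atomicity itself comes from the database, not from this code.

```javascript
const jobs = [
  { _id: 1, inprogress: false, priority: 1 },
  { _id: 2, inprogress: false, priority: 5 },
  { _id: 3, inprogress: true, priority: 9 }, // already claimed by another worker
];

// Hypothetical stand-in for findAndModify with query, sort, update and new: true.
function claimNextJob() {
  const candidates = jobs
    .filter((j) => !j.inprogress)              // query: {inprogress: false}
    .sort((a, b) => b.priority - a.priority);  // sort: {priority: -1}
  if (candidates.length === 0) return null;    // nothing left to claim
  const job = candidates[0];
  job.inprogress = true;                       // update: $set inprogress
  job.started = new Date();                    // update: $set started
  return job;                                  // new: true returns the updated doc
}

const first = claimNextJob();
const second = claimNextJob();
const third = claimNextJob();
console.log(first._id, second._id, third);
```

The `started` timestamp also gives a way to spot jobs whose worker died, since they stay in progress far longer than expected.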
Inventory
• User has a number of "votes" they can use
• A finite stock that you can "sell"
• A resource that can be "provisioned"
Inventory
// Number of votes and who user voted for
{ _id: "alvin", votes: 42, voted_for: [] }
// Subtract a vote and add the blog voted for
db.user.update(
  { _id: "alvin",
    votes: {$gt: 0},
    voted_for: {$ne: "Destination Moon"} },
  { "$push": {voted_for: "Destination Moon"},
    "$inc": {votes: -1} })
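The inventory pattern puts the guard conditions in the query part of the update, so the decrement only applies while votes remain and the blog has not been voted for yet. A minimal in-memory sketch of that conditional update (`castVote` is a hypothetical helper, and the starting balance of 2 is chosen for illustration):

```javascript
const user = { _id: "alvin", votes: 2, voted_for: [] };

// Hypothetical helper: apply the change only if the guard conditions hold.
function castVote(u, blog) {
  // Guard, as in the query: votes > 0 and blog not already in voted_for.
  const matches = u.votes > 0 && !u.voted_for.includes(blog);
  if (!matches) return false; // the update matched no document
  u.voted_for.push(blog);     // analogue of $push
  u.votes -= 1;               // analogue of $inc: {votes: -1}
  return true;
}

const results = [
  castVote(user, "Destination Moon"),      // first vote for this blog
  castVote(user, "Destination Moon"),      // duplicate vote is rejected
  castVote(user, "Explorers on the Moon"), // uses the last remaining vote
  castVote(user, "Red Rackham's Treasure"),// no votes left
];
console.log(results, user.votes);
```

In the database the check and the change are one atomic update, so the application can tell whether the vote was accepted from the number of documents the update matched.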
Summary
Schema design is different in MongoDB
Basic data design principles stay the same
Focus on how the application manipulates data
Rapidly evolve schema to meet your requirements
Enjoy your new freedom, use it wisely :-)
@mongodb
conferences, appearances, and meetups
http://www.10gen.com/events
http://bit.ly/mongo> Facebook | Twitter | LinkedIn
http://linkd.in/joinmongo
download at mongodb.org
alvin@10gen.com