MongoDB Schema Design: Insights and Tradeoffs (Jetlore's talk at MongoSF 2012)
![Page 1: MongoDB Schema Design: Insights and Tradeoffs (Jetlore's talk at MongoSF 2012)](https://reader033.vdocuments.site/reader033/viewer/2022051608/54598af1b1af9f40378b5421/html5/thumbnails/1.jpg)
Montse Medina
COO, Jetlore
MongoDB Schema Design:
Insights and Tradeoffs
Saturday, May 5, 12
![Page 2: MongoDB Schema Design: Insights and Tradeoffs (Jetlore's talk at MongoSF 2012)](https://reader033.vdocuments.site/reader033/viewer/2022051608/54598af1b1af9f40378b5421/html5/thumbnails/2.jpg)
Social content is useful in context
![Page 3: MongoDB Schema Design: Insights and Tradeoffs (Jetlore's talk at MongoSF 2012)](https://reader033.vdocuments.site/reader033/viewer/2022051608/54598af1b1af9f40378b5421/html5/thumbnails/3.jpg)
Social content is useful in context
![Page 4: MongoDB Schema Design: Insights and Tradeoffs (Jetlore's talk at MongoSF 2012)](https://reader033.vdocuments.site/reader033/viewer/2022051608/54598af1b1af9f40378b5421/html5/thumbnails/4.jpg)
Algorithms + Infrastructure
![Page 5: MongoDB Schema Design: Insights and Tradeoffs (Jetlore's talk at MongoSF 2012)](https://reader033.vdocuments.site/reader033/viewer/2022051608/54598af1b1af9f40378b5421/html5/thumbnails/5.jpg)
Technology Stack
Apache Kafka
![Page 6: MongoDB Schema Design: Insights and Tradeoffs (Jetlore's talk at MongoSF 2012)](https://reader033.vdocuments.site/reader033/viewer/2022051608/54598af1b1af9f40378b5421/html5/thumbnails/6.jpg)
Outline
I. Schema design
‣ Relational vs. Document-oriented
‣ Schema-less design
‣ Case study: Publishers & Subscribers
II. Lessons learned for schema design
III. Things to remember about MongoDB
![Page 7: MongoDB Schema Design: Insights and Tradeoffs (Jetlore's talk at MongoSF 2012)](https://reader033.vdocuments.site/reader033/viewer/2022051608/54598af1b1af9f40378b5421/html5/thumbnails/7.jpg)
Outline
I. Schema design
‣ Relational vs. Document-oriented
‣ Schema-less design
‣ Case study: Publishers & Subscribers
II. Lessons learned for schema design
III. Things to remember about MongoDB
![Page 8: MongoDB Schema Design: Insights and Tradeoffs (Jetlore's talk at MongoSF 2012)](https://reader033.vdocuments.site/reader033/viewer/2022051608/54598af1b1af9f40378b5421/html5/thumbnails/8.jpg)
Relational vs. Document-oriented
Users (document-oriented)
{ id: 1, name: "Robert", from: [2], to: [5, 20] }
{ id: 2, name: "Monica", from: [23], to: [1, 5] }
...
vs
Users (relational)
id    name
1     Robert
2     Monica
3     Lucas
...   ...
Graph (relational)
from  to
1     5
1     20
2     1
2     5
...   ...
![Page 9: MongoDB Schema Design: Insights and Tradeoffs (Jetlore's talk at MongoSF 2012)](https://reader033.vdocuments.site/reader033/viewer/2022051608/54598af1b1af9f40378b5421/html5/thumbnails/9.jpg)
Find all the "to" edges for user 5
Users
{ id: 5, name: "Robert", from: [1, 2, 4], to: [1, 20, 3, 7, 2] }
1 disk seek guaranteed!
vs
Graph (rows stored across disk blocks)
from  to
1     5
1     20
2     1
2     5
3     4
3     23
3     12
4     5
...   ...
Potentially as many disk seeks as "to" edges!
![Page 10: MongoDB Schema Design: Insights and Tradeoffs (Jetlore's talk at MongoSF 2012)](https://reader033.vdocuments.site/reader033/viewer/2022051608/54598af1b1af9f40378b5421/html5/thumbnails/10.jpg)
Advantages of doc-oriented schema
• Avoid joins
• Disk locality when fetching relations (everything is stored within a doc record)
Considerations for schema design
• N-to-many relations == lists
• Denormalization is more common
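To make "N-to-many relations == lists" concrete, the sketch below converts the relational Users/Graph rows from the earlier slide into user documents that embed their relations. It is plain JavaScript with no MongoDB driver. The helper name `toUserDocs` is hypothetical, and the field names follow the slides.

```javascript
// Hypothetical helper: fold one-row-per-edge "Graph" data into user
// documents that embed their relations as lists (from / to).
function toUserDocs(users, edges) {
  const docs = new Map();
  for (const { id, name } of users) {
    docs.set(id, { id, name, from: [], to: [] });
  }
  for (const edge of edges) {
    // Each edge is written into both endpoints: deliberate denormalization.
    if (docs.has(edge.from)) docs.get(edge.from).to.push(edge.to);
    if (docs.has(edge.to)) docs.get(edge.to).from.push(edge.from);
  }
  return [...docs.values()];
}

// Rows from the relational tables on the slide (excerpt only).
const users = [
  { id: 1, name: "Robert" },
  { id: 2, name: "Monica" },
  { id: 3, name: "Lucas" },
];
const edges = [
  { from: 1, to: 5 },
  { from: 1, to: 20 },
  { from: 2, to: 1 },
  { from: 2, to: 5 },
];

for (const doc of toUserDocs(users, edges)) {
  console.log(JSON.stringify(doc));
}
```

Robert comes out as `{ id: 1, name: "Robert", from: [2], to: [5, 20] }`, which matches the document on the slide. Reading his relations then takes one document fetch, with no join over the edge table. The cost is that each edge now lives in two documents, so every follow or unfollow must update both.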
![Page 11: MongoDB Schema Design: Insights and Tradeoffs (Jetlore's talk at MongoSF 2012)](https://reader033.vdocuments.site/reader033/viewer/2022051608/54598af1b1af9f40378b5421/html5/thumbnails/11.jpg)
Outline
I. Schema design
‣ Relational vs. Document-oriented
‣ Schema-less design
‣ Case study: Publishers & Subscribers
II. Lessons learned for schema design
III. Things to remember about MongoDB
![Page 12: MongoDB Schema Design: Insights and Tradeoffs (Jetlore's talk at MongoSF 2012)](https://reader033.vdocuments.site/reader033/viewer/2022051608/54598af1b1af9f40378b5421/html5/thumbnails/12.jpg)
Schema-less design
{ id: 1, network: Twitter, name: "Robert", from: [2], to: [5, 20], screenName: "robertE" }
{ id: 2, network: Facebook, name: "Maria", from: [23], to: [1, 5], likes: ["biking", "hiking"] }
...
Leverage the schemaless nature of Mongo, but protect it with types in your code!
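One way to "put protection with types in your code" is to validate each raw document once, at the boundary, and hand the rest of the program a typed object. The sketch below assumes `network` is stored as the strings "Twitter" and "Facebook" (the slide shows bare names). The classes and helpers are hypothetical.

```javascript
// Hypothetical typed wrappers for user documents that share a collection
// but carry network-specific fields.
class User {
  constructor(doc) {
    this.id = doc.id;
    this.name = doc.name;
    this.from = doc.from;
    this.to = doc.to;
  }
}
class TwitterUser extends User {
  constructor(doc) {
    super(doc);
    this.screenName = doc.screenName;
  }
}
class FacebookUser extends User {
  constructor(doc) {
    super(doc);
    this.likes = doc.likes;
  }
}

// Throws unless doc[field] has the expected type ("array" is special-cased).
function requireField(doc, field, type) {
  const ok = type === "array" ? Array.isArray(doc[field]) : typeof doc[field] === type;
  if (!ok) {
    throw new TypeError(`user ${doc.id}: field "${field}" should be ${type}`);
  }
}

// Validate a raw document and return the matching typed object.
function parseUser(doc) {
  requireField(doc, "id", "number");
  requireField(doc, "name", "string");
  requireField(doc, "from", "array");
  requireField(doc, "to", "array");
  switch (doc.network) {
    case "Twitter":
      requireField(doc, "screenName", "string");
      return new TwitterUser(doc);
    case "Facebook":
      requireField(doc, "likes", "array");
      return new FacebookUser(doc);
    default:
      throw new TypeError(`user ${doc.id}: unknown network "${doc.network}"`);
  }
}

const robert = parseUser({
  id: 1, network: "Twitter", name: "Robert", from: [2], to: [5, 20], screenName: "robertE",
});
console.log(robert instanceof TwitterUser, robert.screenName); // true robertE
```

A document missing a field its network requires, such as a Twitter user with no `screenName`, fails right away with a `TypeError`. Without this check it would surface later as an `undefined` somewhere else in the code.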
![Page 13: MongoDB Schema Design: Insights and Tradeoffs (Jetlore's talk at MongoSF 2012)](https://reader033.vdocuments.site/reader033/viewer/2022051608/54598af1b1af9f40378b5421/html5/thumbnails/13.jpg)
Outline
I. Schema design
‣ Relational vs. Document-oriented
‣ Schema-less design
‣ Case study: Publishers & Subscribers
II. Lessons learned for schema design
III. Things to remember about MongoDB
![Page 14: MongoDB Schema Design: Insights and Tradeoffs (Jetlore's talk at MongoSF 2012)](https://reader033.vdocuments.site/reader033/viewer/2022051608/54598af1b1af9f40378b5421/html5/thumbnails/14.jpg)
Read-Friendly
Case Study: Publishers & Subscribers
![Page 15: MongoDB Schema Design: Insights and Tradeoffs (Jetlore's talk at MongoSF 2012)](https://reader033.vdocuments.site/reader033/viewer/2022051608/54598af1b1af9f40378b5421/html5/thumbnails/15.jpg)
Read-Friendly Approach
Post: { _id: postId, owner: ownerId, recipient: recipientId, text: "message", ... }
Hi!  Hi!  Hi!  (one copy of the post per recipient)
![Page 16: MongoDB Schema Design: Insights and Tradeoffs (Jetlore's talk at MongoSF 2012)](https://reader033.vdocuments.site/reader033/viewer/2022051608/54598af1b1af9f40378b5421/html5/thumbnails/16.jpg)
Read-Friendly Approach
db.posts.find({recipient: uid})
Sharding key: recipient
Pros: fast retrieval, easy sharding
Cons: slow writes, enormous amount of storage
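The read-friendly layout is "fan-out on write": publishing one post stores one document per recipient. Below is a toy in-memory sketch that uses an array as a stand-in for `db.posts`. The helpers `publish` and `feedFor` are hypothetical.

```javascript
// Toy in-memory stand-in for db.posts (not MongoDB code): the read-friendly
// layout writes one copy of a post per recipient.
let nextPostId = 1;

// Hypothetical helper: fan out on write, one document per follower.
function publish(posts, ownerId, followerIds, text) {
  for (const recipientId of followerIds) {
    posts.push({ _id: nextPostId++, owner: ownerId, recipient: recipientId, text });
  }
}

// Equivalent of db.posts.find({recipient: uid}).
function feedFor(posts, uid) {
  return posts.filter((post) => post.recipient === uid);
}

const posts = [];
publish(posts, "u1", ["u2", "u3", "u5"], "Hi!");
console.log(posts.length);                // 3: one stored copy per follower
console.log(feedFor(posts, "u3").length); // 1
```

Storage and write cost grow with the size of the audience, so a user with many followers turns one post into many writes. In exchange, each read is a lookup on `recipient`, which is also the shard key, so the query can be routed to the shard that holds that recipient's posts.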
![Page 17: MongoDB Schema Design: Insights and Tradeoffs (Jetlore's talk at MongoSF 2012)](https://reader033.vdocuments.site/reader033/viewer/2022051608/54598af1b1af9f40378b5421/html5/thumbnails/17.jpg)
Write-Friendly
Case Study: Publishers & Subscribers
![Page 18: MongoDB Schema Design: Insights and Tradeoffs (Jetlore's talk at MongoSF 2012)](https://reader033.vdocuments.site/reader033/viewer/2022051608/54598af1b1af9f40378b5421/html5/thumbnails/18.jpg)
Write-Friendly Approach
Post: { _id: postId, owner: oId, text: "message", ... }
Hi!  (a single copy of the post)
![Page 19: MongoDB Schema Design: Insights and Tradeoffs (Jetlore's talk at MongoSF 2012)](https://reader033.vdocuments.site/reader033/viewer/2022051608/54598af1b1af9f40378b5421/html5/thumbnails/19.jpg)
Write-Friendly Approach
db.posts.find({owner: {$in:user.from}})
Sharding key: ?
Pros: fast writes, slim storage
Cons: slow reads, harder queries
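The write-friendly layout is "fan-in on read": each post is stored once, and a reader collects posts from everyone they follow. Here `user.from` follows the slides and lists the users the reader is subscribed to. This is a toy in-memory sketch with hypothetical helpers.

```javascript
// Toy in-memory stand-in for db.posts (not MongoDB code): the write-friendly
// layout stores each post exactly once, tagged only with its owner.
function publish(posts, ownerId, text) {
  posts.push({ _id: posts.length + 1, owner: ownerId, text }); // a single write
}

// Equivalent of db.posts.find({owner: {$in: user.from}}).
function feedFor(posts, user) {
  const following = new Set(user.from);
  return posts.filter((post) => following.has(post.owner));
}

const posts = [];
publish(posts, "u1", "Hi!");
publish(posts, "u4", "Hello");
publish(posts, "u9", "Not followed");

const reader = { _id: "u2", from: ["u1", "u4"] };
console.log(posts.length); // 3
console.log(feedFor(posts, reader).map((p) => p.owner));
```

The matching posts belong to many different owners. If posts were sharded by owner, the `$in` query would generally have to go to several shards, which is why the slide leaves the sharding key as "?".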
![Page 20: MongoDB Schema Design: Insights and Tradeoffs (Jetlore's talk at MongoSF 2012)](https://reader033.vdocuments.site/reader033/viewer/2022051608/54598af1b1af9f40378b5421/html5/thumbnails/20.jpg)
Hybrid Approach
Case Study: Publishers & Subscribers
![Page 21: MongoDB Schema Design: Insights and Tradeoffs (Jetlore's talk at MongoSF 2012)](https://reader033.vdocuments.site/reader033/viewer/2022051608/54598af1b1af9f40378b5421/html5/thumbnails/21.jpg)
Hybrid Approach
Post: { _id: postId, owner: ownerId, recipients: [u1, u2, u3, u5], text: "message", ... }
Hi!  (a single copy that lists all of its recipients)
![Page 22: MongoDB Schema Design: Insights and Tradeoffs (Jetlore's talk at MongoSF 2012)](https://reader033.vdocuments.site/reader033/viewer/2022051608/54598af1b1af9f40378b5421/html5/thumbnails/22.jpg)
Hybrid Approach
db.posts.find({recipients: uId})
Sharding key: random :)
Pros: fast writes, slim storage, reasonable read speed
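In the hybrid layout, each post is stored once and carries its recipient list. A query on an array field matches when any element equals the value. The toy sketch below shows this with the same kind of hypothetical helpers as the other layouts.

```javascript
// Toy in-memory stand-in for db.posts (not MongoDB code): the hybrid layout
// stores each post once, with the full recipient list embedded in it.
function publish(posts, ownerId, recipientIds, text) {
  posts.push({ _id: posts.length + 1, owner: ownerId, recipients: [...recipientIds], text });
}

// Equivalent of db.posts.find({recipients: uId}): matches if any element equals uId.
function feedFor(posts, uid) {
  return posts.filter((post) => post.recipients.includes(uid));
}

const posts = [];
publish(posts, "u4", ["u1", "u2", "u3", "u5"], "Hi!");
publish(posts, "u7", ["u2"], "Hello");
console.log(posts.length);                // 2: one document per post
console.log(feedFor(posts, "u2").length); // 2
console.log(feedFor(posts, "u3").length); // 1
```

In MongoDB, an index on `recipients` is a multikey index with one entry per array element. That index is what keeps this read reasonably fast without copying the post for every reader.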
![Page 23: MongoDB Schema Design: Insights and Tradeoffs (Jetlore's talk at MongoSF 2012)](https://reader033.vdocuments.site/reader033/viewer/2022051608/54598af1b1af9f40378b5421/html5/thumbnails/23.jpg)
Random sharding is not random!
Minimize the number of disk seeks per shard!
(Diagram of three data layouts, labeled "Best: impossible for our data", "Worse", and "Optimal solution")
![Page 24: MongoDB Schema Design: Insights and Tradeoffs (Jetlore's talk at MongoSF 2012)](https://reader033.vdocuments.site/reader033/viewer/2022051608/54598af1b1af9f40378b5421/html5/thumbnails/24.jpg)
Outline
I. Schema design
II. Lessons learned for schema design
‣ Indexes
‣ Concurrency
‣ Reducing collection size
III. Things to remember about MongoDB
![Page 25: MongoDB Schema Design: Insights and Tradeoffs (Jetlore's talk at MongoSF 2012)](https://reader033.vdocuments.site/reader033/viewer/2022051608/54598af1b1af9f40378b5421/html5/thumbnails/25.jpg)
Outline
I. Schema design
II. Lessons learned for schema design
‣ Indexes
‣ Concurrency
‣ Reducing collection size
III. Things to remember about MongoDB
![Page 26: MongoDB Schema Design: Insights and Tradeoffs (Jetlore's talk at MongoSF 2012)](https://reader033.vdocuments.site/reader033/viewer/2022051608/54598af1b1af9f40378b5421/html5/thumbnails/26.jpg)
Indexes: Primary Key
If your data has a natural PK, use it instead of the default ObjectId.
link: { _id: ObjectId(...), url: "www.jetlore.com", title: "Jetlore is a search platform for social content", description: "..." }
vs
link: { _id: "www.jetlore.com", title: "Jetlore is a search platform for social content", description: "..." }
The URL is then stored only once. Lookups by URL use the _id index that every collection already has, so no second index on url is needed.
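With the URL as `_id`, the unique `_id` index handles both deduplication and lookup by URL. In this toy sketch, a `Map` keyed by `_id` stands in for that index. `insertLink` is hypothetical, and the duplicate case only imitates MongoDB's duplicate-key error.

```javascript
// Toy stand-in for a collection: a Map keyed by _id plays the role of the
// unique _id index every MongoDB collection has. insertLink is hypothetical.
function insertLink(links, link) {
  if (links.has(link._id)) {
    return false; // MongoDB would report a duplicate key error on _id here
  }
  links.set(link._id, link);
  return true;
}

const links = new Map();
console.log(insertLink(links, {
  _id: "www.jetlore.com",
  title: "Jetlore is a search platform for social content",
})); // true
console.log(insertLink(links, { _id: "www.jetlore.com", title: "again" })); // false
console.log(links.get("www.jetlore.com").title);
```

The second insert of the same URL is rejected. The lookup by URL is a direct key access, with no extra `url` field or index involved.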
![Page 27: MongoDB Schema Design: Insights and Tradeoffs (Jetlore's talk at MongoSF 2012)](https://reader033.vdocuments.site/reader033/viewer/2022051608/54598af1b1af9f40378b5421/html5/thumbnails/27.jpg)
Indexes: Augment your schema to enable the most selective index
Want all posts that a user can view, sorted by the number of likes?
post: { _id: ObjectId(...), recipients: [...], likes: [...], likesCount: ..., ... }
Add a new "likesCount" field! An index cannot be built on the length of the likes array, but it can be built on a stored count:
db.posts.ensureIndex({recipients: 1, likesCount: -1})
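A denormalized `likesCount` only helps if it stays in sync with `likes`. The sketch below models that in plain JavaScript. It also shows, as a comment, one possible shell update that adds the like and bumps the count in a single atomic operation. `like` and `mostLikedFor` are hypothetical helpers.

```javascript
// Toy model (not MongoDB code) of keeping likesCount in step with likes.
// In the mongo shell the same step could be written as
//   db.posts.update({_id: pId, likes: {$ne: uId}},
//                   {$push: {likes: uId}, $inc: {likesCount: 1}})
// so the count only changes when the like was actually added.
function like(post, userId) {
  if (post.likes.includes(userId)) return false; // already liked: no change
  post.likes.push(userId);
  post.likesCount += 1;
  return true;
}

// "All posts a user can view, sorted by number of likes": the order an
// index on {recipients: 1, likesCount: -1} can return directly.
function mostLikedFor(posts, uid) {
  return posts
    .filter((post) => post.recipients.includes(uid))
    .sort((a, b) => b.likesCount - a.likesCount);
}

const posts = [
  { _id: 1, recipients: ["u1", "u2"], likes: [], likesCount: 0 },
  { _id: 2, recipients: ["u1"], likes: [], likesCount: 0 },
  { _id: 3, recipients: ["u2"], likes: [], likesCount: 0 },
];
like(posts[1], "u7");
like(posts[1], "u8");
like(posts[1], "u8"); // duplicate like is ignored
like(posts[0], "u7");
console.log(mostLikedFor(posts, "u1").map((p) => p._id));
```

For user `u1`, post 2 (two likes) comes before post 1 (one like). The repeated like from `u8` leaves both the list and the counter unchanged.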
![Page 28: MongoDB Schema Design: Insights and Tradeoffs (Jetlore's talk at MongoSF 2012)](https://reader033.vdocuments.site/reader033/viewer/2022051608/54598af1b1af9f40378b5421/html5/thumbnails/28.jpg)
Indexes: Make sure to use the proper index
db.posts.find({recipients: uId}).sort({date: -1})
Two single-field indexes:
db.posts.ensureIndex({recipients: 1})
db.posts.ensureIndex({date: 1})
vs one compound index:
db.posts.ensureIndex({recipients: 1, date: -1})
Always test with explain(), e.g. db.posts.find({recipients: uId}).sort({date: -1}).explain()
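This toy model shows what the compound index `{recipients: 1, date: -1}` gives this query. The index holds one key per (recipient, date) pair, kept sorted, so each user's posts sit next to each other and are already newest first. With two single-field indexes, the planner typically uses only one of them and is left with either an in-memory sort or a filter over posts in date order. The sketch is a simplified illustration, not MongoDB's B-tree, and it uses plain numbers for dates.

```javascript
// Build sorted keys for a toy compound multikey index on {recipients: 1, date: -1}.
function buildIndex(posts) {
  const keys = [];
  for (const post of posts) {
    // Multikey: one index key per element of the recipients array.
    for (const r of post.recipients) {
      keys.push({ r, date: post.date, id: post._id });
    }
  }
  // recipients ascending, then date descending.
  keys.sort((a, b) => (a.r < b.r ? -1 : a.r > b.r ? 1 : b.date - a.date));
  return keys;
}

// find({recipients: uid}).sort({date: -1}) answered by a range scan.
function findNewestFirst(index, uid) {
  // Binary search for the first key whose recipient is >= uid.
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid].r < uid) lo = mid + 1;
    else hi = mid;
  }
  // Keys for uid are contiguous and already ordered by date descending.
  const ids = [];
  for (let i = lo; i < index.length && index[i].r === uid; i++) {
    ids.push(index[i].id);
  }
  return ids;
}

const posts = [
  { _id: "a", recipients: ["u1", "u2"], date: 20120501 },
  { _id: "b", recipients: ["u2"], date: 20120503 },
  { _id: "c", recipients: ["u1", "u2"], date: 20120505 },
];
const index = buildIndex(posts);
console.log(findNewestFirst(index, "u2"));
```

For `u2`, the scan returns `c`, `b`, `a` straight from index order, with no sort step.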
![Page 29: MongoDB Schema Design: Insights and Tradeoffs (Jetlore's talk at MongoSF 2012)](https://reader033.vdocuments.site/reader033/viewer/2022051608/54598af1b1af9f40378b5421/html5/thumbnails/29.jpg)
Outline
I. Schema design
II. Lessons learned for schema design‣ Indexes
‣ Concurrency
‣ Reducing collection size
III. Things to remember about MongoDBSaturday, May 5, 12
![Page 30: MongoDB Schema Design: Insights and Tradeoffs (Jetlore's talk at MongoSF 2012)](https://reader033.vdocuments.site/reader033/viewer/2022051608/54598af1b1af9f40378b5421/html5/thumbnails/30.jpg)
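Concurrency: Try to avoid save() in drivers
Both threads load user u1 and change different fields:
thread1: { _id: u1, name: "Robert", from: [u2, u3] }
thread2: { _id: u1, name: "Bob", from: [] }
save() writes every field of the document back, e.g.:
db.users.update({_id: u1}, {$set: {_id: u1, name: ..., }}, true, false)
…but! Whichever save() runs last silently overwrites the other thread's change. Update only the fields you changed:
db.users.update({_id: thread1._id}, {$set: {from: thread1.from}})
db.users.update({_id: thread2._id}, {$set: {name: thread2.name}})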
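The race on this slide can be replayed in a few lines. Both "threads" read the same snapshot of `u1`, change different fields, and write back. `saveDoc` stands in for a whole-document save and `setFields` for `$set`. Both are hypothetical helpers, not driver calls.

```javascript
// Toy model (not driver code) of the race on the slide.
const clone = (doc) => JSON.parse(JSON.stringify(doc));

function saveDoc(coll, doc) {
  coll.set(doc._id, clone(doc)); // like save(): write the whole document back
}

function setFields(coll, id, fields) {
  Object.assign(coll.get(id), clone(fields)); // like {$set: fields}
}

function race(useSave) {
  const users = new Map([["u1", { _id: "u1", name: "Robert", from: [] }]]);
  const thread1 = clone(users.get("u1"));
  const thread2 = clone(users.get("u1"));
  thread1.from = ["u2", "u3"]; // thread1 changes the subscriptions
  thread2.name = "Bob";        // thread2 renames the user

  if (useSave) {
    saveDoc(users, thread1);
    saveDoc(users, thread2); // writes back its stale from: []
  } else {
    setFields(users, thread1._id, { from: thread1.from });
    setFields(users, thread2._id, { name: thread2.name });
  }
  return users.get("u1");
}

console.log(JSON.stringify(race(true)));  // {"_id":"u1","name":"Bob","from":[]}
console.log(JSON.stringify(race(false))); // {"_id":"u1","name":"Bob","from":["u2","u3"]}
```

With whole-document writes, thread1's new subscriptions are lost. With field-level `$set`, both changes survive.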
![Page 31: MongoDB Schema Design: Insights and Tradeoffs (Jetlore's talk at MongoSF 2012)](https://reader033.vdocuments.site/reader033/viewer/2022051608/54598af1b1af9f40378b5421/html5/thumbnails/31.jpg)
Concurrency: Atomic Commutative Operators
When updating lists and counters, instead of using $set, rely on $inc, $addToSet, $pull:
db.users.update({_id: u1}, {$pull: {to: u2}})
db.posts.update({_id: pId}, {$inc: {likesCount: 1}})
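Why these operators are safer than `$set`: they are applied to whatever value is current when the update runs, not to a value the client read earlier. Below are toy versions of the operators, simplified to top-level fields and single values, along with two clients liking the same post at once. The ordering claim only holds when concurrent updates touch different elements. Adding and pulling the same value does depend on order.

```javascript
// Toy versions of a few update operators (not MongoDB code; only top-level
// fields and single values are handled). The input is never mutated.
function applyUpdate(doc, update) {
  const out = JSON.parse(JSON.stringify(doc));
  for (const [f, v] of Object.entries(update.$set ?? {})) out[f] = v;
  for (const [f, n] of Object.entries(update.$inc ?? {})) out[f] = (out[f] ?? 0) + n;
  for (const [f, v] of Object.entries(update.$addToSet ?? {})) {
    out[f] = out[f] ?? [];
    if (!out[f].includes(v)) out[f].push(v);
  }
  for (const [f, v] of Object.entries(update.$pull ?? {})) {
    out[f] = (out[f] ?? []).filter((x) => x !== v);
  }
  return out;
}

// Two clients like post p1 at the same time; both read likesCount = 10 first.
const post = { _id: "p1", likesCount: 10 };
const seenByA = post.likesCount;
const seenByB = post.likesCount;

// Read-modify-write with $set: each client sends the value it computed.
let viaSet = applyUpdate(post, { $set: { likesCount: seenByA + 1 } });
viaSet = applyUpdate(viaSet, { $set: { likesCount: seenByB + 1 } });

// $inc: the increment is applied to the value current at that moment.
let viaInc = applyUpdate(post, { $inc: { likesCount: 1 } });
viaInc = applyUpdate(viaInc, { $inc: { likesCount: 1 } });

console.log(viaSet.likesCount, viaInc.likesCount); // 11 12

// $addToSet and $pull on different elements give the same list in either order.
const user = { _id: "u1", to: ["u2", "u3"] };
const addThenPull = applyUpdate(applyUpdate(user, { $addToSet: { to: "u4" } }), { $pull: { to: "u2" } });
const pullThenAdd = applyUpdate(applyUpdate(user, { $pull: { to: "u2" } }), { $addToSet: { to: "u4" } });
console.log(JSON.stringify(addThenPull.to) === JSON.stringify(pullThenAdd.to)); // true
```

With `$set`, one like is lost (11 instead of 12). With `$inc`, both count.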
![Page 32: MongoDB Schema Design: Insights and Tradeoffs (Jetlore's talk at MongoSF 2012)](https://reader033.vdocuments.site/reader033/viewer/2022051608/54598af1b1af9f40378b5421/html5/thumbnails/32.jpg)
Concurrency: No Transactions
user1: { _id: u1, to: [u2, u3], from: [...], ... }
user2: { _id: u2, to: [...], from: [u1, ...], ... }
User1 wants to unsubscribe from user2.
Ideally we would update both users in one transaction, but updates are only atomic per document: implement it in your code.
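One way to "implement it in your code" is to make each step idempotent, so a half-finished unsubscribe can simply be retried until both documents agree. This follows the talk's 2012 setting. MongoDB 4.0 and later also offer multi-document transactions on replica sets, and 4.2 extends them to sharded clusters. The helpers below are hypothetical. A real system would also need a record, such as a queue or a pending-operations field, of which unsubscribes still need retrying; that part is omitted here.

```javascript
// Toy model (not MongoDB code): each single-document update is atomic, but
// nothing ties the two updates together. Both steps are idempotent pulls.
function pull(users, id, field, value) {
  const user = users.get(id);
  user[field] = user[field].filter((x) => x !== value); // like {$pull: {field: value}}
}

function unsubscribe(users, followerId, followeeId, crashMidway = false) {
  // Step 1: db.users.update({_id: followerId}, {$pull: {to: followeeId}})
  pull(users, followerId, "to", followeeId);
  if (crashMidway) throw new Error("crashed between the two updates");
  // Step 2: db.users.update({_id: followeeId}, {$pull: {from: followerId}})
  pull(users, followeeId, "from", followerId);
}

// The relation is consistent when both sides agree.
function isConsistent(users, a, b) {
  return users.get(a).to.includes(b) === users.get(b).from.includes(a);
}

const users = new Map([
  ["u1", { _id: "u1", to: ["u2", "u3"], from: [] }],
  ["u2", { _id: "u2", to: [], from: ["u1"] }],
]);

try {
  unsubscribe(users, "u1", "u2", true);
} catch (err) {
  console.log(err.message, isConsistent(users, "u1", "u2")); // inconsistent: false
}
unsubscribe(users, "u1", "u2"); // retry: step 1 is a no-op, step 2 completes
console.log(isConsistent(users, "u1", "u2")); // true
```

Because `$pull` of an element that is already gone changes nothing, re-running the whole operation after a crash is safe.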
![Page 33: MongoDB Schema Design: Insights and Tradeoffs (Jetlore's talk at MongoSF 2012)](https://reader033.vdocuments.site/reader033/viewer/2022051608/54598af1b1af9f40378b5421/html5/thumbnails/33.jpg)
Outline
I. Schema design
II. Lessons learned for schema design
‣ Indexes
‣ Concurrency
‣ Reducing collection size
III. Things to remember about MongoDB
![Page 34: MongoDB Schema Design: Insights and Tradeoffs (Jetlore's talk at MongoSF 2012)](https://reader033.vdocuments.site/reader033/viewer/2022051608/54598af1b1af9f40378b5421/html5/thumbnails/34.jpg)
Reducing collection size: Name your fields with short names!
post: { owner: ObjectId, messageText: "loving Jetlore", mediaUrl: "www.jetlore.com", mediaTitle: "Jetlore is a user analytics & search platform for social content" }
vs
post: { o: ObjectId, t: "loving Jetlore", mu: "www.jetlore.com", mt: "Jetlore is a user analytics & search platform for social content" }
Every document stores its own copy of each field name, so shorter names save space in every record.
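Short stored names are easier to live with if the code keeps readable names and translates at the database boundary. The sketch below uses a hypothetical `SHORT` mapping and measures the saving with JSON byte length as a rough stand-in for BSON size. BSON also stores each field name in each document, so the name savings carry over, though the total sizes differ. The owner value is a placeholder string, not a real ObjectId.

```javascript
// Hypothetical mapping between readable names in code and short stored names.
const SHORT = { owner: "o", messageText: "t", mediaUrl: "mu", mediaTitle: "mt" };
const LONG = Object.fromEntries(Object.entries(SHORT).map(([l, s]) => [s, l]));

// Rename keys on the way into and out of the database.
const renameKeys = (doc, map) =>
  Object.fromEntries(Object.entries(doc).map(([k, v]) => [map[k] ?? k, v]));

const post = {
  owner: "000000000000000000000000", // placeholder ObjectId hex string
  messageText: "loving Jetlore",
  mediaUrl: "www.jetlore.com",
  mediaTitle: "Jetlore is a user analytics & search platform for social content",
};
const stored = renameKeys(post, SHORT);

// JSON byte length as a rough stand-in for BSON size.
const bytes = (doc) => new TextEncoder().encode(JSON.stringify(doc)).length;
console.log(bytes(post) - bytes(stored)); // 28 bytes saved on this one document
console.log(JSON.stringify(renameKeys(stored, LONG)) === JSON.stringify(post)); // true
```

The saving is small per document, but it is paid back on every record in the collection and in the working set held in memory.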
![Page 35: MongoDB Schema Design: Insights and Tradeoffs (Jetlore's talk at MongoSF 2012)](https://reader033.vdocuments.site/reader033/viewer/2022051608/54598af1b1af9f40378b5421/html5/thumbnails/35.jpg)
Outline
I. Schema design
II. Lessons learned for schema design
III. Things to remember about MongoDB
‣ Single lock
‣ ($or + sort) query doesn't use indexes properly
‣ Indexes with 2 list fields
‣ Record iterators + update
![Page 36: MongoDB Schema Design: Insights and Tradeoffs (Jetlore's talk at MongoSF 2012)](https://reader033.vdocuments.site/reader033/viewer/2022051608/54598af1b1af9f40378b5421/html5/thumbnails/36.jpg)
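$or & sort query doesn't use the proper index
db.posts.find({$or: [{recipients: uId}, {privacy: Public}]}).sort({date: -1})
db.posts.ensureIndex({recipients: 1, date: -1})
db.posts.ensureIndex({privacy: 1, date: -1})
Indexes with 2 list fields
post: { _id: ObjectId(...), recipients: [...], links: [...], ... }
db.posts.ensureIndex({recipients: 1, links: 1})
MongoDB does not allow a compound index entry in which more than one indexed field is an array: the insert or index build fails with "cannot index parallel arrays", since every combination of elements would need its own key.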
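One possible workaround for the `$or` + sort problem (an assumption, not something the talk prescribes) is to run each `$or` branch as its own query, where each one can walk its `{..., date: -1}` index in order. The application then merges the two date-sorted results and drops posts that appear in both. In the sketch below, arrays stand in for the two cursors, and `mergeNewestFirst` is a hypothetical helper.

```javascript
// Merge two result lists that are each sorted by date descending, dropping
// posts that appear in both, and stop after `limit` posts.
function mergeNewestFirst(a, b, limit) {
  const out = [];
  const seen = new Set();
  let i = 0;
  let j = 0;
  while (out.length < limit && (i < a.length || j < b.length)) {
    const takeA = j >= b.length || (i < a.length && a[i].date >= b[j].date);
    const post = takeA ? a[i++] : b[j++];
    if (!seen.has(post._id)) {
      seen.add(post._id);
      out.push(post);
    }
  }
  return out;
}

// Stand-ins for the two branch queries, each already newest first:
//   db.posts.find({recipients: uId}).sort({date: -1}).limit(limit)
//   db.posts.find({privacy: Public}).sort({date: -1}).limit(limit)
const forRecipient = [{ _id: 1, date: 50 }, { _id: 3, date: 30 }];
const publicPosts = [{ _id: 2, date: 40 }, { _id: 3, date: 30 }, { _id: 4, date: 10 }];

console.log(mergeNewestFirst(forRecipient, publicPosts, 10).map((p) => p._id));
```

Taking `limit` results from each branch is enough: every post in the true top `limit` ranks within the top `limit` of at least one branch. Post 3 matches both branches and is returned once.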
![Page 37: MongoDB Schema Design: Insights and Tradeoffs (Jetlore's talk at MongoSF 2012)](https://reader033.vdocuments.site/reader033/viewer/2022051608/54598af1b1af9f40378b5421/html5/thumbnails/37.jpg)
Record iterators + updating
var posts = db.posts.find().skip(n).limit(t)
while (posts.hasNext()) {
  var post = posts.next()
  db.posts.update({_id: post._id}, {$set: {text: NewText}})
}
An update can move a document on disk (for example, when it grows), so a cursor walking the collection in natural order may return the same document twice or skip others.
Sort by a field that will not change or rename the old collection:
var posts = db.posts.find().sort({date: 1}).skip(n).limit(t)
db.posts.renameCollection("oldPosts")
var posts = db.oldPosts.find().skip(n).limit(t)
while (posts.hasNext()) {
  var post = posts.next()
  post.text = NewText
  db.posts.insert(post)
}
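This toy model shows why updating while iterating in natural order can visit documents twice. An update that makes a record larger moves it to the end of the "data file", where the still-running scan meets it again. It imitates record moves in older storage engines such as MMAPv1 and is not MongoDB code. The helpers are hypothetical.

```javascript
// Toy "data file": an array of record slots. Rewriting a record with longer
// text moves it to the end; shorter or equal text is updated in place.
function updateText(records, slot, newText) {
  const doc = records[slot];
  if (newText.length > doc.text.length) {
    records[slot] = null;                    // old location becomes a hole
    records.push({ ...doc, text: newText }); // document now lives at the end
  } else {
    records[slot] = { ...doc, text: newText };
  }
}

// Like iterating db.posts.find() in natural order while updating.
function naturalOrderPass(records, newText) {
  let updates = 0;
  for (let slot = 0; slot < records.length; slot++) { // length grows as docs move
    if (records[slot] === null) continue;
    updateText(records, slot, newText);
    updates++;
  }
  return updates;
}

// Like iterating in order of a field that never changes (date): the order is
// fixed up front and each document is found wherever it currently lives.
function stableOrderPass(records, newText) {
  const ids = records
    .filter((doc) => doc !== null)
    .sort((a, b) => a.date - b.date)
    .map((doc) => doc._id);
  let updates = 0;
  for (const id of ids) {
    const slot = records.findIndex((doc) => doc !== null && doc._id === id);
    updateText(records, slot, newText);
    updates++;
  }
  return updates;
}

const makeRecords = () => [1, 2, 3].map((n) => ({ _id: n, date: n, text: "old" }));
console.log(naturalOrderPass(makeRecords(), "new text")); // 6: every post updated twice
console.log(stableOrderPass(makeRecords(), "new text"));  // 3
```

The natural-order pass updates each of the three posts twice, because each moved copy shows up again further along the scan. The pass ordered by an unchanging field updates each post exactly once.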
![Page 38: MongoDB Schema Design: Insights and Tradeoffs (Jetlore's talk at MongoSF 2012)](https://reader033.vdocuments.site/reader033/viewer/2022051608/54598af1b1af9f40378b5421/html5/thumbnails/38.jpg)
The takeaways
I. What is more important?
• Writes: Optimize for easy inserts/updates
• Reads: Optimize for easy querying
II. Denormalize to enable the most selective index
III. Concurrency: design to leverage commutative operators
![Page 39: MongoDB Schema Design: Insights and Tradeoffs (Jetlore's talk at MongoSF 2012)](https://reader033.vdocuments.site/reader033/viewer/2022051608/54598af1b1af9f40378b5421/html5/thumbnails/39.jpg)
Thank you!
Try our tech
powered by