pmug schema design and scaling
TRANSCRIPT
-
7/31/2019 PMUG Schema Design and Scaling
1/93
MongoDB Schema Design & Scaling
Alvin Richards
Technical Director
@jonnyeight
-
Agenda
18:00 - 18:15 : Why MongoDB
18:15 - 18:45 : Schema Design
18:45 - 19:00 : Break
19:00 - 19:45 : Scaling
19:45 - 20:00 : Q & A
20:00 - 22:00 : After Party!
-
10gen - Company Profile
Company behind MongoDB
- AGPL license, own copyrights, engineering team
- support, consulting, training, license revenue
Funding
- $73.5 million total funding
- Sequoia, Union Square, Flybridge, NEA
Management team
- Google/DoubleClick, Oracle, Apple, NetApp
NYC, Palo Alto, London, Dublin & Sydney
110+ employees
-
Today's challenges
-
Current technology stack adds significant complexity
- caching
- custom sharding
- vertical scaling
-
Current technology stack reduces productivity
- denormalize
- remove joins
- remove transactions
-
Why we exist
-
More than 500 customers worldwide
Archiving, Complex Data, Flexible Data
eCommerce, Content and Document Management, MultiMedia
Finance, Gaming, Infrastructure
Operational Datastore for Web Infrastructure
Real-time Analytics, Media, Mobile
-
NoSQL Market Leadership
-
Part 2 - Schema Design
-
Topics
Schema design is easy!
- Data as Objects in code
Common patterns
- Single table inheritance
- One-to-Many & Many-to-Many
- Buckets
- Trees
- Queues
- Inventory
-
Terminology
RDBMS           MongoDB
Table           Collection
Row(s)          JSON Document
Index           Index
Join            Embedding & Linking
Partition       Shard
Partition Key   Shard Key
-
Schema Design - Relational Database
-
Schema Design - MongoDB
-
Schema Design - MongoDB
embedding
-
Schema Design - MongoDB
linking
embedding
-
So today's example will use...
-
Design Session
Design documents that simply map to your application
> post = { author : "Herg",
           date : ISODate("2011-09-18T09:56:06.298Z"),
           text : "Destination Moon",
           tags : [ "comic", "adventure" ] }
> db.posts.insert(post)
-
Find the document
> db.posts.find()
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author : "Herg",
  date : ISODate("2011-09-18T09:56:06.298Z"),
  text : "Destination Moon",
  tags : [ "comic", "adventure" ] }
Notes:
- ID must be unique, but can be anything you'd like
- MongoDB will generate a default ID if one is not supplied
-
Add an index, find via index
Secondary index for author
// 1 means ascending, -1 means descending
> db.posts.ensureIndex({ author : 1 })
> db.posts.find({ author : 'Herg' })
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
  date : ISODate("2011-09-18T09:56:06.298Z"),
  author : "Herg",
  ... }
-
Query operators
Conditional operators:
$ne, $in, $nin, $mod, $all, $size, $exists, $type, ...
$lt, $lte, $gt, $gte, ...
// find posts with any tags
> db.posts.find({ tags : { $exists : true } })
Regular expressions:
// posts where author starts with h
> db.posts.find({ author : /^h/i })
Counting:
// number of posts written by Herg
> db.posts.find({ author : "Herg" }).count()
-
Extending the schema
http://nysi.org.uk/kids_stuff/rocket/rocket.htm
-
Extending the Schema
> new_comment = { author : "Kyle",
                  date : new Date(),
                  text : "great book" }
> db.posts.update({ text : "Destination Moon" },
                  { "$push" : { comments : new_comment },
                    "$inc" : { comments_count : 1 } })
-
Extending the Schema
> db.posts.find({ _id : ObjectId("4c4ba5c0672c685e5e8aabf3") })
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author : "Herg",
  date : ISODate("2011-09-18T09:56:06.298Z"),
  text : "Destination Moon",
  tags : [ "comic", "adventure" ],
  comments : [ {
    author : "Kyle",
    date : ISODate("2011-09-19T09:56:06.298Z"),
    text : "great book"
  } ],
  comments_count : 1
}
-
Extending the Schema
// create index on nested documents:
> db.posts.ensureIndex({ "comments.author" : 1 })
> db.posts.find({ "comments.author" : "Kyle" })
// find last 5 posts:
> db.posts.find().sort({ date : -1 }).limit(5)
// most commented post:
> db.posts.find().sort({ comments_count : -1 }).limit(1)
When sorting, check if you need an index
-
Use MongoDB with your language
10gen Supported Drivers
- Ruby, Python, Perl, PHP, Javascript, node.js
- Java, C/C++, C#, Scala
- Erlang, Haskell
Object Data Mappers
- Morphia - Java
- Mongoid, MongoMapper - Ruby
- MongoEngine - Python
Community Drivers
- F#, Smalltalk, Clojure, Go, Groovy, Delphi
- Lua, PowerShell, R
-
Using your schema - example Java Driver
// Get a connection to the collection
// (the database name "blog" is an assumption; the slide names only "posts")
DBCollection coll = new Mongo().getDB("blog").getCollection("posts");
// Create the Object
Map<String, Object> obj = new HashMap<String, Object>();
obj.put("author", "Herg");
obj.put("text", "Destination Moon");
obj.put("date", new Date());
// Insert the object into MongoDB
coll.insert(new BasicDBObject(obj));
-
Using your schema - example Morphia mapper
// Use Morphia annotations
@Entity
class Blog {
  @Id
  String author;
  @Indexed
  Date date;
  String text;
}
-
Using your schema - example Morphia
// Create the data store
Datastore ds = new Morphia().createDatastore();
// Create the Object
Post entry = new Post("Herg", new Date(), "Destination Moon");
// Insert object into MongoDB
ds.save(entry);
-
Common Patterns
http://www.flickr.com/photos/colinwarren/158628063
-
Inheritance
http://www.flickr.com/photos/dysonstarr/5098228295
-
Inheritance
-
Single Table Inheritance - RDBMS
shapes table
id  type    area  radius  length  width
1   circle  3.14  1
2   square  4             2
3   rect    10            5       2
-
Single Table Inheritance - MongoDB
> db.shapes.find()
{ _id : "1", type : "circle", area : 3.14, radius : 1 }
{ _id : "2", type : "square", area : 4, length : 2 }
{ _id : "3", type : "rect", area : 10, length : 5, width : 2 }
missing values not stored!
-
Single Table Inheritance - MongoDB
// find shapes where radius > 0
> db.shapes.find({ radius : { $gt : 0 } })
-
Single Table Inheritance - MongoDB
// create index
> db.shapes.ensureIndex({ radius : 1 }, { sparse : true })
index only values present!
-
One to Many
http://www.flickr.com/photos/j-sh/6502708899/
-
One to Many
One to Many relationships can specify
- degree of association between objects
- containment
- life-cycle
-
Linking versus Embedding
Linking
- 1 seek to read master
- 1 seek to read each detail
- 2 roundtrips to database
- Reads longer but consistent
- Writes longer but consistent
Embedding
- 1 seek to load entire object
- 1 roundtrip to database
- Read relative to object size
- Write relative to object size
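The round-trip difference between linking and embedding can be illustrated with a small in-memory sketch. It does not talk to MongoDB: the `store` object, the `query` helper and the collection names `postsEmbedded`, `postsLinked` and `comments` are all hypothetical stand-ins, and each call to `query` is simply counted as one round trip.

```javascript
// In-memory stand-in for a database. Collection and field names are
// illustrative assumptions, not taken from the talk.
const store = {
  postsEmbedded: [
    { _id: 1, text: "Destination Moon",
      comments: [{ author: "Kyle", text: "great book" }] },
  ],
  postsLinked: [{ _id: 1, text: "Destination Moon" }],
  comments: [{ _id: 100, post_id: 1, author: "Kyle", text: "great book" }],
};

let roundTrips = 0;

// Every call to query() models one round trip to the database.
function query(collection, predicate) {
  roundTrips += 1;
  return store[collection].filter(predicate);
}

// Embedding: one query returns the post together with its comments.
function readEmbedded(postId) {
  roundTrips = 0;
  const [post] = query("postsEmbedded", (d) => d._id === postId);
  return { post, roundTrips };
}

// Linking: one query for the master document, a second for its details.
function readLinked(postId) {
  roundTrips = 0;
  const [post] = query("postsLinked", (d) => d._id === postId);
  const comments = query("comments", (d) => d.post_id === postId);
  return { post: { ...post, comments }, roundTrips };
}

console.log(readEmbedded(1).roundTrips); // 1
console.log(readLinked(1).roundTrips);   // 2
```

Both reads return the same logical object; the linked form pays an extra round trip, while the embedded form reads one larger document.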
-
Many to Many
http://www.flickr.com/photos/pats0n/6013379192
-
Many - Many
Example:
- Product can be in many categories
- Category can have many products
-
Many - Many
products:
{ _id : 10,
  name : "Destination Moon",
  category_ids : [ 20, 30 ] }
categories:
{ _id : 20,
  name : "comic",
  product_ids : [ 10, 11, 12 ] }
categories:
{ _id : 21,
  name : "movie",
  product_ids : [ 10 ] }
-
Many - Many
// All categories for a given product
> db.categories.find({ product_ids : 10 })
-
Alternative
products:
{ _id : 10,
  name : "Destination Moon",
  category_ids : [ 20, 30 ] }
categories:
{ _id : 20,
  name : "comic" }
// All products for a given category
> db.products.find({ category_ids : 20 })
// All categories for a given product
> product = db.products.findOne({ _id : some_id })
> db.categories.find({ _id : { $in : product.category_ids } })
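A rough in-memory sketch of this one-sided layout, where only the product stores the category ids. Plain arrays stand in for the two collections; the category with `_id` 30 and the helper names are assumptions added for illustration.

```javascript
// Only products carry the relationship; categories hold no product_ids.
const products = [
  { _id: 10, name: "Destination Moon", category_ids: [20, 30] },
];
const categories = [
  { _id: 20, name: "comic" },
  { _id: 30, name: "adventure" }, // hypothetical: the slide shows only category 20
];

// All products for a given category: a single lookup on the array field.
function productsInCategory(categoryId) {
  return products.filter((p) => p.category_ids.includes(categoryId));
}

// All categories for a given product: two steps, first the product,
// then the categories whose _id is in its category_ids (the $in step).
function categoriesOfProduct(productId) {
  const product = products.find((p) => p._id === productId);
  if (!product) return [];
  return categories.filter((c) => product.category_ids.includes(c._id));
}

console.log(productsInCategory(20).map((p) => p.name));  // [ 'Destination Moon' ]
console.log(categoriesOfProduct(10).map((c) => c.name)); // [ 'comic', 'adventure' ]
```

The trade-off is that the relationship is written in one place only, at the cost of a second lookup when going from product to categories.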
-
Trees
http://www.flickr.com/photos/cubagallery/5949819558
-
Trees
Hierarchical information
-
Trees
Full Tree in Document
{ comments : [
    { author : "Kyle", text : "...",
      replies : [
        { author : "James", text : "...", replies : [] } ] }
] }
Pros: Single Document, Performance, Intuitive
Cons: Hard to search, Partial Results, 16MB limit
-
Array of Ancestors
- Store all Ancestors of a node
{ _id : "a" }
{ _id : "b", thread : [ "a" ], replyTo : "a" }
{ _id : "c", thread : [ "a", "b" ], replyTo : "b" }
{ _id : "d", thread : [ "a", "b" ], replyTo : "b" }
{ _id : "e", thread : [ "a" ], replyTo : "a" }
{ _id : "f", thread : [ "a", "e" ], replyTo : "e" }
[Diagram: tree rooted at A; B and E reply to A; C and D reply to B; F replies to E]
-
Array of Ancestors
// find all threads where "b" is in
> db.posts.find({ thread : "b" })
// find replies to "e"
> db.posts.find({ replyTo : "e" })
// find history of "f"
> threads = db.posts.findOne({ _id : "f" }).thread
> db.posts.find({ _id : { $in : threads } })
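The three array-of-ancestors lookups can be tried without a database. The sketch below keeps the six documents in a plain array and mirrors each query with a filter; the helper names `descendantsOf`, `repliesTo` and `historyOf` are made up for illustration.

```javascript
// The six documents from the slide, held in memory.
const posts = [
  { _id: "a" },
  { _id: "b", thread: ["a"], replyTo: "a" },
  { _id: "c", thread: ["a", "b"], replyTo: "b" },
  { _id: "d", thread: ["a", "b"], replyTo: "b" },
  { _id: "e", thread: ["a"], replyTo: "a" },
  { _id: "f", thread: ["a", "e"], replyTo: "e" },
];

const ids = (docs) => docs.map((d) => d._id);

// Mirrors find({ thread : id }): every node that has `id` as an ancestor.
function descendantsOf(id) {
  return posts.filter((p) => (p.thread || []).includes(id));
}

// Mirrors find({ replyTo : id }): direct children only.
function repliesTo(id) {
  return posts.filter((p) => p.replyTo === id);
}

// Mirrors findOne(...).thread followed by find({ _id : { $in : threads } }).
function historyOf(id) {
  const node = posts.find((p) => p._id === id);
  const ancestors = node && node.thread ? node.thread : [];
  return posts.filter((p) => ancestors.includes(p._id));
}

console.log(ids(descendantsOf("b"))); // [ 'c', 'd' ]
console.log(ids(repliesTo("e")));     // [ 'f' ]
console.log(ids(historyOf("f")));     // [ 'a', 'e' ]
```

Storing the full ancestor list makes subtree and history lookups single-field matches, at the cost of rewriting `thread` if a node is ever moved.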
-
Trees as Paths
Store hierarchy as a path expression
- Separate each node by a delimiter, e.g. "/"
- Use text search to find parts of a tree
{ comments : [
    { author : "Kyle", text : "initial post", path : "/" },
    { author : "Jim", text : "Jim's comment", path : "/jim" },
    { author : "Kyle", text : "Kyle's reply to Jim", path : "/jim/kyle" } ] }
// Find the conversations Jim was part of
> db.posts.find({ "comments.path" : /^\/jim/ })
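A minimal sketch of prefix matching on path strings, run against an in-memory copy of the three comments. Two details are assumptions of this sketch rather than part of the talk: the stored paths begin with the "/" delimiter, so the anchored pattern has to include it, and the pattern requires the prefix to end at a delimiter or at the end of the string so that a hypothetical "/jimmy" is not treated as part of "/jim".

```javascript
const comments = [
  { author: "Kyle", text: "initial post", path: "/" },
  { author: "Jim", text: "Jim's comment", path: "/jim" },
  { author: "Kyle", text: "Kyle's reply to Jim", path: "/jim/kyle" },
];

// Escape regex metacharacters so a path prefix is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// A comment is in the subtree rooted at `prefix` if its path is the prefix
// itself or continues with the delimiter.
function subtree(prefix) {
  const re = new RegExp("^" + escapeRegExp(prefix) + "(/|$)");
  return comments.filter((c) => re.test(c.path));
}

console.log(subtree("/jim").map((c) => c.path)); // [ '/jim', '/jim/kyle' ]
```

An anchored prefix pattern like this is also the form of regular expression that can make use of an index on the path field.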
-
Queue
- Need to maintain order and state
- Ensure that updates are atomic
> db.jobs.save({ inprogress : false,
                 priority : 1,
                 ... });
// find highest priority job and mark as in-progress
> job = db.jobs.findAndModify({
          query : { inprogress : false },
          sort : { priority : -1 },
          update : { $set : { inprogress : true,
                              started : new Date() } },
          new : true })
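The claim-a-job step can be sketched in memory to show the intended behaviour: pick the highest-priority job that is not in progress and mark it in the same step. The `makeQueue` helper and the sample priorities are hypothetical; in this sketch the atomicity comes from JavaScript running the function without interruption, which only stands in for the server performing the find and the update as one operation.

```javascript
// Build a queue over copies of the given jobs, all initially not in progress.
function makeQueue(initialJobs) {
  const jobs = initialJobs.map((j) => ({ ...j, inprogress: false }));
  return {
    // Select and update in one uninterrupted step, like findAndModify with
    // query { inprogress : false }, sort { priority : -1 } and new : true.
    claimNext(now = new Date()) {
      const candidates = jobs
        .filter((j) => !j.inprogress)
        .sort((a, b) => b.priority - a.priority);
      if (candidates.length === 0) return null; // nothing left to claim
      const job = candidates[0];
      job.inprogress = true;
      job.started = now;
      return job; // the modified document
    },
  };
}

const queue = makeQueue([
  { _id: 1, priority: 1 },
  { _id: 2, priority: 5 },
  { _id: 3, priority: 3 },
]);
console.log(queue.claimNext()._id); // 2
console.log(queue.claimNext()._id); // 3
console.log(queue.claimNext()._id); // 1
console.log(queue.claimNext());     // null
```

Because selection and marking happen together, two workers calling `claimNext` can never receive the same job.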
-
Don't try this
-
Don't try this...
- Incorrect indexing
  - Too many indexes; wrong keys indexed
  - Frequent queries do not use index
- Large, deeply nested documents
- One size fits all collections
- One collection per user
-
Summary
Schema design is different in MongoDB
Basic data design principles stay the same
Focus on how the application manipulates data
Rapidly evolve schema to meet your requirements
Enjoy your new freedom, use it wisely :-)
-
Part 3 - Scaling
-
Scaling
Operations/sec go up
Storage needs go up
- Capacity
- IOPs
Complexity goes up
- Caching
-
How do you scale now?
Optimization & Tuning
- Schema & Index Design
- O/S tuning
- Hardware configuration
Vertical scaling
- Hardware is expensive
- Hard to scale in cloud
[Chart: cost ($$$) versus throughput]
-
Horizontal scaling - Sharding
[Diagram: reads and writes go to a single shard, shard1 (A-Z), holding 300 GB of data]
-
Horizontal scaling - Sharding
[Diagram: reads and writes spread over shard1 (A-M) and shard2 (N-Z), 150 GB of data each]
-
Horizontal scaling - Sharding
[Diagram: reads and writes spread over shard1 (A-H), shard2 (I-Q) and shard3 (R-Z), 100 GB of data each]
-
Sharding for caching
[Diagram: reads and writes go to shard1, which holds ranges A-H, I-Q and R-Z; 300 GB data, 96 GB memory, 3:1 data/memory]
-
Sharding for caching
[Diagram: reads and writes spread over shard1 (A-H), shard2 (I-Q) and shard3 (R-Z); 300 GB data, 96 GB memory, 1:1 data/memory]
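The data/memory ratios on these two slides can be reproduced with a short calculation. This sketch assumes the 96 GB figure is memory per shard and that data is spread evenly across shards; neither assumption is stated explicitly on the slides, and the function name is made up.

```javascript
// Data held per shard divided by memory per shard, assuming an even spread.
function dataToMemoryRatio(totalDataGB, shardCount, memPerShardGB) {
  const dataPerShardGB = totalDataGB / shardCount;
  return dataPerShardGB / memPerShardGB;
}

// One shard: 300 GB against 96 GB of memory, roughly 3:1.
console.log(dataToMemoryRatio(300, 1, 96).toFixed(3)); // "3.125"
// Three shards: 100 GB each against 96 GB of memory, roughly 1:1.
console.log(dataToMemoryRatio(300, 3, 96).toFixed(3)); // "1.042"
```

At close to 1:1, nearly the whole data set on each shard can sit in memory, which is the caching benefit the slides point to.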
-
Replication
[Diagram: reads and writes against three copies of the full A-Z range; 300 GB data, 900 GB data in total]
-
Sharding internals
Range based partitioning
-
Range based partitioning
MongoDB's sharding handles the scale problem by chunking
- Break up pieces of data into smaller chunks, spread across many data nodes
- Each data node contains many chunks
- If a chunk gets too large or a node is overloaded, data can be rebalanced
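The chunking idea can be sketched as a toy model: chunks cover contiguous key ranges, a chunk that grows past a threshold is split at its median key, and chunks are moved between shards until the counts are even. This is an illustration only, not MongoDB's actual splitting or balancing algorithm; the threshold `MAX_DOCS_PER_CHUNK`, the shard names and all function names are assumptions, and keys are assumed to be distinct strings.

```javascript
const MAX_DOCS_PER_CHUNK = 4; // illustrative split threshold

// Start with one chunk covering the whole key space [min, max) on shard1.
function makeCluster() {
  return { chunks: [{ min: "", max: "\uffff", shard: "shard1", keys: [] }] };
}

// Route a key to the chunk whose range contains it.
function findChunk(cluster, key) {
  return cluster.chunks.find((c) => key >= c.min && key < c.max);
}

// Split a chunk at its median key into two adjacent ranges.
function split(cluster, chunk) {
  const mid = Math.floor(chunk.keys.length / 2);
  const splitKey = chunk.keys[mid];
  const upper = { min: splitKey, max: chunk.max, shard: chunk.shard,
                  keys: chunk.keys.slice(mid) };
  chunk.keys = chunk.keys.slice(0, mid);
  chunk.max = splitKey;
  cluster.chunks.splice(cluster.chunks.indexOf(chunk) + 1, 0, upper);
}

// Insert a key (assumed distinct) and split the chunk if it grows too large.
function insert(cluster, key) {
  const chunk = findChunk(cluster, key);
  chunk.keys.push(key);
  chunk.keys.sort();
  if (chunk.keys.length > MAX_DOCS_PER_CHUNK) split(cluster, chunk);
}

// Move chunks from the fullest shard to the emptiest one until the
// chunk counts differ by at most one.
function rebalance(cluster, shards) {
  const count = (s) => cluster.chunks.filter((c) => c.shard === s).length;
  for (;;) {
    const sorted = [...shards].sort((a, b) => count(a) - count(b));
    const least = sorted[0];
    const most = sorted[sorted.length - 1];
    if (count(most) - count(least) <= 1) return;
    cluster.chunks.find((c) => c.shard === most).shard = least;
  }
}

const cluster = makeCluster();
for (const key of "abcdefghij") insert(cluster, key);
rebalance(cluster, ["shard1", "shard2"]);
console.log(cluster.chunks.map((c) => `[${c.min || "min"}, ${c.max === "\uffff" ? "max" : c.max}) on ${c.shard}`));
console.log(findChunk(cluster, "f").shard); // shard1
```

After the ten inserts the key space is covered by four chunks, two per shard, and a lookup by key touches only the one chunk whose range contains it.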
-
Big Data at a Glance
-
Scaling
-
Add Nodes: Chunk Rebalancing
-
Writes Routed to Correct Chunk
-
Chunk Splitting & Balancing
-
Reads with Key Routed Efficiently
-
Summary
Scaling is simple
Add capacity before you need it
System automatically re-balances your data
No downtime to add capacity
No code changes required
-
@mongodb
conferences, appearances, and meetups
http://www.10gen.com/events
Facebook | Twitter | LinkedIn
http://bit.ly/mongo>
http://linkd.in/joinmongo
download at mongodb.org