02 - schema design
TRANSCRIPT
-
8/10/2019 02 - Schema Design
1/47
Schema Design
Senior Solutions Architect, MongoDB
Ranga Sarvabhouman
@MongoDB
-
All application development is
Schema Design
-
Success comes from
Proper Data Structure
-
What is a Record?
-
Key Value
One-dimensional storage
Single value is a blob
Query on key only
No schema
Value cannot be updated, only replaced
Key Blob
-
Relational
Two-dimensional storage (tuples)
Each field contains a single value
Query on any field
Very structured schema (table)
In-place updates
Normalization requires many tables, joins, and indexes, and leads to poor data locality
Primary Key
-
Document
N-dimensional storage
Each field can contain 0, 1, many, or embedded values
Query on any field & level
Flexible schema
Inline updates *
Embedding related data has optimal data locality,
requires fewer indexes, has better performance
_id
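A minimal sketch of the three record shapes described above (key-value, relational, document). Plain Python structures stand in for the stores; all names and sample values are illustrative assumptions, not part of the talk.

```python
# Key-value: the value is an opaque blob; only the key can be looked up.
kv_store = {"contact:1": b'{"name": "Ada", "city": "London"}'}

# Relational: flat tuples with one value per field; related data sits in another table.
contacts_table = [(1, "Ada", 10)]      # (id, name, address_id)
addresses_table = [(10, "London")]     # (id, city)

# Document: a field may hold a nested document or an array.
contact_doc = {
    "_id": 1,
    "name": "Ada",
    "address": {"city": "London"},
    "phones": ["555-0100", "555-0101"],
}

def city_relational(contact_id):
    """Join the two tables to find a contact's city."""
    for cid, _name, address_id in contacts_table:
        if cid == contact_id:
            for aid, city in addresses_table:
                if aid == address_id:
                    return city
    return None

def city_document(doc):
    """Read the city directly from the embedded address."""
    return doc["address"]["city"]
```

The relational lookup needs a join across two tables, while the document keeps the related data in one place.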
-
Core Concepts
-
Traditional Schema Design
Focus on data storage
-
Document Schema Design
Focus on data use
-
Another way to think about it
What answers do I have?
What questions do I have?
-
Three Building Blocks of
Document Schema Design
-
1 - Flexibility
Choices for schema design
Each record can have different fields
Field names consistent for programming
Common structure can be enforced by application
Easy to evolve as needed
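A small sketch of "common structure can be enforced by application": records may carry different fields, and the application checks only the fields it relies on. The required-field rule and the sample cards are hypothetical.

```python
REQUIRED_FIELDS = {"name", "company"}  # hypothetical application-level rule

def missing_fields(doc):
    """Return the required fields absent from doc (empty set means valid)."""
    return REQUIRED_FIELDS - set(doc)

# Each record can have different fields; the shared names stay consistent.
card_v1 = {"name": "Ada", "company": "Example Corp", "phone": "555-0100"}
card_v2 = {"name": "Bob", "company": "Example Corp",
           "email": "bob@example.com", "fax": "555-0199"}
incomplete = {"name": "Eve"}
```

Adding `email` or `fax` to newer records needs no change to older ones, which is what makes the schema easy to evolve.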
-
2 - Arrays
Multiple Values per Field
Each field can be:
Absent
Set to null
Set to a single value
Set to an array of many values
Query for any matching value
Can be indexed, and each value in the array is in the index
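The array behaviour above can be sketched in plain Python: a field may be absent, null, a single value, or an array, a query matches any element, and an index holds one entry per array value. This is a simplified illustration (for example, it skips null values entirely), not how the server implements it; the `tags` field is an assumption.

```python
from collections import defaultdict

contacts = [
    {"_id": 1, "name": "Ada", "tags": ["friend", "work"]},  # array of values
    {"_id": 2, "name": "Bob", "tags": "work"},              # single value
    {"_id": 3, "name": "Eve", "tags": None},                # set to null
    {"_id": 4, "name": "Dan"},                              # field absent
]

def values_of(doc, field):
    """Normalize a field to a list: absent/null -> [], scalar -> [v], array -> array."""
    value = doc.get(field)
    if value is None:
        return []
    return value if isinstance(value, list) else [value]

def find_by_value(docs, field, wanted):
    """Match documents whose field equals wanted or whose array contains it."""
    return [d["_id"] for d in docs if wanted in values_of(d, field)]

def build_index(docs, field):
    """One index entry per array element, pointing back to the document _id."""
    index = defaultdict(list)
    for d in docs:
        for v in values_of(d, field):
            index[v].append(d["_id"])
    return index

tag_index = build_index(contacts, "tags")
```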
-
3 - Embedded Documents
An acceptable value is a document
Nested documents provide structure
Query any field at any level
Can be indexed
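"Query any field at any level" can be pictured as following a dotted path through nested documents. The helper below is a hypothetical stand-in written for illustration, with made-up field names.

```python
def get_path(doc, path):
    """Follow a dotted path such as 'address.city' through nested dicts."""
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None  # path does not exist in this document
        current = current[key]
    return current

contact = {
    "name": "Ada",
    "address": {"city": "London", "geo": {"lat": 51.5}},
}
```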
-
What is an Entity?
-
An Entity
Object in your model
Associations with other entities

Referencing (Relational)    Embedding (Document)
has_one                     embeds_one
belongs_to                  embedded_in
has_many                    embeds_many
has_and_belongs_to_many

MongoDB has both referencing and embedding for universal coverage
-
Let's model something together
How about a business card?
-
Business Card
-
Referencing
Addresses
{ _id: , street: , city: , state: , zip_code: , country: }
Contacts
{ _id: , name: , title: , company: , phone: , address_id: }
-
Embedding
Contacts
{ _id: , name: , title: , company: ,
  address: { street: , city: , state: , zip_code: , country: },
  phone: }
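A sketch contrasting the two business-card layouts, referencing and embedding. Dictionaries stand in for collections and `fetch` is a hypothetical stand-in for one database read; the sample values are invented.

```python
counter = {"reads": 0}

def fetch(collection, key):
    """Stand-in for one database read; counts how many reads happen."""
    counter["reads"] += 1
    return collection[key]

# Referencing: the contact stores address_id and the address lives elsewhere.
addresses = {100: {"_id": 100, "street": "1 Main St", "city": "Springfield"}}
contacts_ref = {1: {"_id": 1, "name": "Ada", "address_id": 100}}

# Embedding: the address is stored inside the contact document.
contacts_emb = {
    1: {"_id": 1, "name": "Ada",
        "address": {"street": "1 Main St", "city": "Springfield"}}
}

def card_by_reference(contact_id):
    """Two reads: the contact, then the address it references."""
    contact = fetch(contacts_ref, contact_id)
    address = fetch(addresses, contact["address_id"])
    return contact["name"], address["city"]

def card_by_embedding(contact_id):
    """One read: the address arrives with the contact."""
    contact = fetch(contacts_emb, contact_id)
    return contact["name"], contact["address"]["city"]
```

Both functions return the same card data; the embedded layout gets it in a single read.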
-
Relational Schema
Contact: name, company, title, phone
Address: street, city, state, zip_code
-
Document Schema
Contact: name, company, title, phone
  address (embedded): street, city, state, zip_code
-
How are they different? Why?
Relational Schema
Contact: name, company, title, phone
Address: street, city, state, zip_code
Document Schema
Contact: name, company, title, phone
  address (embedded): street, city, state, zip_code
-
Schema Flexibility
{ name: , title: , company: ,
  address: { street: , city: , state: , zip_code: },
  phone: }
{ name: , url: , title: , company: , email: ,
  address: { street: , city: , state: , zip_code: },
  phone: , fax: }
-
Example
-
Let's Look at an
Address Book
-
Address Book
What questions do I have?
What are my entities?
What are my associations?
-
Address Book Entity-Relationship
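Contacts: name, company, title
Addresses: type, street, city, state, zip_code (Contacts 1 to N)
Phones: type, number (Contacts 1 to N)
Emails: type, address (Contacts 1 to N)
Thumbnails: mime_type, data (Contacts 1 to 1)
Portraits: mime_type, data (Contacts 1 to 1)
Groups: name (Contacts N to N)
Twitters: name, location, web, bio (Contacts 1 to 1)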
-
Associating Entities
-
One to One
(Address Book entity-relationship diagram repeated: Contacts, Addresses, Phones, Emails, Thumbnails, Portraits, Groups, Twitters)
-
One to One
Schema Design Choices
contact with twitter_id references twitter (1 to 1)
twitter with contact_id references contact (1 to 1)
Redundant to track relationship on both sides
Both references must be updated for consistency
May save a fetch?
Contact embeds twitter (1)
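A sketch of why tracking a one-to-one relationship on both sides is redundant: every change has to touch two records, and a missed update leaves them inconsistent. The collections and helper names are hypothetical.

```python
contacts = {1: {"_id": 1, "name": "Ada", "twitter_id": None}}
twitters = {7: {"_id": 7, "name": "ada_tweets", "contact_id": None}}

def link(contact_id, twitter_id):
    """Set both references together so the two sides stay consistent."""
    contacts[contact_id]["twitter_id"] = twitter_id
    twitters[twitter_id]["contact_id"] = contact_id

def is_consistent(contact_id):
    """Check that the twitter record points back at the same contact."""
    twitter_id = contacts[contact_id]["twitter_id"]
    if twitter_id is None:
        return True
    return twitters[twitter_id]["contact_id"] == contact_id
```

With a single reference, or with the twitter data embedded in the contact, there is only one place to update.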
-
One to One
General Recommendation
Full contact info all at once
Contact embeds twitter
Parent-child relationship
"contains"
No additional data duplication
Can query or index on embedded field
e.g., twitter.name
Exceptional cases
Reference portrait which has very large data
Diagram: Contact embeds twitter (1)
-
One to Many
(Address Book entity-relationship diagram repeated: Contacts, Addresses, Phones, Emails, Thumbnails, Portraits, Groups, Twitters)
-
One to Many
Schema Design Choices
contact with phone_ids: [ ] references phones (1 to N)
phone with contact_id references contact (1 to N)
Redundant to track relationship on both sides
Both references must be updated for consistency
Not possible in relational DBs
Save a fetch?
Contact embeds phones (N)
-
One to Many
General Recommendation
Full contact info all at once
Contact embeds multiple phones
Parent-children relationship
"contains"
No additional data duplication
Can query or index on any field
e.g., { "phones.type": "mobile" }
Exceptional cases
Scaling: maximum document size is 16MB
Diagram: Contact embeds phones (N)
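A sketch of a contact embedding multiple phones, with a filter in the spirit of the `phones.type` query mentioned above. The filter is a plain-Python illustration with invented data, not the MongoDB query engine.

```python
contacts = [
    {"_id": 1, "name": "Ada", "phones": [
        {"type": "mobile", "number": "555-0100"},
        {"type": "work", "number": "555-0101"},
    ]},
    {"_id": 2, "name": "Bob", "phones": [
        {"type": "home", "number": "555-0102"},
    ]},
]

def with_phone_type(docs, phone_type):
    """Names of contacts having at least one embedded phone of the given type."""
    return [
        d["name"]
        for d in docs
        if any(p["type"] == phone_type for p in d.get("phones", []))
    ]
```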
-
Many to Many
(Address Book entity-relationship diagram repeated: Contacts, Addresses, Phones, Emails, Thumbnails, Portraits, Groups, Twitters)
-
Many to Many
Traditional Relational Association
Join table
Contacts: name, company, title, phone
Groups: name
GroupContacts: group_id, contact_id
Use arrays instead of the join table
-
Many to Many
Schema Design Choices
group with contact_ids: [ ] references contacts (N to N)
contact with group_ids: [ ] references groups (N to N)
Redundant to track relationship on both sides
Both references must be updated for consistency
Redundant to track relationship on both sides
Duplicated data must be updated for consistency
group embeds contacts (N)
contact embeds groups (N)
-
Many to Many
General Recommendation
Depends on use case
1. Simple address book
Contact references groups
2. Corporate email groups
Group embeds contacts for performance
Exceptional cases
Scaling: maximum document size is 16MB
Scaling may affect performance and working set
Diagram: contact with group_ids: [ ] references groups (N to N)
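A sketch of the simple address book case, where each contact references its groups through a `group_ids` array and no join table is needed. Data and helper names are illustrative assumptions.

```python
groups = {
    1: {"_id": 1, "name": "Family"},
    2: {"_id": 2, "name": "Work"},
}
contacts = [
    {"_id": 10, "name": "Ada", "group_ids": [1, 2]},
    {"_id": 11, "name": "Bob", "group_ids": [2]},
]

def members_of(group_id):
    """Contacts whose group_ids array contains the given group."""
    return [c["name"] for c in contacts if group_id in c["group_ids"]]

def group_names(contact):
    """Resolve a contact's referenced groups to their names."""
    return [groups[g]["name"] for g in contact["group_ids"]]
```

Group data is stored once, so renaming a group touches a single document.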
-
Document model - holistic and efficient representation
Contacts: name, company, title
  addresses (embedded, N): type, street, city, state, zip_code
  phones (embedded, N): type, number
  emails (embedded, N): type, address
  thumbnail (embedded, 1): mime_type, data
  twitter (embedded, 1): name, location, web, bio
Portraits (referenced, 1): mime_type, data
Groups (referenced, N to N): name
-
Contact document example
{
  name : "Gary J. Murakami, Ph.D.",
  company : "MongoDB, Inc.",
  title : "Lead Engineer",
  twitter : {
    name : "Gary Murakami",
    location : "New Providence, NJ",
    web : "http://www.nobell.org"
  },
  portrait_id : 1,
  addresses : ,
  phones : ,
  emails :
}
-
Working Set
To reduce the working set, consider
Reference bulk data, e.g., portrait
Reference less-used data instead of embedding
Extract into referenced child document
Also for performance issues with large documents
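A sketch of extracting bulk data into a referenced child document: the portrait moves to its own collection and the contact keeps only a `portrait_id`. The `portraits` dictionary and the helper are hypothetical stand-ins.

```python
portraits = {}  # stand-in for a separate portraits collection

def extract_portrait(contact, portrait_id):
    """Move embedded portrait data out of the contact; keep only its id."""
    slim = {k: v for k, v in contact.items() if k != "portrait"}
    portraits[portrait_id] = contact["portrait"]
    slim["portrait_id"] = portrait_id
    return slim

contact = {
    "name": "Ada",
    "portrait": {"mime_type": "image/png", "data": b"\x00" * 1024},
}
slim_contact = extract_portrait(contact, 1)
```

Reads that only need the name and other small fields no longer pull the image bytes into memory.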
-
General Recommendations
-
Legacy Migration
1. Copy existing schema & some data to MongoDB
2. Iterate schema design development
   Measure performance, find bottlenecks, and embed
   1. one to one associations first
   2. one to many associations next
   3. many to many associations
3. Migrate full dataset to new schema
New Software Application? Embed by default
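A sketch of one iteration step in a legacy migration: rows copied from two relational tables are turned into documents with a one-to-one association embedded. The row layout and the function name are assumptions for illustration.

```python
# Rows as they might look right after copying the existing schema.
contact_rows = [
    {"id": 1, "name": "Ada", "address_id": 100},
    {"id": 2, "name": "Bob", "address_id": None},
]
address_rows = [
    {"id": 100, "street": "1 Main St", "city": "Springfield"},
]

def embed_addresses(contacts, addresses):
    """Build contact documents with the referenced address embedded."""
    by_id = {a["id"]: a for a in addresses}
    docs = []
    for row in contacts:
        doc = {"_id": row["id"], "name": row["name"]}
        address = by_id.get(row["address_id"])
        if address is not None:
            # Drop the surrogate key; the embedded address needs no id of its own.
            doc["address"] = {k: v for k, v in address.items() if k != "id"}
        docs.append(doc)
    return docs

documents = embed_addresses(contact_rows, address_rows)
```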
-
Embedding over Referencing
Embedding is a bit like pre-joined data
BSON (Binary JSON) document ops are easy for the
server
Embed (90/10 following rule of thumb)
When the one or many objects are viewed in the context of their parent
For performance
For atomicity
Reference
When you need more scaling
For easy consistency with many to many associations
without duplicated data
-
It's All About Your Application
Programs + Databases = (Big) Data Applications
Your schema is the impedance matcher
Design choices: normalize/denormalize, reference/embed
Melds programming with MongoDB for best of both
Flexible for development and change
Programs × MongoDB = Great Big Data Applications