02 - schema design

8/10/2019 02 - Schema Design


Schema Design

Senior Solutions Architect, MongoDB

Ranga Sarvabhouman

@MongoDB



All application development is

Schema Design



Success comes from

Proper Data Structure



What is a Record?



Key Value

One-dimensional storage

Single value is a blob

Query on key only

No schema

Value cannot be updated, only replaced

Key Blob



Relational

Two-dimensional storage (tuples)

Each field contains a single value

Query on any field

Very structured schema (table)

In-place updates

Normalization requires many tables, joins, and indexes, and leads to poor data locality

Primary Key



Document

N-dimensional storage

Each field can contain 0, 1, many, or embedded values

Query on any field & level

Flexible schema

Inline updates *

Embedding related data has optimal data locality, requires fewer indexes, and has better performance

_id



Core Concepts



Traditional Schema Design

Focus on data storage



Document Schema Design

Focus on data use



Another way to think about it

What answers do I have?

What questions do I have?



Three Building Blocks of

Document Schema Design



1 - Flexibility

Choices for schema design

Each record can have different fields

Field names consistent for programming

Common structure can be enforced by application

Easy to evolve as needed



2 - Arrays

Multiple Values per Field

Each field can be:

Absent

Set to null

Set to a single value

Set to an array of many values

Query for any matching value

Can be indexed, and each value in the array is in the index
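
The four field states and the "query for any matching value" behaviour above can be sketched in a few lines. This is a minimal illustration using plain Python dicts in place of documents; the helper names (`matches`, `build_multikey_index`), the `tags` field, and the sample data are assumptions made for the example, and real server matching rules (for example around null) are richer than this.

```python
def matches(doc, field, value):
    """Equality match that also looks inside array values (simplified)."""
    if field not in doc:              # field absent
        return False
    stored = doc[field]
    if isinstance(stored, list):      # array: any element may match
        return value in stored
    return stored == value            # null or single value


def build_multikey_index(docs, field):
    """Build an index with one entry per array element: value -> list of _id."""
    index = {}
    for doc in docs:
        if field not in doc:
            continue
        stored = doc[field]
        values = stored if isinstance(stored, list) else [stored]
        for value in values:
            index.setdefault(value, []).append(doc["_id"])
    return index


# Hypothetical sample data covering the four field states.
contacts = [
    {"_id": 1, "name": "A"},                              # absent
    {"_id": 2, "name": "B", "tags": None},                # null
    {"_id": 3, "name": "C", "tags": "friend"},            # single value
    {"_id": 4, "name": "D", "tags": ["friend", "work"]},  # array of values
]

friend_ids = [c["_id"] for c in contacts if matches(c, "tags", "friend")]
tag_index = build_multikey_index(contacts, "tags")
```

The same query finds both the single-value and the array document, and the index holds a separate entry for every element of the array.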



3 - Embedded Documents

An acceptable value is a document

Nested documents provide structure

Query any field at any level

Can be indexed
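
Querying "any field at any level" is usually expressed with a dotted path such as `address.city`. The sketch below shows the idea on plain Python dicts; `get_path` and the sample contact are hypothetical names for illustration, not a driver API.

```python
def get_path(doc, dotted_path):
    """Follow a dotted path such as 'address.city' through nested dicts.

    Returns None when any step of the path is missing.
    """
    current = doc
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# Hypothetical contact with two levels of nesting.
contact = {
    "name": "Example Person",
    "address": {
        "city": "Springfield",
        "geo": {"lat": 1.5, "lon": 2.5},
    },
}

city = get_path(contact, "address.city")
latitude = get_path(contact, "address.geo.lat")
missing = get_path(contact, "address.zip_code")
```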



What is an Entity?



An Entity

Object in your model

Associations with other entities

Referencing (Relational)      Embedding (Document)
has_one                       embeds_one
belongs_to                    embedded_in
has_many                      embeds_many
has_and_belongs_to_many

MongoDB has both referencing and embedding for universal coverage



Let's model something together

How about a business card?



Business Card



Referencing

Addresses

{
  _id: ,
  street: ,
  city: ,
  state: ,
  zip_code: ,
  country:
}

Contacts

{
  _id: ,
  name: ,
  title: ,
  company: ,
  phone: ,
  address_id:
}



Embedding

Contacts

{
  _id: ,
  name: ,
  title: ,
  company: ,
  address: {
    street: ,
    city: ,
    state: ,
    zip_code: ,
    country:
  },
  phone:
}
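
The difference between the referencing and embedding layouts shows up when reading a contact's city. The sketch below uses plain Python dicts and made-up values; the function names and sample data are assumptions for illustration only.

```python
# Referencing: the address lives in its own collection, keyed by _id.
addresses = {
    10: {"_id": 10, "street": "1 Example St", "city": "Springfield",
         "state": "NJ", "zip_code": "00000", "country": "US"},
}
contact_referencing = {"_id": 1, "name": "Example Person", "address_id": 10}

# Embedding: the address is stored inside the contact document.
contact_embedding = {
    "_id": 1,
    "name": "Example Person",
    "address": {"street": "1 Example St", "city": "Springfield",
                "state": "NJ", "zip_code": "00000", "country": "US"},
}


def city_referenced(contact, addresses_by_id):
    """Two steps: read the contact, then fetch the referenced address."""
    address = addresses_by_id[contact["address_id"]]
    return address["city"]


def city_embedded(contact):
    """One step: the address arrives together with the contact."""
    return contact["address"]["city"]


city_a = city_referenced(contact_referencing, addresses)
city_b = city_embedded(contact_embedding)
```

Both layouts return the same answer; the referenced layout needs a second lookup to get it.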



Relational Schema

Contact: name, company, title, phone

Address: street, city, state, zip_code



Document Schema

Contact: name, company, title, phone

  address (embedded): street, city, state, zip_code



How are they different? Why?

Relational: Contact (name, company, title, phone) and Address (street, city, state, zip_code)

Document: Contact (name, company, title, phone) with an embedded address (street, city, state, zip_code)



Schema Flexibility

{
  name: ,
  title: ,
  company: ,
  address: {
    street: ,
    city: ,
    state: ,
    zip_code:
  },
  phone:
}

{
  name: ,
  url: ,
  title: ,
  company: ,
  email: ,
  address: {
    street: ,
    city: ,
    state: ,
    zip_code:
  },
  phone: ,
  fax:
}
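
Because the two documents above carry different fields, application code has to treat the extra ones (url, email, fax) as optional. A minimal sketch with plain Python dicts follows; the values and helper names are made up for illustration.

```python
# Two contacts in the same collection with different sets of fields.
contacts = [
    {"name": "Ann Example", "title": "Engineer", "company": "Example Co",
     "address": {"street": "1 Example St", "city": "Springfield",
                 "state": "NJ", "zip_code": "00000"},
     "phone": "555-0100"},
    {"name": "Bob Example", "url": "http://example.com", "title": "Manager",
     "company": "Example Co", "email": "bob@example.com",
     "address": {"street": "2 Example St", "city": "Springfield",
                 "state": "NJ", "zip_code": "00000"},
     "phone": "555-0101", "fax": "555-0102"},
]


def field_union(docs):
    """All top-level field names that appear in at least one document."""
    names = set()
    for doc in docs:
        names.update(doc.keys())
    return sorted(names)


def contact_line(doc):
    """Format a contact, tolerating a missing optional email field."""
    return f"{doc['name']} <{doc.get('email', 'no email')}>"


all_fields = field_union(contacts)
lines = [contact_line(c) for c in contacts]
```

Keeping the shared field names consistent (name, title, company) is what lets one piece of code handle both shapes.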



Example



Let's Look at an

Address Book



Address Book

What questions do I have?

What are my entities?

What are my associations?



Address Book Entity-Relationship

[Entity-relationship diagram. Entities and their fields:]

Contacts: name, company, title

Addresses: type, street, city, state, zip_code

Phones: type, number

Emails: type, address

Thumbnails: mime_type, data

Portraits: mime_type, data

Groups: name

Twitters: name, location, web, bio

[Relationship lines are labeled with 1 and N cardinalities.]



Associating Entities



One to One




One to One

Schema Design Choices

Contact references twitter: contact.twitter_id (1 to 1)

Twitter references contact: twitter.contact_id (1 to 1)

Redundant to track relationship on both sides

Both references must be updated for consistency

May save a fetch?

Contact embeds twitter: contact.twitter holds 1 twitter document



One to One

General Recommendation

Full contact info all at once

Contact embeds twitter

Parent-child relationship: contains

No additional data duplication

Can query or index on embedded field, e.g., twitter.name

Exceptional cases

Reference portrait which has very large data

[Diagram: Contact embeds 1 twitter document in its twitter field]
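
The one-to-one recommendation (embed twitter, reference the large portrait) might look as follows. This is a sketch with plain Python dicts; the sample values, the in-memory `portraits` mapping, and the helper names are assumptions for illustration.

```python
# Large binary data kept in its own collection and referenced by id.
portraits = {
    1: {"_id": 1, "mime_type": "image/jpeg", "data": b"\x00" * 1024},
}

# The small twitter sub-document is embedded; the portrait is only referenced.
contacts = [
    {"_id": 7, "name": "Example Person",
     "twitter": {"name": "example_person", "location": "Springfield"},
     "portrait_id": 1},
    {"_id": 8, "name": "Other Person"},  # no twitter, no portrait
]


def find_by_twitter_name(all_contacts, twitter_name):
    """Query on the embedded field twitter.name."""
    return [c for c in all_contacts
            if c.get("twitter", {}).get("name") == twitter_name]


def load_portrait(contact, portraits_by_id):
    """Fetch the referenced portrait only when it is actually needed."""
    return portraits_by_id.get(contact.get("portrait_id"))


found = find_by_twitter_name(contacts, "example_person")
portrait = load_portrait(found[0], portraits)
```

Reading a contact returns the twitter details in the same document, while the portrait bytes are fetched separately on demand.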



One to Many




One to Many

Schema Design Choices

Contact references phones: contact.phone_ids: [ ] (1 to N)

Phone references contact: phone.contact_id (1 to N)

Redundant to track relationship on both sides

Both references must be updated for consistency

Not possible in relational DBs

Save a fetch?

Contact embeds phones: contact.phones holds N phone documents



One to Many

General Recommendation

Full contact info all at once

Contact embeds multiple phones

Parent-children relationship: contains

No additional data duplication

Can query or index on any field, e.g., { "phones.type": "mobile" }

Exceptional cases

Scaling: maximum document size is 16MB

[Diagram: Contact embeds N phone documents in its phones field]
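
A query such as phones.type = mobile matches a contact when any embedded phone has that type. The sketch below mimics that on plain Python dicts and adds a rough size check against the 16MB limit; the JSON-encoded length is only a stand-in for the real BSON size, and all names and values are illustrative assumptions.

```python
import json

MAX_DOCUMENT_BYTES = 16 * 1024 * 1024  # the 16MB document size limit


def has_phone_type(contact, phone_type):
    """True if any embedded phone document has the given type."""
    return any(p.get("type") == phone_type for p in contact.get("phones", []))


def approx_size_bytes(doc):
    """Rough size estimate; JSON length is an approximation, not BSON size."""
    return len(json.dumps(doc).encode("utf-8"))


# Hypothetical contacts, each embedding its own phones.
contacts = [
    {"_id": 1, "name": "Ann Example",
     "phones": [{"type": "work", "number": "555-0100"},
                {"type": "mobile", "number": "555-0101"}]},
    {"_id": 2, "name": "Bob Example",
     "phones": [{"type": "home", "number": "555-0102"}]},
]

mobile_owners = [c["name"] for c in contacts if has_phone_type(c, "mobile")]
all_fit = all(approx_size_bytes(c) < MAX_DOCUMENT_BYTES for c in contacts)
```

A handful of phones per contact stays far below the limit; the limit matters when the "many" side can grow without bound.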



Many to Many




Many to Many

Traditional Relational Association

Contacts: name, company, title, phone

Groups: name

GroupContacts (join table): group_id, contact_id

Use arrays instead

[Diagram: the GroupContacts join table is marked with an X]
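
Replacing the join table with arrays amounts to folding its rows into an array field on one side of the relationship. The sketch below does this on plain Python data; the `group_ids` field matches the later slides, while the function name and sample rows are assumptions for illustration.

```python
def fold_join_rows(contacts, join_rows):
    """Turn GroupContacts rows into a group_ids array on each contact."""
    group_ids_by_contact = {}
    for row in join_rows:
        group_ids_by_contact.setdefault(row["contact_id"], []).append(
            row["group_id"])
    # Return new documents; contacts without rows get an empty array.
    return [{**c, "group_ids": group_ids_by_contact.get(c["_id"], [])}
            for c in contacts]


# Hypothetical rows as they might come out of relational tables.
contact_rows = [
    {"_id": 10, "name": "Ann Example"},
    {"_id": 11, "name": "Bob Example"},
    {"_id": 12, "name": "Cy Example"},
]
group_contacts_rows = [
    {"group_id": 1, "contact_id": 10},
    {"group_id": 2, "contact_id": 10},
    {"group_id": 2, "contact_id": 11},
]

contact_docs = fold_join_rows(contact_rows, group_contacts_rows)
```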



Many to Many

Schema Design Choices

Group references contacts: group.contact_ids: [ ] (N to N)

Contact references groups: contact.group_ids: [ ] (N to N)

Redundant to track relationship on both sides

Both references must be updated for consistency

Group embeds contacts: group.contacts holds N contact documents

Contact embeds groups: contact.groups holds N group documents

Redundant to track relationship on both sides

Duplicated data must be updated for consistency



Many to Many

General Recommendation

Depends on use case

1. Simple address book

Contact references groups

2. Corporate email groups

Group embeds contacts for performance

Exceptional cases

Scaling: maximum document size is 16MB

Scaling may affect performance and working set

[Diagram: contact.group_ids: [ ] references group, N to N]
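
The two use cases trade lookups against duplication. The sketch below contrasts them on plain Python dicts: with references, group details are resolved by id; with embedded contacts, a change to one contact has to touch every copy. All names and values are illustrative assumptions.

```python
# Use case 1: contact references groups through group_ids.
groups = {1: {"_id": 1, "name": "family"}, 2: {"_id": 2, "name": "work"}}
contacts = [
    {"_id": 10, "name": "Ann Example", "group_ids": [1, 2]},
    {"_id": 11, "name": "Bob Example", "group_ids": [2]},
]


def group_names(contact, groups_by_id):
    """Resolve the referenced groups of one contact."""
    return [groups_by_id[g]["name"] for g in contact["group_ids"]]


def members_of(group_id, all_contacts):
    """Find every contact whose group_ids array contains the group."""
    return [c["name"] for c in all_contacts if group_id in c["group_ids"]]


# Use case 2: group embeds (copies of) its contacts.
email_groups = [
    {"_id": 1, "name": "all-staff", "contacts": [
        {"_id": 10, "name": "Ann Example", "email": "ann@example.com"},
        {"_id": 11, "name": "Bob Example", "email": "bob@example.com"}]},
    {"_id": 2, "name": "engineering", "contacts": [
        {"_id": 10, "name": "Ann Example", "email": "ann@example.com"}]},
]


def update_email_everywhere(groups_list, contact_id, new_email):
    """Update every embedded copy of a contact; returns copies touched."""
    touched = 0
    for group in groups_list:
        for member in group["contacts"]:
            if member["_id"] == contact_id:
                member["email"] = new_email
                touched += 1
    return touched


ann_groups = group_names(contacts[0], groups)
work_members = members_of(2, contacts)
copies_updated = update_email_everywhere(email_groups, 10, "ann@example.org")
```

Reading an embedded group needs no extra lookups, but the single email change had to be applied once per group that contains the contact.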



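Document model - holistic and efficient representation

Contacts: name, company, title

  addresses (N): type, street, city, state, zip_code

  phones (N): type, number

  emails (N): type, address

  thumbnail (1): mime_type, data

  twitter (1): name, location, web, bio

Portraits: mime_type, data

Groups: name

[Diagram: addresses, phones, emails, thumbnail, and twitter are embedded in Contacts; Portraits and Groups remain separate entities linked to Contacts.]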



Contact document example

{
  name : "Gary J. Murakami, Ph.D.",
  company : "MongoDB, Inc.",
  title : "Lead Engineer",
  twitter : {
    name : "Gary Murakami",
    location : "New Providence, NJ",
    web : "http://www.nobell.org"
  },
  portrait_id : 1,
  addresses : ,
  phones : ,
  emails :
}



Working Set

To reduce the working set, consider

Reference bulk data, e.g., portrait

Reference less-used data instead of embedding

Extract into referenced child document

Also for performance issues with large documents
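
Extracting bulk data into a referenced child document could be sketched like this. Plain Python dicts stand in for the two collections; `extract_bulk_field`, the sequential id scheme, and the sample contact are assumptions for illustration.

```python
def extract_bulk_field(doc, field, target_collection, ref_field):
    """Move doc[field] into target_collection and leave a reference behind.

    Returns a new, slimmer document; the input document is not modified.
    """
    slim = {k: v for k, v in doc.items() if k != field}
    if field in doc:
        new_id = len(target_collection) + 1  # hypothetical id scheme
        target_collection[new_id] = {"_id": new_id, **doc[field]}
        slim[ref_field] = new_id
    return slim


# Hypothetical contact that still embeds a large portrait.
fat_contact = {
    "_id": 7,
    "name": "Example Person",
    "portrait": {"mime_type": "image/png", "data": "x" * 100_000},
}

portraits = {}
slim_contact = extract_bulk_field(fat_contact, "portrait", portraits,
                                  "portrait_id")
```

The frequently read contact document shrinks to a few fields, and the portrait is loaded only when a view actually shows it.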



General Recommendations



Legacy Migration

1. Copy existing schema & some data to MongoDB

2. Iterate schema design development

   Measure performance, find bottlenecks, and embed

   1. one to one associations first

   2. one to many associations next

   3. many to many associations

3. Migrate full dataset to new schema

New Software Application? Embed by default
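
The iteration step of a legacy migration (embed one to one associations first, then one to many) can be sketched on data copied over as-is from relational tables. Everything below is an illustrative assumption: plain Python dicts for rows, hypothetical helper names, and made-up values.

```python
def embed_one_to_one(contacts, addresses_by_id):
    """Replace each contact's address_id with the embedded address."""
    migrated = []
    for contact in contacts:
        doc = {k: v for k, v in contact.items() if k != "address_id"}
        address = addresses_by_id.get(contact.get("address_id"))
        if address is not None:
            doc["address"] = {k: v for k, v in address.items() if k != "_id"}
        migrated.append(doc)
    return migrated


def embed_one_to_many(contacts, phones):
    """Fold phone rows (each with a contact_id) into a phones array."""
    phones_by_contact = {}
    for phone in phones:
        entry = {k: v for k, v in phone.items()
                 if k not in ("_id", "contact_id")}
        phones_by_contact.setdefault(phone["contact_id"], []).append(entry)
    return [{**c, "phones": phones_by_contact.get(c["_id"], [])}
            for c in contacts]


# Step 1: existing schema copied over unchanged.
legacy_contacts = [{"_id": 1, "name": "Ann Example", "address_id": 10}]
legacy_addresses = {10: {"_id": 10, "city": "Springfield", "state": "NJ"}}
legacy_phones = [
    {"_id": 100, "contact_id": 1, "type": "work", "number": "555-0100"},
    {"_id": 101, "contact_id": 1, "type": "mobile", "number": "555-0101"},
]

# Step 2: iterate, embedding one to one first and one to many next.
step_a = embed_one_to_one(legacy_contacts, legacy_addresses)
step_b = embed_one_to_many(step_a, legacy_phones)
```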



Embedding over Referencing

Embedding is a bit like pre-joined data

BSON (Binary JSON) document ops are easy for the server

Embed (90/10 rule of thumb)

When the one or many objects are viewed in the context of their parent

For performance

For atomicity

Reference

When you need more scaling

For easy consistency with many to many associations without duplicated data



It's All About Your Application

Programs + Databases = (Big) Data Applications

Your schema is the impedance matcher

Design choices: normalize/denormalize, reference/embed

Melds programming with MongoDB for best of both

Flexible for development and change

Programs × MongoDB = Great Big Data Applications