02 - Schema Design


  • 8/10/2019 02 - Schema Design

    1/47

    Schema Design

    Senior Solutions Architect, MongoDB

    Ranga Sarvabhouman

    @MongoDB

    All application development is

    Schema Design

    Success comes from

    Proper Data Structure

    What is a Record?

    Key Value

    One-dimensional storage

    Single value is a blob

    Query on key only

    No schema

    Value cannot be updated, only replaced

    Key Blob

    Relational

    Two-dimensional storage (tuples)

    Each field contains a single value

    Query on any field

    Very structured schema (table)

    In-place updates

    Normalization requires many tables, joins, and indexes, and leads to poor data locality

    Primary Key

    Document

    N-dimensional storage

    Each field can contain 0, 1, many, or embedded values

    Query on any field & level

    Flexible schema

    Inline updates *

    Embedding related data has optimal data locality, requires fewer indexes, and has better performance

    _id

    Core Concepts

    Traditional Schema Design

    Focus on data storage

    Document Schema Design

    Focus on data use

    Another way to think about it

    What answers do I have?

    What questions do I have?

    Three Building Blocks of Document Schema Design

    1 - Flexibility

    Choices for schema design

    Each record can have different fields

    Field names consistent for programming

    Common structure can be enforced by application

    Easy to evolve as needed
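As a minimal sketch of "common structure can be enforced by application", the Python below uses plain dicts as stand-ins for documents. The records, the `REQUIRED_FIELDS` set, and the `missing_fields` helper are all hypothetical names chosen for illustration; the slides do not prescribe any particular check.

```python
# Hypothetical contact records: each document may carry different fields.
contacts = [
    {"name": "Ada", "title": "Engineer", "company": "Example Corp"},
    {"name": "Grace", "company": "Example Corp",
     "email": "grace@example.com", "fax": "555-0100"},
    {"title": "Analyst"},  # lacks the field this application requires
]

# Assumption: the application, not the database, decides what is mandatory.
REQUIRED_FIELDS = {"name"}


def missing_fields(doc):
    """Return the required field names that are absent from doc, sorted."""
    return sorted(REQUIRED_FIELDS - set(doc))


# Split the records into those the application accepts and those it rejects.
valid = [doc for doc in contacts if not missing_fields(doc)]
invalid = [doc for doc in contacts if missing_fields(doc)]
```

The first two records differ in shape yet both pass, because only the field names the program relies on are checked.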

    2 - Arrays

    Multiple Values per Field

    Each field can be:

    Absent

    Set to null

    Set to a single value

    Set to an array of many values

    Query for any matching value

    Can be indexed, and each value in the array is in the index
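The four field states and the "each value in the array is in the index" idea can be sketched in plain Python. This is a simplified illustration with hypothetical helpers (`field_matches`, `build_index`) and made-up records; it does not reproduce every detail of MongoDB's own matching rules (for example, how a query for null treats absent fields).

```python
_MISSING = object()  # sentinel that distinguishes "absent" from "set to null"


def field_matches(doc, field, value):
    """True if doc[field] equals value, or is a list that contains value."""
    stored = doc.get(field, _MISSING)
    if stored is _MISSING:
        return False
    if isinstance(stored, list):
        return value in stored
    return stored == value


def build_index(docs, field):
    """Map each value to the positions of the documents holding it.

    An array contributes one index entry per element.
    """
    index = {}
    for position, doc in enumerate(docs):
        stored = doc.get(field, _MISSING)
        if stored is _MISSING:
            continue
        values = stored if isinstance(stored, list) else [stored]
        for item in values:
            index.setdefault(item, []).append(position)
    return index


docs = [
    {"name": "a"},                              # field absent
    {"name": "b", "tags": None},                # set to null
    {"name": "c", "tags": "friend"},            # single value
    {"name": "d", "tags": ["friend", "work"]},  # array of many values
]

matches = [d["name"] for d in docs if field_matches(d, "tags", "friend")]
index = build_index(docs, "tags")
```

The same predicate serves the single-value and the array case, which is the point of "query for any matching value".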

    3 - Embedded Documents

    An acceptable value is a document

    Nested documents provide structure

    Query any field at any level

    Can be indexed
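A rough sketch of "query any field at any level": the hypothetical `get_path` helper below walks a dotted path through nested dicts, similar in spirit to the dotted field names used to address embedded documents. The record and its values are invented for illustration.

```python
def get_path(doc, path):
    """Follow a dotted path through nested dicts; None if any step is missing."""
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# Hypothetical contact with two levels of nesting.
contact = {
    "name": "Ada",
    "address": {
        "city": "Springfield",
        "geo": {"state": "NJ"},
    },
}

city = get_path(contact, "address.city")
state = get_path(contact, "address.geo.state")
missing = get_path(contact, "address.zip_code")
```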

    What is an Entity?

    An Entity

    Object in your model

    Associations with other entities

    Referencing (Relational)      Embedding (Document)
    has_one                       embeds_one
    belongs_to                    embedded_in
    has_many                      embeds_many
    has_and_belongs_to_many

    MongoDB has both referencing and embedding for universal coverage

    Let's model something together

    How about a business card?

    Business Card

    Referencing

    Addresses

    {
      _id: ,
      street: ,
      city: ,
      state: ,
      zip_code: ,
      country:
    }

    Contacts

    {
      _id: ,
      name: ,
      title: ,
      company: ,
      phone: ,
      address_id:
    }

    Embedding

    Contacts

    {
      _id: ,
      name: ,
      title: ,
      company: ,
      address: {
        street: ,
        city: ,
        state: ,
        zip_code: ,
        country:
      },
      phone:
    }
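To make the difference between the two shapes concrete, here is a small Python sketch that models collections as dicts keyed by `_id`. All values are invented placeholders, and the two lookup helpers are hypothetical; the field names follow the skeletons on the slides.

```python
# Referencing: two "collections", linked through address_id.
addresses = {
    10: {"_id": 10, "street": "1 Main St", "city": "Springfield",
         "state": "NJ", "zip_code": "00000", "country": "US"},
}
contacts_referencing = {
    1: {"_id": 1, "name": "Ada", "title": "Engineer",
        "company": "Example Corp", "phone": "555-0100", "address_id": 10},
}

# Embedding: the address lives inside the contact document.
contacts_embedding = {
    1: {"_id": 1, "name": "Ada", "title": "Engineer",
        "company": "Example Corp",
        "address": {"street": "1 Main St", "city": "Springfield",
                    "state": "NJ", "zip_code": "00000", "country": "US"},
        "phone": "555-0100"},
}


def city_by_reference(contact_id):
    """Two lookups: the contact first, then the address it points to."""
    contact = contacts_referencing[contact_id]
    address = addresses[contact["address_id"]]
    return address["city"]


def city_by_embedding(contact_id):
    """One lookup: the address arrives together with the contact."""
    return contacts_embedding[contact_id]["address"]["city"]
```

Both helpers return the same city; the referencing version simply needs a second fetch to get there.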

    Relational Schema

    Contact: name, company, title, phone

    Address: street, city, state, zip_code

    Document Schema

    Contact: name, company, title, phone

      address (embedded): street, city, state, zip_code

    How are they different? Why?

    Relational: Contact (name, company, title, phone) and Address (street, city, state, zip_code) as separate tables

    Document: Contact (name, company, title, phone) with an embedded address (street, city, state, zip_code)

    Schema Flexibility

    {
      name: ,
      title: ,
      company: ,
      address: {
        street: ,
        city: ,
        state: ,
        zip_code:
      },
      phone:
    }

    {
      name: ,
      url: ,
      title: ,
      company: ,
      email: ,
      address: {
        street: ,
        city: ,
        state: ,
        zip_code:
      },
      phone: ,
      fax:
    }

    Example

    Let's Look at an Address Book

    Address Book

    What questions do I have?

    What are my entities?

    What are my associations?

    Address Book Entity-Relationship

    [Figure: entity-relationship diagram; each entity is linked to Contacts by a line labeled with 1 or N cardinalities]

    Contacts: name, company, title
    Addresses: type, street, city, state, zip_code
    Phones: type, number
    Emails: type, address
    Thumbnails: mime_type, data
    Portraits: mime_type, data
    Groups: name
    Twitters: name, location, web, bio

    Associating Entities

    One to One

    [Figure: the address book entity-relationship diagram, repeated]

    One to One

    Schema Design Choices

    contact { twitter_id } 1 - 1 twitter

    contact 1 - 1 twitter { contact_id }

    Redundant to track relationship on both sides

    Both references must be updated for consistency

    May save a fetch?

    Contact with embedded twitter (1)

    One to One

    General Recommendation

    Full contact info all at once

    Contact embeds twitter

    Parent-child relationship: "contains"

    No additional data duplication

    Can query or index on embedded field

    e.g., twitter.name

    Exceptional cases

    Reference portrait which has very large data

    Contact with embedded twitter (1)
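The recommendation can be sketched with plain Python dicts: the small twitter profile is embedded, while the bulky portrait is kept in its own collection and referenced. The records, the `portrait_id` values, and both helper functions are hypothetical illustrations.

```python
# Hypothetical portraits collection: large binary data kept out of contacts.
portraits = {
    1: {"_id": 1, "mime_type": "image/jpeg", "data": b"\x00" * 1024},
}

contacts = [
    {"name": "Ada",
     "twitter": {"name": "ada_example", "location": "Springfield"},
     "portrait_id": 1},
    {"name": "Grace",
     "twitter": {"name": "grace_example", "location": "Shelbyville"}},
]


def find_by_twitter_name(twitter_name):
    """Filter on the embedded field, i.e. the twitter.name path."""
    return [c for c in contacts
            if c.get("twitter", {}).get("name") == twitter_name]


def load_portrait(contact):
    """Fetch the referenced portrait only when it is actually needed."""
    portrait_id = contact.get("portrait_id")
    if portrait_id is None:
        return None
    return portraits.get(portrait_id)


found = find_by_twitter_name("ada_example")
```

Listing or searching contacts never touches the portrait bytes; only `load_portrait` pays for the extra fetch.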

    One to Many

    [Figure: the address book entity-relationship diagram, repeated]

    One to Many

    Schema Design Choices

    contact { phone_ids: [ ] } 1 - N phone

    contact 1 - N phone { contact_id }

    Redundant to track relationship on both sides

    Both references must be updated for consistency

    Not possible in relational DBs

    Save a fetch?

    Contact with embedded phones (N)

    One to Many

    General Recommendation

    Full contact info all at once

    Contact embeds multiple phones

    Parent-children relationship: "contains"

    No additional data duplication

    Can query or index on any field

    e.g., { "phones.type": "mobile" }

    Exceptional cases

    Scaling: maximum document size is 16MB

    Contact with embedded phones (N)
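As a sketch of querying an embedded one-to-many field, the Python below mimics the effect of a filter on the `phones.type` path: a contact matches when any element of its `phones` array has the requested type. The data and the `has_phone_of_type` helper are hypothetical.

```python
# Hypothetical contacts, each embedding zero or more phones.
contacts = [
    {"name": "Ada", "phones": [
        {"type": "work", "number": "555-0100"},
        {"type": "mobile", "number": "555-0101"},
    ]},
    {"name": "Grace", "phones": [
        {"type": "home", "number": "555-0102"},
    ]},
    {"name": "Linus"},  # no phones field at all
]


def has_phone_of_type(contact, phone_type):
    """True if any embedded phone has the given type."""
    return any(phone.get("type") == phone_type
               for phone in contact.get("phones", []))


mobile_owners = [c["name"] for c in contacts
                 if has_phone_of_type(c, "mobile")]
```

Because the phones travel with their parent, the full contact is available as soon as the match is found, with no second lookup.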

    Many to Many

    [Figure: the address book entity-relationship diagram, repeated]

    Many to Many

    Traditional Relational Association

    Join table

    Contacts: name, company, title, phone
    Groups: name
    GroupContacts: group_id, contact_id

    Use arrays instead

    Many to Many

    Schema Design Choices

    group { contact_ids: [ ] } N - N contact

    group N - N contact { group_ids: [ ] }

    Redundant to track relationship on both sides

    Both references must be updated for consistency

    Redundant to track relationship on both sides

    Duplicated data must be updated for consistency

    group with embedded contacts (N)

    contact with embedded groups (N)
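The cost of tracking the relationship on both sides can be sketched as follows. The collections, ids, and the `add_to_group` and `is_consistent` helpers are hypothetical; the point is only that every membership change needs two writes, and a one-sided write leaves the data inconsistent.

```python
# Hypothetical collections keyed by _id, each side holding references.
groups = {
    1: {"_id": 1, "name": "family", "contact_ids": []},
    2: {"_id": 2, "name": "work", "contact_ids": []},
}
contacts = {
    7: {"_id": 7, "name": "Ada", "group_ids": []},
}


def add_to_group(contact_id, group_id):
    """Record the membership on both sides of the relationship."""
    contact = contacts[contact_id]
    group = groups[group_id]
    if group_id not in contact["group_ids"]:
        contact["group_ids"].append(group_id)
    if contact_id not in group["contact_ids"]:
        group["contact_ids"].append(contact_id)


def is_consistent():
    """Check that both sides describe the same set of memberships."""
    forward = {(c_id, g_id)
               for c_id, c in contacts.items() for g_id in c["group_ids"]}
    backward = {(c_id, g_id)
                for g_id, g in groups.items() for c_id in g["contact_ids"]}
    return forward == backward


add_to_group(7, 1)
consistent_after_helper = is_consistent()

# A write that touches only one side breaks the invariant.
contacts[7]["group_ids"].append(2)
consistent_after_one_sided_write = is_consistent()
```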

    Many to Many

    General Recommendation

    Depends on use case

    1. Simple address book

       Contact references groups

    2. Corporate email groups

       Group embeds contacts for performance

    Exceptional cases

    Scaling: maximum document size is 16MB

    Scaling may affect performance and working set

    group N - N contact { group_ids: [ ] }
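For the simple address book case, a minimal sketch with the relationship stored on one side only: each contact holds an array of `group_ids`, and groups carry no back-references. The data and the two helpers (`members_of`, `group_names`) are hypothetical.

```python
# Hypothetical groups collection keyed by _id; no contact ids stored here.
groups = {
    1: {"_id": 1, "name": "family"},
    2: {"_id": 2, "name": "work"},
}

contacts = [
    {"name": "Ada", "group_ids": [1, 2]},
    {"name": "Grace", "group_ids": [2]},
    {"name": "Linus", "group_ids": []},
]


def members_of(group_id):
    """Contacts whose group_ids array contains the given group."""
    return [c["name"] for c in contacts if group_id in c["group_ids"]]


def group_names(contact):
    """Resolve a contact's references into group names."""
    return [groups[group_id]["name"] for group_id in contact["group_ids"]]


work_members = members_of(2)
ada_groups = group_names(contacts[0])
```

Membership lives in exactly one place, so there is nothing to keep in sync; the trade-off is that resolving group details needs a second lookup.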

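    Document model - holistic and efficient representation

    [Figure: Contacts document containing its related data, with Portraits and Groups kept as separate entities; lines carry 1 or N cardinality labels]

    Contacts: name, company, title
      addresses: type, street, city, state, zip_code
      phones: type, number
      emails: type, address
      thumbnail: mime_type, data
      twitter: name, location, web, bio
    Portraits: mime_type, data
    Groups: name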

    Contact document example

    {
      name : "Gary J. Murakami, Ph.D.",
      company : "MongoDB, Inc.",
      title : "Lead Engineer",
      twitter : {
        name : "Gary Murakami",
        location : "New Providence, NJ",
        web : "http://www.nobell.org"
      },
      portrait_id : 1,
      addresses : ,
      phones : ,
      emails :

    Working Set

    To reduce the working set, consider:

    Reference bulk data, e.g., portrait

    Reference less-used data instead of embedding

    Extract into referenced child document

    Also for performance issues with large documents
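"Extract into referenced child document" could look roughly like the sketch below, which moves an embedded `portrait` into a separate collection and leaves a `portrait_id` behind. The function name, the id counter, and the sample data are assumptions made for illustration.

```python
import itertools


def extract_portrait(contact, portraits, id_source):
    """Return a slimmer copy of contact with its portrait moved out.

    The portrait is stored in the portraits collection under a new _id,
    and the copy keeps only a portrait_id reference.
    """
    slim = dict(contact)  # shallow copy, so the input stays untouched
    portrait = slim.pop("portrait", None)
    if portrait is not None:
        portrait_id = next(id_source)
        portraits[portrait_id] = {"_id": portrait_id, **portrait}
        slim["portrait_id"] = portrait_id
    return slim


portraits = {}
id_source = itertools.count(1)  # hypothetical id generator

contact = {
    "name": "Ada",
    "portrait": {"mime_type": "image/png", "data": b"\x00" * 4096},
}
slim_contact = extract_portrait(contact, portraits, id_source)
```

The frequently read contact document shrinks to a few small fields, while the bulk data is loaded only on demand.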

    General Recommendations

    Legacy Migration

    1. Copy existing schema & some data to MongoDB

    2. Iterate schema design development

       Measure performance, find bottlenecks, and embed

         1. one to one associations first

         2. one to many associations next

         3. many to many associations

    3. Migrate full dataset to new schema

    New Software Application? Embed by default
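One iteration of step 2, embedding a one-to-one association first, might be sketched like this. The `embed_addresses` function and the sample rows are hypothetical and assume the legacy data was copied over as contacts carrying an `address_id` reference.

```python
def embed_addresses(contacts, addresses):
    """Return new contact documents with the referenced address embedded."""
    address_by_id = {address["_id"]: address for address in addresses}
    migrated = []
    for contact in contacts:
        # Copy everything except the foreign-key style reference.
        doc = {key: value for key, value in contact.items()
               if key != "address_id"}
        address = address_by_id.get(contact.get("address_id"))
        if address is not None:
            # The embedded copy no longer needs its own _id.
            doc["address"] = {key: value for key, value in address.items()
                              if key != "_id"}
        migrated.append(doc)
    return migrated


# Hypothetical legacy rows copied as-is from a relational schema.
legacy_addresses = [
    {"_id": 10, "street": "1 Main St", "city": "Springfield"},
]
legacy_contacts = [
    {"_id": 1, "name": "Ada", "address_id": 10},
    {"_id": 2, "name": "Grace", "address_id": None},
]

new_contacts = embed_addresses(legacy_contacts, legacy_addresses)
```

The same pattern can then be repeated for one-to-many and many-to-many associations, measuring after each round.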

    Embedding over Referencing

    Embedding is a bit like pre-joined data

    BSON (Binary JSON) document ops are easy for the server

    Embed (90/10 following rule of thumb)

      When the "one" or "many" objects are viewed in the context of their parent

      For performance

      For atomicity

    Reference

      When you need more scaling

      For easy consistency with many to many associations without duplicated data

    It's All About Your Application

    Programs + Databases = (Big) Data Applications

    Your schema is the impedance matcher

    Design choices: normalize/denormalize, reference/embed

    Melds programming with MongoDB for best of both

    Flexible for development and change

    Programs × MongoDB = Great Big Data Applications