pmug schema design and scaling

Upload: alvin-john-richards

Post on 05-Apr-2018


  • 7/31/2019 PMUG Schema Design and Scaling

    1/93

    MongoDB Schema Design & Scaling

    Alvin Richards

    Technical Director, [email protected]

    @jonnyeight

    Agenda

    18:00 - 18:15 : Why MongoDB
    18:15 - 18:45 : Schema Design
    18:45 - 19:00 : Break
    19:00 - 19:45 : Scaling
    19:45 - 20:00 : Q & A
    20:00 - 22:00 : After Party!

    10gen - Company Profile

    Company behind MongoDB
    - AGPL license, own copyrights, engineering team
    - support, consulting, training, license revenue

    Funding
    - $73.5 million total funding
    - Sequoia, Union Square, Flybridge, NEA

    Management team
    - Google/DoubleClick, Oracle, Apple, NetApp

    NYC, Palo Alto, London, Dublin & Sydney

    110+ employees

    Today's challenges

    Current technology stack adds significant complexity
    - caching
    - custom sharding
    - vertical scaling

    Current technology stack reduces productivity
    - denormalize
    - remove joins
    - remove transactions


    Why we exist

    More than 500 customers worldwide

    - Archiving
    - Complex Data
    - Flexible Data
    - eCommerce
    - Content and Document Management, MultiMedia
    - Finance
    - Gaming
    - Infrastructure
    - Operational Datastore for Web Infrastructure
    - Real-time Analytics
    - Media
    - Mobile


    NoSQL Market Leadership


    Part 2 - Schema Design

    Topics

    Schema design is easy!
    - Data as Objects in code

    Common patterns
    - Single table inheritance
    - One-to-Many & Many-to-Many
    - Buckets
    - Trees
    - Queues
    - Inventory

    Terminology

    RDBMS          MongoDB
    Table          Collection
    Row(s)         JSON Document
    Index          Index
    Join           Embedding & Linking
    Partition      Shard
    Partition Key  Shard Key

    Schema Design: Relational Database

    Schema Design: MongoDB

    Schema Design: MongoDB embedding

    Schema Design: MongoDB linking and embedding

    So today's example will use...

    Design Session

    Design documents that simply map to your application

    > post = { author : "Hergé",
               date : ISODate("2011-09-18T09:56:06.298Z"),
               text : "Destination Moon",
               tags : ["comic", "adventure"] }

    > db.posts.insert(post)

    Find the document

    > db.posts.find()

    { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
      author : "Hergé",
      date : ISODate("2011-09-18T09:56:06.298Z"),
      text : "Destination Moon",
      tags : [ "comic", "adventure" ] }

    Notes:
    - ID must be unique, but can be anything you'd like
    - MongoDB will generate a default ID if one is not supplied

    Add an index, find via index

    Secondary index for author

    // 1 means ascending, -1 means descending
    > db.posts.ensureIndex({ author : 1 })

    > db.posts.find({ author : "Hergé" })

    { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
      date : ISODate("2011-09-18T09:56:06.298Z"),
      author : "Hergé",
      ... }

    Query operators

    Conditional operators:
    $ne, $in, $nin, $mod, $all, $size, $exists, $type, ...
    $lt, $lte, $gt, $gte, ...

    // find posts with any tags
    > db.posts.find({ tags : { $exists : true } })

    Regular expressions:
    // posts where author starts with h
    > db.posts.find({ author : /^h/i })

    Counting:
    // number of posts written by Hergé
    > db.posts.find({ author : "Hergé" }).count()

    Extending the schema

    http://nysi.org.uk/kids_stuff/rocket/rocket.htm

    Extending the Schema

    > new_comment = { author : "Kyle",
                      date : new Date(),
                      text : "great book" }

    > db.posts.update({ text : "Destination Moon" },
                      { "$push" : { comments : new_comment },
                        "$inc" : { comments_count : 1 } })
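What the `$push` and `$inc` modifiers do to the post can be sketched in plain JavaScript, with no database involved. The `applyUpdate` helper below is hypothetical and covers only these two modifiers on top-level fields; it illustrates the semantics and is not how MongoDB implements them.

```javascript
// Hypothetical helper: applies a small subset of MongoDB update modifiers
// ($push and $inc, top-level fields only) to a plain object, in place.
function applyUpdate(doc, update) {
  for (const [field, value] of Object.entries(update.$push || {})) {
    if (doc[field] === undefined) doc[field] = []; // array is created on first push
    if (!Array.isArray(doc[field])) {
      throw new Error("cannot $push to non-array field " + field);
    }
    doc[field].push(value);
  }
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    doc[field] = (doc[field] || 0) + amount; // a missing counter starts from 0
  }
  return doc;
}

const post = {
  author: "Hergé",
  text: "Destination Moon",
  tags: ["comic", "adventure"],
};
const newComment = { author: "Kyle", date: new Date(), text: "great book" };

// Same shape as the update document on the slide.
applyUpdate(post, {
  $push: { comments: newComment },
  $inc: { comments_count: 1 },
});

console.log(post.comments.length, post.comments_count);
```

Neither `comments` nor `comments_count` existed before the update; both are created by it, which is the point of the slide: the schema is extended by writing to it.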

    > db.posts.find({ _id : ObjectId("4c4ba5c0672c685e5e8aabf3") })

    { _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
      author : "Hergé",
      date : ISODate("2011-09-18T09:56:06.298Z"),
      text : "Destination Moon",
      tags : [ "comic", "adventure" ],
      comments : [
        { author : "Kyle",
          date : ISODate("2011-09-19T09:56:06.298Z"),
          text : "great book" }
      ],
      comments_count : 1 }

    // create index on nested documents:
    > db.posts.ensureIndex({ "comments.author" : 1 })

    > db.posts.find({ "comments.author" : "Kyle" })

    // find last 5 posts:
    > db.posts.find().sort({ date : -1 }).limit(5)

    // most commented post:
    > db.posts.find().sort({ comments_count : -1 }).limit(1)

    When sorting, check if you need an index

    Use MongoDB with your language

    10gen Supported Drivers
    - Ruby, Python, Perl, PHP, Javascript, node.js
    - Java, C/C++, C#, Scala
    - Erlang, Haskell

    Object Data Mappers
    - Morphia - Java
    - Mongoid, MongoMapper - Ruby
    - MongoEngine - Python

    Community Drivers
    - F#, Smalltalk, Clojure, Go, Groovy, Delphi
    - Lua, PowerShell, R

    Using your schema - example Java Driver

    // Get the "posts" collection
    // (the database name "test" is illustrative; the slide does not name one)
    DBCollection coll = new Mongo().getDB("test").getCollection("posts");

    // Create the object
    Map<String, Object> obj = new HashMap<String, Object>();
    obj.put("author", "Hergé");
    obj.put("text", "Destination Moon");
    obj.put("date", new Date());

    // Insert the object into MongoDB
    coll.insert(new BasicDBObject(obj));

    Using your schema - example Morphia mapper

    // Use Morphia annotations
    @Entity
    class Blog {
        @Id
        String author;
        @Indexed
        Date date;
        String text;
    }

    Using your schema - example Morphia

    // Create the data store
    Datastore ds = new Morphia().createDatastore();

    // Create the object
    Post entry = new Post("Hergé",
                          new Date(),
                          "Destination Moon");

    // Insert object into MongoDB
    ds.save(entry);

    Common Patterns

    http://www.flickr.com/photos/colinwarren/158628063

    Inheritance

    http://www.flickr.com/photos/dysonstarr/5098228295

    Single Table Inheritance - RDBMS

    shapes table

    id  type    area  radius  length  width
    1   circle  3.14  1
    2   square  4             2
    3   rect    10            5       2

    Single Table Inheritance - MongoDB

    > db.shapes.find()
    { _id : "1", type : "circle", area : 3.14, radius : 1 }
    { _id : "2", type : "square", area : 4, length : 2 }
    { _id : "3", type : "rect", area : 10, length : 5, width : 2 }

    missing values not stored!

    // find shapes where radius > 0
    > db.shapes.find({ radius : { $gt : 0 } })

    // create index
    > db.shapes.ensureIndex({ radius : 1 }, { sparse : true })

    index only values present!
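The difference a sparse index makes can be sketched in plain JavaScript. `buildIndex` is a hypothetical helper, and index ordering is left out on purpose; the only point is which documents get an index entry at all.

```javascript
const shapes = [
  { _id: "1", type: "circle", area: 3.14, radius: 1 },
  { _id: "2", type: "square", area: 4, length: 2 },
  { _id: "3", type: "rect", area: 10, length: 5, width: 2 },
];

// Hypothetical helper: returns [value, _id] index entries for one field.
// With sparse = true, documents lacking the field get no entry at all;
// otherwise they are indexed under null.
function buildIndex(docs, field, sparse) {
  const entries = [];
  for (const doc of docs) {
    if (field in doc) {
      entries.push([doc[field], doc._id]);
    } else if (!sparse) {
      entries.push([null, doc._id]);
    }
  }
  return entries;
}

const sparseIndex = buildIndex(shapes, "radius", true);
const regularIndex = buildIndex(shapes, "radius", false);

console.log("sparse entries:", sparseIndex.length);
console.log("regular entries:", regularIndex.length);
```

Only the circle has a `radius`, so the sparse index holds a single entry while the regular one holds an entry for every shape.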

    One to Many

    http://www.flickr.com/photos/j-sh/6502708899/

    One to Many relationships can specify
    - degree of association between objects
    - containment
    - life-cycle

    Linking versus Embedding

    Linking
    - 1 seek to read master
    - 1 seek to read each detail
    - 2 round trips to database
    - Reads longer but consistent
    - Writes longer but consistent

    Embedding
    - 1 seek to load entire object
    - 1 round trip to database
    - Read relative to object size
    - Write relative to object size
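The round-trip counts can be illustrated with a toy in-memory store. `ToyDb` is hypothetical and only counts calls; it says nothing about real seek or network cost.

```javascript
// Hypothetical in-memory "database" that counts one round trip per find().
class ToyDb {
  constructor(collections) {
    this.collections = collections;
    this.roundTrips = 0;
  }
  find(name, predicate) {
    this.roundTrips += 1;
    return this.collections[name].filter(predicate);
  }
}

// Embedding: the comments live inside the post document.
const embeddedDb = new ToyDb({
  posts: [
    {
      _id: 1,
      text: "Destination Moon",
      comments: [{ author: "Kyle", text: "great book" }],
    },
  ],
});
const [embeddedPost] = embeddedDb.find("posts", (p) => p._id === 1);

// Linking: comments are separate documents that point back at the post.
const linkedDb = new ToyDb({
  posts: [{ _id: 1, text: "Destination Moon" }],
  comments: [{ _id: 100, post_id: 1, author: "Kyle", text: "great book" }],
});
const [linkedPost] = linkedDb.find("posts", (p) => p._id === 1);
const linkedComments = linkedDb.find(
  "comments",
  (c) => c.post_id === linkedPost._id
);

console.log("embedded round trips:", embeddedDb.roundTrips);
console.log("linked round trips:", linkedDb.roundTrips);
```

The embedded read returns the post and its comments in one call, at the price of always moving the whole object; the linked read needs a second call but each document stays small.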

    Many to Many

    http://www.flickr.com/photos/pats0n/6013379192

    Many - Many

    Example:
    - Product can be in many categories
    - Category can have many products

    products:
    { _id : 10,
      name : "Destination Moon",
      category_ids : [ 20, 30 ] }

    categories:
    { _id : 20,
      name : "comic",
      product_ids : [ 10, 11, 12 ] }

    categories:
    { _id : 21,
      name : "movie",
      product_ids : [ 10 ] }

    // All categories for a given product
    > db.categories.find({ product_ids : 10 })

    Alternative

    products:
    { _id : 10,
      name : "Destination Moon",
      category_ids : [ 20, 30 ] }

    categories:
    { _id : 20,
      name : "comic" }

    // All products for a given category
    > db.products.find({ category_ids : 20 })

    // All categories for a given product
    > product = db.products.findOne({ _id : some_id })
    > db.categories.find({ _id : { $in : product.category_ids } })
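The two lookups of the alternative design, where only the product stores the relationship, can be sketched over plain arrays. The second product and category 30 are hypothetical sample data added so that both queries return more than one row; the helper names are made up as well.

```javascript
const products = [
  { _id: 10, name: "Destination Moon", category_ids: [20, 30] },
  { _id: 11, name: "sample product (hypothetical)", category_ids: [20] },
];
const categories = [
  { _id: 20, name: "comic" },
  { _id: 30, name: "sample category (hypothetical)" },
];

// All products for a given category: match a value inside the array field,
// as db.products.find({ category_ids : 20 }) would.
function productsInCategory(categoryId) {
  return products.filter((p) => p.category_ids.includes(categoryId));
}

// All categories for a given product: read the product first, then use its
// category_ids the way the $in query does.
function categoriesOfProduct(productId) {
  const product = products.find((p) => p._id === productId);
  if (!product) return [];
  return categories.filter((c) => product.category_ids.includes(c._id));
}

console.log(productsInCategory(20).map((p) => p._id));
console.log(categoriesOfProduct(10).map((c) => c.name));
```

The trade-off against the two-sided version is one extra read when going from product to categories, in exchange for a single place to update when a product changes category.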

    Trees

    http://www.flickr.com/photos/cubagallery/5949819558

    Hierarchical information

    Full Tree in Document

    { comments : [
        { author : "Kyle", text : "...",
          replies : [
            { author : "James", text : "...",
              replies : [] }
          ] }
    ] }

    Pros: Single Document, Performance, Intuitive

    Cons: Hard to search, Partial Results, 16MB limit

    Array of Ancestors

    - Store all Ancestors of a node
    { _id : "a" }
    { _id : "b", thread : [ "a" ], replyTo : "a" }
    { _id : "c", thread : [ "a", "b" ], replyTo : "b" }
    { _id : "d", thread : [ "a", "b" ], replyTo : "b" }
    { _id : "e", thread : [ "a" ], replyTo : "a" }
    { _id : "f", thread : [ "a", "e" ], replyTo : "e" }

    [Diagram: tree rooted at A; B and E reply to A; C and D reply to B; F replies to E]

    // find all threads where "b" is in
    > db.posts.find({ thread : "b" })

    // find replies to "e"
    > db.posts.find({ replyTo : "e" })

    // find history of "f"
    > threads = db.posts.findOne({ _id : "f" }).thread
    > db.posts.find({ _id : { $in : threads } })
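The three queries can be checked against the sample documents with plain array operations. The helper names are made up for this sketch; each mirrors one of the shell queries.

```javascript
const posts = [
  { _id: "a" },
  { _id: "b", thread: ["a"], replyTo: "a" },
  { _id: "c", thread: ["a", "b"], replyTo: "b" },
  { _id: "d", thread: ["a", "b"], replyTo: "b" },
  { _id: "e", thread: ["a"], replyTo: "a" },
  { _id: "f", thread: ["a", "e"], replyTo: "e" },
];

// Everything below a node: documents whose ancestor array contains it.
function descendantsOf(id) {
  return posts.filter((p) => (p.thread || []).includes(id)).map((p) => p._id);
}

// Direct replies only.
function repliesTo(id) {
  return posts.filter((p) => p.replyTo === id).map((p) => p._id);
}

// History of a node: read its ancestor array, then fetch those documents.
function historyOf(id) {
  const node = posts.find((p) => p._id === id);
  const ancestors = (node && node.thread) || [];
  return posts.filter((p) => ancestors.includes(p._id)).map((p) => p._id);
}

console.log(descendantsOf("b"), repliesTo("e"), historyOf("f"));
```

Each lookup is a single match on one field, which is why the pattern indexes well; the cost is that moving a subtree means rewriting the `thread` array of every descendant.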

    Trees as Paths

    Store hierarchy as a path expression
    - Separate each node by a delimiter, e.g. "/"
    - Use text search to find parts of a tree

    { comments : [
        { author : "Kyle", text : "initial post",
          path : "/" },
        { author : "Jim", text : "jim's comment",
          path : "/jim" },
        { author : "Kyle", text : "Kyle's reply to Jim",
          path : "/jim/kyle" } ] }

    // Find the conversations Jim was part of
    // (paths start with the delimiter, and path is a field of the embedded comments)
    > db.posts.find({ "comments.path" : /^\/jim/ })
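A prefix match on the path can be tried out on the sample comments in plain JavaScript. The `subtree` helper is hypothetical; it anchors the pattern at the start of the path and requires a delimiter or the end of the string after the prefix, so that a path such as `/jimmy` would not be picked up by a search for `/jim`.

```javascript
const comments = [
  { author: "Kyle", text: "initial post", path: "/" },
  { author: "Jim", text: "jim's comment", path: "/jim" },
  { author: "Kyle", text: "Kyle's reply to Jim", path: "/jim/kyle" },
];

// Hypothetical helper: returns the comment at `prefix` and everything below it.
function subtree(prefix) {
  // Escape regex metacharacters so the prefix is matched literally.
  const escaped = prefix.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp("^" + escaped + "(/|$)");
  return comments.filter((c) => pattern.test(c.path));
}

console.log(subtree("/jim").map((c) => c.text));
```

Because the pattern is anchored with `^`, a database can serve this kind of query from an ordinary index on the path field, which is the main attraction of the pattern.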

    Queue

    - Need to maintain order and state
    - Ensure that updates are atomic

    db.jobs.save({ inprogress : false,
                   priority : 1,
                   ...
    });

    // find highest priority job and mark as in-progress
    job = db.jobs.findAndModify({
        query : { inprogress : false },
        sort : { priority : -1 },
        update : { $set : { inprogress : true,
                            started : new Date() } },
        new : true })
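The claim step can be sketched over an in-memory array. `claimNextJob` is a hypothetical stand-in for the `findAndModify` call: it selects, sorts and updates in one function. In single-threaded JavaScript that is trivially atomic; against a real database the atomicity comes from `findAndModify` itself, not from code like this. The sample jobs are made up.

```javascript
const jobs = [
  { _id: 1, inprogress: false, priority: 1 },
  { _id: 2, inprogress: false, priority: 5 },
  { _id: 3, inprogress: true, priority: 9 }, // already claimed by another worker
];

// Hypothetical helper: pick the highest-priority job that is not in progress,
// mark it as in progress, and return the updated job (like new: true).
function claimNextJob(queue) {
  const candidates = queue
    .filter((job) => !job.inprogress)
    .sort((a, b) => b.priority - a.priority); // descending, like sort: { priority: -1 }
  if (candidates.length === 0) return null;
  const job = candidates[0];
  job.inprogress = true;
  job.started = new Date();
  return job;
}

const first = claimNextJob(jobs);
const second = claimNextJob(jobs);
const third = claimNextJob(jobs);

console.log(first._id, second._id, third);
```

Job 3 has the highest priority but is skipped because it is already in progress; once jobs 2 and 1 are claimed the queue is empty and the helper returns `null`.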

    Don't try this...

    Incorrect indexing
    - Too many indexes; wrong keys indexed
    - Frequent queries do not use index

    Large, deeply nested documents

    One size fits all collections

    One collection per user

    Summary

    Schema design is different in MongoDB

    Basic data design principles stay the same

    Focus on how the application manipulates data

    Rapidly evolve schema to meet your requirements

    Enjoy your new freedom, use it wisely :-)


    Part 3 - Scaling

    Scaling

    Operations/sec go up

    Storage needs go up
    - Capacity
    - IOPs

    Complexity goes up
    - Caching

    How do you scale now?

    Optimization & Tuning
    - Schema & Index Design
    - O/S tuning
    - Hardware configuration

    Vertical scaling
    - Hardware is expensive
    - Hard to scale in cloud

    [Chart: cost ($$$) versus throughput]

    Horizontal scaling - Sharding

    [Diagram: one shard; all reads and writes go to shard1, which holds A-Z; 300 GB data]

    [Diagram: two shards; shard1 holds A-M, shard2 holds N-Z; 150 GB data each]

    [Diagram: three shards; shard1 holds A-H, shard2 holds I-Q, shard3 holds R-Z; 100 GB data each]

    Sharding for caching

    [Diagram: one shard; shard1 holds A-H, I-Q and R-Z; 300 GB data, 96 GB memory, 3:1 data/memory]

    [Diagram: three shards; shard1 holds A-H, shard2 holds I-Q, shard3 holds R-Z; 300 GB data, 96 GB memory, 1:1 data/memory]
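The two data-to-memory ratios follow from simple arithmetic. The sketch below assumes that the 96 GB figure is memory per shard and that data is spread evenly across shards; the slides state neither, and `dataToMemoryRatio` is a made-up name.

```javascript
// Ratio of data held by one shard to the memory of that shard.
// Assumes an even spread of data and identical shards.
function dataToMemoryRatio(totalDataGb, shardCount, memoryPerShardGb) {
  const dataPerShardGb = totalDataGb / shardCount;
  return dataPerShardGb / memoryPerShardGb;
}

const oneShard = dataToMemoryRatio(300, 1, 96); // 300 / 96
const threeShards = dataToMemoryRatio(300, 3, 96); // 100 / 96

console.log(oneShard.toFixed(2), threeShards.toFixed(2));
```

With one shard roughly a third of the data fits in memory; with three shards nearly all of it does, which is the sense in which sharding doubles as a caching strategy.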

    Replication

    [Diagram: reads and writes against three copies of the full A-Z range; 300 GB data per copy, 900 GB data in total]

    Sharding internals

    Range based partitioning

    - MongoDB's sharding handles the scale problem by chunking
    - Break up pieces of data into smaller chunks, spread across many data nodes
    - Each data node contains many chunks
    - If a chunk gets too large or a node is overloaded, data can be rebalanced
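The chunking idea can be sketched as a toy model: chunks own half-open key ranges, a chunk that grows past a threshold is split at its median key, and chunks are moved when one shard holds noticeably more of them than another. Every name, the threshold and the balancing rule below are assumptions made for illustration; MongoDB's actual splitter and balancer work differently in detail.

```javascript
const MAX_KEYS_PER_CHUNK = 4; // illustrative split threshold, not a MongoDB default

function makeCluster(shardNames) {
  // Start with one chunk covering the whole key space on the first shard.
  return {
    shards: shardNames,
    chunks: [{ min: "", max: "\uffff", shard: shardNames[0], keys: [] }],
  };
}

// Route a key to the chunk whose range [min, max) contains it.
function findChunk(cluster, key) {
  return cluster.chunks.find((c) => c.min <= key && key < c.max);
}

// Move one chunk from the fullest shard to the emptiest one
// when their chunk counts differ by more than one.
function rebalance(cluster) {
  const count = (s) => cluster.chunks.filter((c) => c.shard === s).length;
  const byLoad = [...cluster.shards].sort((a, b) => count(a) - count(b));
  const least = byLoad[0];
  const most = byLoad[byLoad.length - 1];
  if (count(most) - count(least) > 1) {
    cluster.chunks.find((c) => c.shard === most).shard = least;
  }
}

// Split a chunk at its median key; the upper half becomes a new chunk.
function splitChunk(cluster, chunk) {
  const mid = Math.floor(chunk.keys.length / 2);
  const splitPoint = chunk.keys[mid];
  cluster.chunks.push({
    min: splitPoint,
    max: chunk.max,
    shard: chunk.shard,
    keys: chunk.keys.slice(mid),
  });
  chunk.max = splitPoint;
  chunk.keys = chunk.keys.slice(0, mid);
  rebalance(cluster);
}

function insertKey(cluster, key) {
  const chunk = findChunk(cluster, key);
  chunk.keys.push(key);
  chunk.keys.sort();
  if (chunk.keys.length > MAX_KEYS_PER_CHUNK) splitChunk(cluster, chunk);
}

const cluster = makeCluster(["shard1", "shard2"]);
for (const key of ["a", "b", "c", "d", "e"]) insertKey(cluster, key);

for (const c of cluster.chunks) console.log(c.shard, c.keys.join(","));
```

The fifth insert pushes the single chunk over the threshold, so it is split at "c" and one of the two resulting chunks moves to the empty shard. Later reads and writes for a key go through `findChunk`, which is the routing step in miniature.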

    Big Data at a Glance

    Scaling

    Add Nodes: Chunk Rebalancing

    Writes Routed to Correct Chunk

    Chunk Splitting & Balancing

    Reads with Key Routed Efficiently


    Summary

    Scaling is simple

    Add capacity before you need it

    System automatically re-balances your data

    No downtime to add capacity

    No code changes required


    @mongodb

    conferences, appearances, and meetups
    http://www.10gen.com/events

    http://bit.ly/mongo>Facebook | Twitter | LinkedIn

    http://linkd.in/joinmongo

    download at mongodb.org

    [email protected]