MongoDB Berlin: Schema Design

Post on 21-Apr-2015

DESCRIPTION

Thinking about schema design with MongoDB? In this talk we will cover the basics and discuss common patterns such as queues, trees, inventory etc.

TRANSCRIPT

Schema Design

Basic schema modeling in MongoDB

Alvin Richards

Technical Director, EMEA

alvin@10gen.com

@jonnyeight

Topics

Schema design is easy!
• Data as Objects in code

Common patterns
• Single table inheritance
• One-to-Many & Many-to-Many
• Buckets
• Trees
• Queues
• Inventory

So today’s example will use...

Terminology

RDBMS            MongoDB
Table            Collection
Row(s)           JSON Document
Index            Index
Join             Embedding & Linking
Partition        Shard
Partition Key    Shard Key

Schema Design - Relational Database

Schema Design - MongoDB
• embedding
• linking

Design Session

Design documents that simply map to your application

> post = {author: "Hergé",
          date: ISODate("2011-09-18T09:56:06.298Z"),
          text: "Destination Moon",
          tags: ["comic", "adventure"]}

> db.posts.save(post)

> db.posts.find()

  { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
    author: "Hergé",
    date: ISODate("2011-09-18T09:56:06.298Z"),
    text: "Destination Moon",
    tags: [ "comic", "adventure" ] }

Notes:
• ID must be unique, but can be anything you’d like
• MongoDB will generate a default ID if one is not supplied

Find the document

Secondary index for “author”

// 1 means ascending, -1 means descending

> db.posts.ensureIndex({author: 1})

> db.posts.find({author: 'Hergé'})
  { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
    date: ISODate("2011-09-18T09:56:06.298Z"),
    author: "Hergé",
    ... }

Add an index, find via index

Examine the query plan

> db.posts.find({author: "Hergé"}).explain()
{
  "cursor" : "BtreeCursor author_1",
  "nscanned" : 1,
  "nscannedObjects" : 1,
  "n" : 1,
  "millis" : 5,
  "indexBounds" : {
    "author" : [
      [ "Hergé", "Hergé" ]
    ]
  }
}

Query operators

Conditional operators: $ne, $in, $nin, $mod, $all, $size, $exists, $type, $lt, $lte, $gt, $gte, ...

// find posts that have a tags field
> db.posts.find({tags: {$exists: true}})

Regular expressions:
// posts where author starts with h (case-insensitive)
> db.posts.find({author: /^h/i})

Counting:
// number of posts written by Hergé
> db.posts.find({author: "Hergé"}).count()
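To make the operator semantics concrete, here is a minimal sketch in plain JavaScript (no MongoDB connection) that mimics the three queries above over an in-memory array. The helper names `hasTags`, `authorStartsWithH` and `countBy`, and the sample documents, are illustrative assumptions and not part of the talk or of any MongoDB API.

```javascript
// In-memory stand-in for the posts collection (sample data is hypothetical).
const posts = [
  { author: "Hergé", text: "Destination Moon", tags: ["comic", "adventure"] },
  { author: "Hergé", text: "Explorers on the Moon" },
  { author: "Kyle", text: "Untitled", tags: [] },
];

// {tags: {$exists: true}} matches when the field is present, even if it is empty.
const hasTags = (doc) => Object.prototype.hasOwnProperty.call(doc, "tags");

// {author: /^h/i} matches authors starting with "h", ignoring case.
const authorStartsWithH = (doc) => /^h/i.test(doc.author);

// find(...).count() is the number of matching documents.
const countBy = (docs, predicate) => docs.filter(predicate).length;

console.log(countBy(posts, hasTags));                      // 2
console.log(countBy(posts, authorStartsWithH));            // 2
console.log(countBy(posts, (d) => d.author === "Hergé"));  // 2
```

Note that the post with an empty `tags` array still counts for `$exists`, since only the presence of the field is tested.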

Extending the Schema

> new_comment = {author: "Kyle",
                 date: new Date(),
                 text: "great book"}

> db.posts.update(
    {text: "Destination Moon"},
    {"$push": {comments: new_comment},
     "$inc":  {comments_count: 1}})

> db.posts.find({_id: ObjectId("4c4ba5c0672c685e5e8aabf3")})

  { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
    author: "Hergé",
    date: ISODate("2011-09-18T09:56:06.298Z"),
    text: "Destination Moon",
    tags: [ "comic", "adventure" ],
    comments: [
      { author: "Kyle",
        date: ISODate("2011-09-19T09:56:06.298Z"),
        text: "great book" }
    ],
    comments_count: 1 }
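As a rough illustration of what the `$push` and `$inc` modifiers do to the stored document, the sketch below applies them to a plain JavaScript object. `applyUpdate` is a hypothetical helper written for this example; it is not a MongoDB function and covers only these two modifiers.

```javascript
// Hypothetical helper: apply {$push, $inc} to a plain object in place.
function applyUpdate(doc, update) {
  for (const [field, value] of Object.entries(update.$push || {})) {
    if (doc[field] === undefined) doc[field] = []; // array is created on first push
    doc[field].push(value);
  }
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    doc[field] = (doc[field] || 0) + amount; // a missing counter starts from 0
  }
  return doc;
}

const post = { author: "Hergé", text: "Destination Moon" };
const newComment = { author: "Kyle", date: new Date(), text: "great book" };

applyUpdate(post, { $push: { comments: newComment }, $inc: { comments_count: 1 } });
console.log(post.comments.length, post.comments_count); // 1 1
```

Keeping `comments_count` next to the array is what later allows sorting by it without inspecting the array itself.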

Extending the Schema

// create index on nested documents:
> db.posts.ensureIndex({"comments.author": 1})

> db.posts.find({"comments.author": "Kyle"})

// find last 5 posts:
> db.posts.find().sort({date: -1}).limit(5)

// most commented post:
> db.posts.find().sort({comments_count: -1}).limit(1)

When sorting, check if you need an index

Use MongoDB with your language

10gen Supported Drivers
• Ruby, Python, Perl, PHP, JavaScript
• Java, C/C++, C#, Scala
• Erlang, Haskell

Object Data Mappers
• Morphia - Java
• Mongoid, MongoMapper - Ruby
• MongoEngine - Python

Community Drivers
• F#, Smalltalk, Clojure, Go, Groovy

Using your schema - using Java Driver

// Get a connection to the database and the collection
// (collection name "posts" assumed, to match the shell examples)
DBCollection coll = new Mongo().getDB("blogs").getCollection("posts");

// Create the Object
Map<String, Object> obj = new HashMap<String, Object>();
obj.put("author", "Hergé");
obj.put("text", "Destination Moon");
obj.put("date", new Date());

// Insert the object into MongoDB
coll.insert(new BasicDBObject(obj));

Using your schema - using Morphia mapper

// Use Morphia annotations
@Entity
class Blog {
  @Id String author;
  @Indexed Date date;
  String text;
}

Using your schema - using Morphia

// Create the data store
Datastore ds = new Morphia().createDatastore();

// Create the Object
Blog entry = new Blog("Hergé", new Date(), "Destination Moon");

// Insert object into MongoDB
ds.save(entry);

Common Patterns

Inheritance

shapes table

id   type     area   radius   length   width
1    circle   3.14   1
2    square   4               2
3    rect     10              5        2

Single Table Inheritance - RDBMS

Single Table Inheritance - MongoDB

> db.shapes.find()
  { _id: "1", type: "circle", area: 3.14, radius: 1 }
  { _id: "2", type: "square", area: 4, length: 2 }
  { _id: "3", type: "rect", area: 10, length: 5, width: 2 }

missing values not stored!

Single Table Inheritance - MongoDB

// find shapes where radius > 0
> db.shapes.find({radius: {$gt: 0}})

// create index
> db.shapes.ensureIndex({radius: 1}, {sparse: true})

index only values present!
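The idea behind a sparse index can be sketched in plain JavaScript: only documents that actually carry the indexed field get an entry. `buildSparseIndex` is a hypothetical helper for illustration and says nothing about how MongoDB stores its indexes internally.

```javascript
// Shapes from the slide.
const shapes = [
  { _id: "1", type: "circle", area: 3.14, radius: 1 },
  { _id: "2", type: "square", area: 4, length: 2 },
  { _id: "3", type: "rect", area: 10, length: 5, width: 2 },
];

// Hypothetical sketch: documents without the field are skipped entirely.
function buildSparseIndex(docs, field) {
  const entries = [];
  for (const doc of docs) {
    if (field in doc) entries.push({ key: doc[field], _id: doc._id });
  }
  return entries.sort((a, b) => a.key - b.key); // ascending, like {radius: 1}
}

const radiusIndex = buildSparseIndex(shapes, "radius");
console.log(radiusIndex.length); // 1, only the circle has a radius
```

With many shape types sharing one collection, this keeps each per-type index proportional to the number of documents of that type.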

One to Many

One to Many relationships can specify
• degree of association between objects
• containment
• life-cycle

One to Many

- Embedded Array
  - $slice operator to return subset of comments
  - some queries harder, e.g. find latest comments across all blogs

blogs: {
  author: "Hergé",
  date: ISODate("2011-09-18T09:56:06.298Z"),
  comments: [
    { author: "Kyle",
      date: ISODate("2011-09-19T09:56:06.298Z"),
      text: "great book" }
  ]
}

One to Many

- Normalized (2 collections)
  - most flexible
  - more queries

blogs: { _id: 1000,
         author: "Hergé",
         date: ISODate("2011-09-18T09:56:06.298Z"),
         comments: [ {comment: 1} ] }

comments: { _id: 1,
            blog: 1000,
            author: "Kyle",
            date: ISODate("2011-09-19T09:56:06.298Z") }

// findOne returns a document; find would return a cursor with no _id
> blog = db.blogs.findOne({text: "Destination Moon"});
> db.comments.find({blog: blog._id});
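The "more queries" cost of the normalized layout can be seen in a small in-memory sketch: the application performs the join itself in two steps. The arrays below stand in for the two collections, and the second comment and the `text` field on the blog are made-up sample data.

```javascript
// In-memory stand-ins for the two collections (sample data is hypothetical).
const blogs = [{ _id: 1000, author: "Hergé", text: "Destination Moon" }];
const comments = [
  { _id: 1, blog: 1000, author: "Kyle" },
  { _id: 2, blog: 2000, author: "Joe" },
];

// Step 1: load the blog document (what findOne would return).
const blog = blogs.find((b) => b.text === "Destination Moon");

// Step 2: load its comments through the linking field.
const blogComments = comments.filter((c) => c.blog === blog._id);

console.log(blogComments.map((c) => c.author)); // [ 'Kyle' ]
```

In exchange for the second round trip, comments can be queried on their own, for example to list the latest comments across all blogs.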

Linking versus Embedding

• When should I embed?
• When should I link?

Activity Stream - Embedded

// users - one doc per user with all tweets
{ _id: "alvin",
  email: "alvin@10gen.com",
  tweets: [
    { user: "bob",
      tweet: "20111209-1231",
      text: "Best Tweet Ever!" }
  ] }

Activity Stream - Linking

// users - one doc per user
{ _id: "alvin",
  email: "alvin@10gen.com" }

// tweets - one doc per user per tweet
{ user: "bob",
  tweet: "20111209-1231",
  text: "Best Tweet Ever!" }

Embedding

• Great for read performance

• One seek to load entire object

• One roundtrip to database

• Writes can be slow if adding to objects all the time

• Should you embed tweets?

Activity Stream - Buckets

// tweets: one doc per user per day
{ _id: "alvin-20111209",
  email: "alvin@10gen.com",
  tweets: [
    { user: "Bob",
      tweet: "20111209-1231",
      text: "Best Tweet Ever!" },
    { author: "Joe",
      date: "May 27 2011",
      text: "Stuck in traffic (again)" }
  ] }

Adding a Tweet

tweet = { user: "Bob",
          tweet: "20111209-1231",
          text: "Best Tweet Ever!" }

db.tweets.update( { _id: "alvin-20111209" },
                  { $push: { tweets: tweet } } );

Deleting a Tweet

db.tweets.update(
  { _id: "alvin-20111209" },
  { $pull: { tweets: { tweet: "20111209-1231" } } })
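A minimal sketch of the bucket pattern in plain JavaScript: one helper derives the per-user, per-day `_id`, another mimics what `$pull` does to the embedded array. Both `bucketId` and `pullTweet` are hypothetical names, and using the UTC date for the day boundary is an assumption the slides do not state.

```javascript
// Hypothetical helper: build the "user-YYYYMMDD" bucket _id used on the slide.
// Assumption: the day boundary is taken in UTC.
function bucketId(user, date) {
  const y = date.getUTCFullYear();
  const m = String(date.getUTCMonth() + 1).padStart(2, "0");
  const d = String(date.getUTCDate()).padStart(2, "0");
  return `${user}-${y}${m}${d}`;
}

// Mimics {$pull: {tweets: {tweet: id}}}: drop every element whose tweet field matches.
function pullTweet(bucket, tweetId) {
  bucket.tweets = bucket.tweets.filter((t) => t.tweet !== tweetId);
  return bucket;
}

const bucket = { _id: bucketId("alvin", new Date(Date.UTC(2011, 11, 9))), tweets: [] };
bucket.tweets.push({ user: "Bob", tweet: "20111209-1231", text: "Best Tweet Ever!" });
console.log(bucket._id); // alvin-20111209

pullTweet(bucket, "20111209-1231");
console.log(bucket.tweets.length); // 0
```

Because the `_id` is derived from the user and the day, the application can address the right bucket directly without a prior lookup.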

Many - Many

Example:
- Product can be in many categories
- Category can have many products

Many - Many

products:
  { _id: 10,
    name: "Destination Moon",
    category_ids: [ 20, 30 ] }

categories:
  { _id: 20,
    name: "adventure",
    product_ids: [ 10, 11, 12 ] }

categories:
  { _id: 21,
    name: "movie",
    product_ids: [ 10 ] }

// All categories for a given product
> db.categories.find({product_ids: 10})

Alternative

products:
  { _id: 10,
    name: "Destination Moon",
    category_ids: [ 20, 30 ] }

categories:
  { _id: 20,
    name: "adventure" }

// All products for a given category
> db.products.find({category_ids: 20})

// All categories for a given product
> product = db.products.findOne({_id: some_id})
> db.categories.find({_id: {$in: product.category_ids}})
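The two lookups of the alternative layout, where only products hold the references, can be sketched over in-memory arrays. `productsIn` and `categoriesOf` are hypothetical helper names; product 11 and the name of category 30 are made-up sample data.

```javascript
// In-memory stand-ins for the collections (product 11 and category 30 are hypothetical).
const products = [
  { _id: 10, name: "Destination Moon", category_ids: [20, 30] },
  { _id: 11, name: "Explorers on the Moon", category_ids: [20] },
];
const categories = [
  { _id: 20, name: "adventure" },
  { _id: 30, name: "comic" },
];

// {category_ids: 20}: an array field matches if any element equals the value.
const productsIn = (categoryId) =>
  products.filter((p) => p.category_ids.includes(categoryId));

// {_id: {$in: product.category_ids}}: two steps, the product first, then its categories.
const categoriesOf = (productId) => {
  const product = products.find((p) => p._id === productId);
  return categories.filter((c) => product.category_ids.includes(c._id));
};

console.log(productsIn(20).map((p) => p._id));      // [ 10, 11 ]
console.log(categoriesOf(10).map((c) => c.name));   // [ 'adventure', 'comic' ]
```

Storing the references on one side only avoids having to update two documents whenever a product changes category.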

Trees

Hierarchical information

   

Trees

Full Tree in Document

{ comments: [
    { author: "Kyle", text: "...",
      replies: [
        { author: "James", text: "...",
          replies: [] }
      ] }
  ] }

Pros: Single Document, Performance, Intuitive

Cons: Hard to search, Partial Results, 16MB limit
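The "hard to search" point can be illustrated in plain JavaScript: finding a comment by author inside a fully nested tree needs a recursive walk in the application, since the depth of the match is not known in advance. `findByAuthor` is a hypothetical helper written for this sketch.

```javascript
// The nested document from the slide.
const post = {
  comments: [
    { author: "Kyle", text: "...", replies: [
      { author: "James", text: "...", replies: [] },
    ] },
  ],
};

// Depth-first walk collecting every comment written by the given author.
function findByAuthor(comments, author, found = []) {
  for (const c of comments) {
    if (c.author === author) found.push(c);
    findByAuthor(c.replies, author, found); // descend into the replies
  }
  return found;
}

console.log(findByAuthor(post.comments, "James").length); // 1
```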

   

Array of Ancestors

- Store all Ancestors of a node
  { _id: "a" }
  { _id: "b", thread: [ "a" ],      replyTo: "a" }
  { _id: "c", thread: [ "a", "b" ], replyTo: "b" }
  { _id: "d", thread: [ "a", "b" ], replyTo: "b" }
  { _id: "e", thread: [ "a" ],      replyTo: "a" }
  { _id: "f", thread: [ "a", "e" ], replyTo: "e" }

// find all messages that have "b" as an ancestor
> db.msg_tree.find({thread: "b"})

// find replies to "e"
> db.msg_tree.find({replyTo: "e"})

// find history of "f"
> threads = db.msg_tree.findOne({_id: "f"}).thread
> db.msg_tree.find({_id: {$in: threads}})

Diagram: A is the root; B and E reply to A; C and D reply to B; F replies to E.
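The slides show the stored documents but not how the `thread` array is built. One way to do it, sketched here in plain JavaScript, is to copy the parent's ancestors and append the parent itself when a reply is created. `makeReply` is a hypothetical helper, not something from the talk.

```javascript
// Hypothetical helper: a reply's thread is its parent's thread plus the parent itself.
function makeReply(id, parent) {
  return {
    _id: id,
    thread: [...(parent.thread || []), parent._id], // the root has no thread field
    replyTo: parent._id,
  };
}

const a = { _id: "a" };
const e = makeReply("e", a);
const f = makeReply("f", e);

console.log(f); // { _id: 'f', thread: [ 'a', 'e' ], replyTo: 'e' }
```

The cost is paid once at insert time; afterwards both "all descendants of b" and "all ancestors of f" are single queries.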

Trees as Paths

Store hierarchy as a path expression
- Separate each node by a delimiter, e.g. “/”
- Use text search to find parts of a tree

{ comments: [
    { author: "Kyle", text: "initial post",
      path: "" },
    { author: "Jim", text: "jim’s comment",
      path: "jim" },
    { author: "Kyle", text: "Kyle’s reply to Jim",
      path: "jim/kyle" } ] }

// Find the conversations Jim was part of
// (path lives inside the comments array, so use dot notation)
> db.posts.find({"comments.path": /^jim/})
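One caveat with a bare prefix pattern such as `/^jim/` is that it would also match a sibling path like "jimmy". The sketch below, in plain JavaScript, builds an anchored pattern that matches a node and its subtree only. `subtreePattern` and the "jimmy" sample entry are assumptions added for illustration.

```javascript
// Escape regex metacharacters so a path segment is matched literally.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Hypothetical helper: matches the path itself or anything below it, e.g. "jim" and "jim/kyle".
const subtreePattern = (path) => new RegExp("^" + escapeRegExp(path) + "(/|$)");

const comments = [
  { author: "Kyle", path: "" },
  { author: "Jim", path: "jim" },
  { author: "Kyle", path: "jim/kyle" },
  { author: "Jimmy", path: "jimmy" }, // hypothetical sibling, must not match
];

const inJimsSubtree = comments.filter((c) => subtreePattern("jim").test(c.path));
console.log(inJimsSubtree.map((c) => c.path)); // [ 'jim', 'jim/kyle' ]
```

The pattern stays anchored at the start of the string, which is the form of regular expression that can make use of an index on the path field.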

Queue

• Need to maintain order and state
• Ensure that updates are atomic

db.jobs.save(
  { inprogress: false,
    priority: 1,
    ... });

// find highest priority job and mark as in-progress
job = db.jobs.findAndModify({
        query:  {inprogress: false},
        sort:   {priority: -1},
        update: {$set: {inprogress: true,
                        started: new Date()}},
        new: true})

Queue

{ inprogress: true,                                // updated
  priority: 1,
  started: ISODate("2011-09-18T09:56:06.298Z"),    // added
  ... }
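The claim-a-job step can be mimicked over an in-memory array to show the order in which `query`, `sort`, `update` and `new: true` take effect. `claimNextJob` is a hypothetical stand-in, and the job names are sample data; in MongoDB the server performs these steps atomically, which this single-threaded sketch does not need to enforce.

```javascript
// Hypothetical in-memory stand-in for findAndModify with sort and new: true.
function claimNextJob(jobs, now = new Date()) {
  const candidates = jobs.filter((j) => j.inprogress === false); // query
  if (candidates.length === 0) return null;                      // nothing left to claim
  candidates.sort((x, y) => y.priority - x.priority);            // sort: {priority: -1}
  const job = candidates[0];
  job.inprogress = true;                                         // $set on the stored job
  job.started = now;
  return job;                                                    // new: true returns the updated doc
}

const jobs = [
  { name: "low", inprogress: false, priority: 1 },
  { name: "high", inprogress: false, priority: 5 },
];

console.log(claimNextJob(jobs).name); // high
console.log(claimNextJob(jobs).name); // low
console.log(claimNextJob(jobs));      // null
```

Each job is handed out once: after it is marked in progress it no longer satisfies the query, so a second worker gets the next one.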

Inventory

• User has a number of "votes" they can use
• A finite stock that you can "sell"
• A resource that can be "provisioned"

Inventory

// Number of votes and who user voted for
{ _id: "alvin",
  votes: 42,
  voted_for: [] }

// Subtract a vote and add the blog voted for
db.user.update(
  { _id: "alvin",
    votes: {$gt: 0},
    voted_for: {$ne: "Destination Moon"} },
  { "$push": {voted_for: "Destination Moon"},
    "$inc":  {votes: -1} })
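The key point of the inventory update is that the guard conditions sit in the query part, so the modification happens only when they hold. The following plain JavaScript sketch mirrors that logic on a single object; `castVote` is a hypothetical helper, and the second blog title is sample data.

```javascript
// Hypothetical stand-in for the conditional update: returns true if a vote was spent.
function castVote(user, blog) {
  // The query part: votes left, and this blog not voted for yet.
  const matches = user.votes > 0 && !user.voted_for.includes(blog);
  if (!matches) return false;   // no document matched, nothing is modified
  user.voted_for.push(blog);    // $push
  user.votes -= 1;              // $inc: {votes: -1}
  return true;
}

const alvin = { _id: "alvin", votes: 1, voted_for: [] };

console.log(castVote(alvin, "Destination Moon"));      // true
console.log(castVote(alvin, "Destination Moon"));      // false, already voted
console.log(castVote(alvin, "Explorers on the Moon")); // false, no votes left
```

Since the check and the change are one update on one document, the vote count cannot go below zero even with concurrent requests.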

Summary

Schema design is different in MongoDB

Basic data design principles stay the same

Focus on how the application manipulates data

Rapidly evolve schema to meet your requirements

Enjoy your new freedom, use it wisely :-)

@mongodb

conferences, appearances, and meetups
http://www.10gen.com/events

http://bit.ly/mongo>  Facebook                    |                  Twitter                  |                  LinkedIn

http://linkd.in/joinmongo

download at mongodb.org

alvin@10gen.com
