MongoDB Hackathon 02
Before we start
Copyright 2013, Vivek A. Ganesan, All rights reserved
o A BIG thank you to our sponsors – Big Data Cloud
  o Meeting Space
  o Food + Drinks
  o Consulting/Training

Agenda
o Review of Hackathon 01
o Data Modeling
o Indexing
o Aggregation
o Map/Reduce

Introduction
o This is a hackathon, not a class
  o Which means we work on stuff together
  o Please consult and help your teammates
o There will be labs (that’s when we learn!)
  o Talk to your teammates
  o Figure out what problem you want to solve
  o Think about your data sets and how to model them in MongoDB

Review – MongoDB Basics
o MongoDB is a document-oriented NoSQL data store
o It stores data internally as BSON (Binary JSON)
o A MongoDB instance may hold multiple databases
o A database may have multiple collections (the analog of tables)
o A collection is a container of documents
o Documents contain key/value pairs
o MongoDB inserts a default “_id” key into every document that lacks one
  o Users can set “_id” to any value that is unique within the collection (arrays are not allowed)
o Documents are schema-free
  o No fixed structure to a collection
  o A collection can have documents with different key/value pairs

Review – Shell and Clients
o The Mongo shell is a CLI client for MongoDB
o Shell commands are JavaScript functions
  o You can write your own JavaScript code within the shell
  o You can also run JavaScript files using load()
o The Mongo shell looks for an initialization file: ~/.mongorc.js
  o Set up global variables here
o To use your favorite editor within the Mongo shell:
  o Set the environment variable EDITOR to your editor
o MongoDB supports clients in several programming languages:
  o JS, Java, C, C++, C#, Scala, Python, Ruby, Perl and Erlang
Review – MongoDB Objects
o Note : Mongo Shell commands are in blue and output is in green
o Mongo uses a hierarchical naming scheme for database objects
o The current database is always in the db object
  o The db command prints the name of the current database
o A collection called “mycollection” in the current database:
  o db.mycollection (Note: this is a MongoDB object)
o Commands are methods invoked on objects
  o For example, to insert a document into the db.mycollection collection:
    o the db.mycollection.insert command
  o For example, to find documents in the db.mycollection collection:
    o the db.mycollection.find command

Review – Create
o First exercise:
  o Create a new database called “blog”
  o Create a collection called “users” and a collection called “posts”
o Solution to first exercise (output is shown after =>):
    use blog;
    db;                             => blog
    show collections;               => system.indexes
    db.createCollection("users");   => { "ok" : 1 }
    db.createCollection("posts");   => { "ok" : 1 }
    show collections;               => posts, system.indexes, users

Review – Insert
o Second exercise:
  o In the “users” collection:
    o Insert a single document, {username: "admin"}
  o In the “posts” collection:
    o Insert ten posts using a loop
    o Blog data: post_title, post_body and post_tags as CSV
o Solution to second exercise:
    db.users.insert({username: "admin"});
    // Insert ten identical posts; post_tags is a CSV string for now
    for (var i = 1; i <= 10; i++) {
      db.posts.insert({
        post_title: "Title",
        post_body: "Post Body",
        post_tags: "tag1,tag2,tag3,tag4,tag5"
      });
    }

Review – Updates with modifier
o Third exercise:
  o In the “posts” collection:
    o Update ten posts with an updated_at key and set it to the current timestamp
o Solution to the third exercise:
  o Note: without a modifier (modifiers start with a ‘$’ symbol), an update call replaces the entire document
    // Arguments: query, update, upsert (false), multi (true = update every matching document)
    db.posts.update({}, {$set: {updated_at: new Date()}}, false, true);
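The difference between a whole-document replacement and a `$set` update can be modeled in plain JavaScript, with no MongoDB server involved. This is only a sketch of the semantics described above; `replaceDocument` and `applySet` are hypothetical helper names, not shell functions, and the date string is illustrative.

```javascript
// Plain-JavaScript model of the two update styles (hypothetical helpers).

// Without a modifier: every field except _id is replaced.
function replaceDocument(doc, replacement) {
  return Object.assign({ _id: doc._id }, replacement);
}

// With $set: only the named fields change, the rest are kept.
function applySet(doc, fields) {
  return Object.assign({}, doc, fields);
}

var post = { _id: 1, post_title: "Title", post_body: "Post Body" };

var replaced = replaceDocument(post, { updated_at: "2013-01-01" });
var modified = applySet(post, { updated_at: "2013-01-01" });

console.log(Object.keys(replaced).join(", ")); // _id, updated_at
console.log(Object.keys(modified).join(", ")); // _id, post_title, post_body, updated_at
```

The first result shows why forgetting the modifier is dangerous: the title and body are gone after the update.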

Review – Selective Updates
o Fourth exercise:
  o In the “posts” collection:
    o Update the posts such that the first three posts have a “foo” tag (use the cursor functionality to iterate)
o Solution to the fourth exercise:
    var c = db.posts.find().limit(3);
    while (c.hasNext()) {
      var post = c.next();
      // post_tags is still a CSV string here, so append with a leading comma
      post["post_tags"] = post["post_tags"] + ",foo";
      db.posts.save(post);
    }

Review – Mastering find
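o In a Mongo shell:
  o Find all posts but extract only the post_title field
      db.posts.find({}, {post_title: 1, _id: 0});
  o List all posts in reverse order of creation (the posts have no created_on field, so sort on _id; a default ObjectId begins with a creation timestamp)
      db.posts.find().sort({_id: -1});
  o Do the same as above but paginate in sets of three (this call returns the second page)
      db.posts.find().sort({_id: -1}).skip(3).limit(3);
  o Find all posts that contain a tag called “foo” (this regex also matches longer tags that contain “foo”, such as “food”)
      db.posts.find({post_tags: /foo/});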
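Two details of the queries above are easy to get wrong: how a page number maps to `skip`/`limit`, and how loosely `/foo/` matches a CSV tag string. The sketch below models both in plain JavaScript on an in-memory array; `pageWindow`, `fetchPage` and the stricter `exactTag` pattern are illustrative assumptions, not part of the original exercise.

```javascript
// Hypothetical helper: translate a 1-based page number into skip/limit values.
function pageWindow(page, pageSize) {
  return { skip: (page - 1) * pageSize, limit: pageSize };
}

// Model a sorted result set as an array of ten post numbers, newest first.
var newestFirst = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1];

// Array.slice plays the role of cursor.skip(...).limit(...).
function fetchPage(items, page, pageSize) {
  var w = pageWindow(page, pageSize);
  return items.slice(w.skip, w.skip + w.limit);
}

var firstPage = fetchPage(newestFirst, 1, 3);  // skip 0
var secondPage = fetchPage(newestFirst, 2, 3); // skip 3, as on the slide
var lastPage = fetchPage(newestFirst, 4, 3);   // skip 9, only one post left

// Tag matching on a CSV string: /foo/ matches any substring,
// while the anchored pattern matches only a whole tag.
var looseTag = /foo/;
var exactTag = /(^|,)foo(,|$)/;

console.log(secondPage.join(","));            // 7,6,5
console.log(looseTag.test("tag1,food"));      // true
console.log(exactTag.test("tag1,food"));      // false
console.log(exactTag.test("tag1,foo,tag2"));  // true
```

Once post_tags becomes an array (the fifth exercise), an equality match on the field would be the more natural way to find an exact tag.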

Review – Modifiers
o Fifth exercise:
  o Modify the “posts” collection
  o Change the post_tags field to an array instead of a CSV list
o Solution to the fifth exercise:
    var c = db.posts.find();
    while (c.hasNext()) {
      var post = c.next();
      // Assumes post_tags is still a CSV string; split() does not exist on an array
      post["post_tags"] = post["post_tags"].split(",");
      db.posts.save(post);
    }

Data Modeling
o http://docs.mongodb.org/manual/core/data-modeling/
o When to reference?
  o When it makes sense to, e.g. many-to-many relationships
  o When document size is a concern (a single document is limited to 16 MB)
  o Some drivers may do this automatically
o When to embed?
  o When it is “natural”, e.g. a blog post and its comments
  o When there is a need for atomic operations (a write to a single document is atomic)
  o When read performance is critical
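The two modeling styles can be compared side by side as plain JavaScript objects. This is a sketch with made-up data; the field names `comments` and `author_id` and the helper `findAuthor` are assumptions for illustration, not a prescribed schema.

```javascript
// Embedding: comments live inside the post, so one read returns everything.
var embeddedPost = {
  _id: 1,
  post_title: "Title",
  comments: [
    { author: "admin", text: "First comment" },
    { author: "guest", text: "Second comment" }
  ]
};

// Referencing: the post stores only the author's _id.
var users = [{ _id: 100, username: "admin" }];
var referencedPost = { _id: 2, post_title: "Title", author_id: 100 };

// The client resolves a reference with a second lookup (hypothetical helper).
function findAuthor(post, userList) {
  return userList.filter(function (u) {
    return u._id === post.author_id;
  })[0];
}

console.log(embeddedPost.comments.length);               // 2
console.log(findAuthor(referencedPost, users).username); // admin
```

The extra lookup in `findAuthor` is the read cost of referencing; the growing `comments` array is the size cost of embedding.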

Lab 01 – Model your data set
o Break – 15 minutes
o Lab 01 – 45 minutes – With your team:
  o Look at your data set and figure out how you will model it
  o How would you bulk load the data?
  o How would you handle errors while loading?
  o Implement the schema for your data set
  o Bulk load a small portion of your data set
  o Verify the load and also run some sample queries
  o Figure out what queries you would run frequently

Indexes
o http://docs.mongodb.org/manual/core/indexes/
o When to index?
  o To improve find performance
  o To improve sort performance
  o Note: every index adds a performance cost to writes
o What to index?
  o Depends on the query
  o Usually, the most frequently searched fields
  o Sometimes, fields in embedded documents as well

Types of Indexes and Options
o Unique indexes (_id has a unique index by default)
o Simple (single-field) indexes
o Compound indexes
  o Prefix order is important: the index supports queries on a leading subset (prefix) of its fields
o Text indexes
o Sparse indexes
o Multi-key indexes (for arrays)
o Geospatial and geoHaystack indexes
o Indexes can be built in the background (recommended!)
o Indexes can be named explicitly (definitely recommended!)
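The prefix rule for compound indexes can be expressed as a small check in plain JavaScript. This is a deliberately simplified model, not the query planner: `isIndexPrefix` is a hypothetical helper, the index fields are made up, and a real server may still use an index partially when only the leading field matches.

```javascript
// Simplified model: a query is fully covered by the prefix rule when its
// fields are exactly the first N fields of the index, in any order.
function isIndexPrefix(indexFields, queryFields) {
  if (queryFields.length === 0 || queryFields.length > indexFields.length) {
    return false;
  }
  var prefix = indexFields.slice(0, queryFields.length);
  return queryFields.every(function (field) {
    return prefix.indexOf(field) !== -1;
  });
}

// Hypothetical compound index on three fields.
var compoundIndex = ["username", "created_on", "post_tags"];

console.log(isIndexPrefix(compoundIndex, ["username"]));               // true
console.log(isIndexPrefix(compoundIndex, ["created_on", "username"])); // true
console.log(isIndexPrefix(compoundIndex, ["created_on"]));             // false
console.log(isIndexPrefix(compoundIndex, ["username", "post_tags"]));  // false
```

The third case is the one that usually surprises people: an index that starts with `username` does not help a query that filters only on `created_on`.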

Lab 02 – Indexes
o Lab 02 – 30 minutes – With your team:
  o Look at the frequent queries from Lab 01 and:
    o Which would you index and why?
    o What kind of indexes are needed?
  o Since this is predominantly a read use case, index away
  o Would you use the sparse index? For what and how?
  o Would you use the geospatial index? For what and how?
  o Would you use the TTL index? For what and how?

Aggregation
o Used for “group by”-like queries
o Aggregation Framework (introduced in the 2.1 development series, first stable in 2.2)
  o http://docs.mongodb.org/manual/aggregation/
o Simple count: db.posts.count();
o Using the Aggregation Framework:
    db.posts.aggregate([{ $group: { _id: null, count: { $sum: 1 } } }]);
o Check the reference for a comparison with SQL GROUP BY
o Map/Reduce is still supported (older approach and still relevant)
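What a `$group` stage computes can be sketched in plain JavaScript on an in-memory array. The ratings data and the `groupRatings` helper are made up for illustration; the comparable pipeline stage would be roughly `{ $group: { _id: "$user", count: { $sum: 1 }, avg: { $avg: "$rating" } } }`.

```javascript
// Made-up sample documents: one rating per document.
var ratings = [
  { user: "ann", rating: 4 },
  { user: "bob", rating: 2 },
  { user: "ann", rating: 5 },
  { user: "bob", rating: 3 },
  { user: "ann", rating: 3 }
];

// Group by user, then compute a count ($sum: 1) and an average ($avg).
function groupRatings(docs) {
  var groups = {};
  docs.forEach(function (doc) {
    var g = groups[doc.user] || (groups[doc.user] = { count: 0, total: 0 });
    g.count += 1;
    g.total += doc.rating;
  });
  return Object.keys(groups).map(function (user) {
    var g = groups[user];
    return { _id: user, count: g.count, avg: g.total / g.count };
  });
}

var perUser = groupRatings(ratings);
perUser.forEach(function (row) {
  console.log(row._id + ": count=" + row.count + ", avg=" + row.avg);
});
// ann: count=3, avg=4
// bob: count=2, avg=2.5
```

Running the same logic in client code means shipping every document over the network first, which is one of the trade-offs the aggregation lab asks about.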

Lab 03 – Aggregation
o Lab 03 – 30 minutes – With your team:
  o Figure out what aggregations to run on the data set:
    o For example, average rating per user?
    o Or, average number of movies rated by all users?
  o Write the queries for these aggregations and test them
  o Are indexes helpful in aggregations? Why/Why not?
  o Are you better off just doing these in your client code? Why/Why not?
  o When would you use pipelined aggregations?

Map/Reduce
o Scatter/Gather framework
o db.collection.mapReduce(map_fn, red_fn, {out: output_coll})
  o http://docs.mongodb.org/manual/aggregation/
o Mapper – just emits key/value pairs
o Framework – groups and sorts the mapper output by key and passes it to the reducer
o Reducer – applies a function to the values for each key; the result goes to the output collection
o Distributed computation framework for full table (collection) scans
o http://docs.mongodb.org/manual/tutorial/map-reduce-examples/
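The mapper, framework and reducer roles listed above can be imitated in a few lines of plain JavaScript. This is a toy in-memory model, not the server's implementation: `runMapReduce` is a hypothetical function, `emit` is passed in explicitly (in the shell, the map function reads the document from `this` and calls a global `emit`), and the sample posts are made up.

```javascript
// Toy model of the three roles: map emits pairs, the "framework" groups
// them by key, and reduce folds each group into one value.
function runMapReduce(docs, mapFn, reduceFn) {
  var grouped = {};
  docs.forEach(function (doc) {
    mapFn(doc, function emit(key, value) {
      (grouped[key] = grouped[key] || []).push(value);
    });
  });
  var output = {};
  Object.keys(grouped).sort().forEach(function (key) {
    output[key] = reduceFn(key, grouped[key]);
  });
  return output;
}

// Made-up posts whose post_tags field is already an array.
var taggedPosts = [
  { post_tags: ["tag1", "foo"] },
  { post_tags: ["tag1", "tag2"] },
  { post_tags: ["foo"] }
];

// Mapper: emit (tag, 1) for every tag in a post.
function mapTags(doc, emit) {
  doc.post_tags.forEach(function (tag) { emit(tag, 1); });
}

// Reducer: sum the emitted counts for one tag.
function sumCounts(key, values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

var tagCounts = runMapReduce(taggedPosts, mapTags, sumCounts);
console.log(JSON.stringify(tagCounts)); // {"foo":2,"tag1":2,"tag2":1}
```

In MongoDB the reduce function may be called more than once for the same key, so it should return a value of the same shape as the values it receives; a plain sum like `sumCounts` satisfies that.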

Lab 04 – Map/Reduce
o Lab 04 – 30 minutes – With your team:
  o Go through the Map/Reduce examples
  o Figure out what Map/Reduce functions you would use
  o Implement these functions (on a small data set)
  o Some things to think about:
    o Can you use Map/Reduce to “seed” your recommendations?
    o Can you use incremental Map/Reduce to “update” your recommendations? How would you do this?
Questions? Comments?
Thank You!
E-mail: [email protected] : onevivek