A Social blog using MongoDB ITEC-810 Final Presentation Lucero Soria - 42403871 Supervisor: Dr. Jian Yang


Post on 24-Dec-2015



Agenda

• Introduction
• Methodology
• Outcomes
• Blog implementation
• MongoDB vs. Relational databases
• Conclusions


Problem Specification

Relational Database Management Systems (RDBMS), such as MySQL, do not provide the flexibility and scalability needed to manage social media data

NoSQL databases, such as MongoDB, emerged to provide the features that modern applications demand, such as flexibility, scalability and productivity


Project Aim

Analyse the differences between MongoDB and relational databases, especially in supporting social media data


Background Sources

MongoDB
• MongoDB Online Manual
• Online articles

Relational databases
• MySQL 5.5 reference manual
• Social Media Management Handbook by Robert Wollan
• Online articles


Project Approach

This project is a combination of analysis and development tasks:

• Research MongoDB, social media data and relational databases
• Implement a social blog using MongoDB
• Based on the implementation and research, analyse the differences between MongoDB and relational databases


Methodology

An incremental methodology was used to implement the social blog
• Combines the waterfall model with iterations


A social blog with MongoDB

Features implemented:
• Log in with Facebook to create the user's profile in MongoDB
• Create, edit and delete posts (text, photos or videos)
• Add comments
• Search by tags
• Sort blogs by number of comments (most commented first)
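The tag search and the comment-count ordering from the feature list can be sketched in plain Python over in-memory documents. This is an illustration only, not the blog's actual code: the field names `tags` and `comments`, the sample posts and both helper functions are assumptions.

```python
# Hypothetical in-memory posts; the field names are assumptions for illustration.
posts = [
    {"title": "First post", "tags": ["mongodb", "nosql"], "comments": [{"author": "jim"}]},
    {"title": "Holiday photos", "tags": ["photos"], "comments": []},
    {"title": "Sharding notes", "tags": ["mongodb"],
     "comments": [{"author": "ann"}, {"author": "bob"}]},
]


def search_by_tag(posts, tag):
    """Return the posts whose tags array contains the given tag."""
    return [p for p in posts if tag in p.get("tags", [])]


def sort_by_comment_count(posts):
    """Return the posts ordered from most to fewest comments."""
    return sorted(posts, key=lambda p: len(p.get("comments", [])), reverse=True)


mongodb_posts = search_by_tag(posts, "mongodb")
ranked_titles = [p["title"] for p in sort_by_comment_count(posts)]
```

In MongoDB itself, a query on an array field such as `{tags: "mongodb"}` matches documents whose array contains that value, so the tag search needs no join table.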


Analysis

Based on our experience implementing the social blog, the most relevant features for managing social media data are:

• Handle irregular data
• Handle large binary objects (videos, photos)
  • Operations
  • Metadata
• Manage huge volumes of data
• Handle geospatial queries


Relational data model
• Fixed schema
• Assumes well-structured data with a fixed number of fields (columns) and relationships
• Minimizes redundancy and dependency (normalization)

Source: http://blog.jruby.org/


Terminology

RDBMS     MongoDB
Table     Collection
Row       JSON document
Index     Index
Join      Embedding & linking


Document-oriented data model

MongoDB uses a document-oriented data model organised into collections

Main characteristics:
• Schema-less
• Collections can be created on the fly when first referenced
• Capped collections: fixed size, oldest records dropped once the limit is reached
• Collections store documents
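The capped-collection behaviour (fixed size, oldest records dropped first) can be imitated with a bounded queue. The sketch below uses Python's `collections.deque` as a stand-in and does not touch MongoDB; the collection name and the limit of three documents are assumptions chosen to keep the example short.

```python
from collections import deque

# A deque with maxlen drops its oldest entry when a new one is appended
# past the limit, which mimics the behaviour described for capped collections.
recent_activity = deque(maxlen=3)

for n in range(1, 6):
    recent_activity.append({"event": f"post-{n}"})

# Only the three most recent documents survive.
remaining = [doc["event"] for doc in recent_activity]
```

A real capped collection is bounded by a size in bytes, optionally with a maximum document count, so the document-count limit here is a simplification.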


MongoDB Document

Main characteristics:
• Documents are represented in a format called BSON (Binary JSON)
• Data is de-normalized
• No joins: embedding and linking are used instead

{
  author: 'Lucero',
  created: new Date('2012-06-06'),
  title: 'Yet another blog post',
  text: 'Here is the text...',
  tags: [ 'example', 'lucero' ],
  comments: [
    { author: 'jim', comment: 'I disagree' },
    { author: 'nancy', comment: 'Good post' }
  ]
}
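The difference between embedding and linking can be shown with plain Python dictionaries standing in for documents. This is a sketch under assumptions: the `users` collection, the `_id` values and the `author_id` field are hypothetical and do not appear in the sample document.

```python
# Linked data: users live in their own collection, keyed by a hypothetical _id.
users = {
    "u1": {"_id": "u1", "name": "Lucero"},
    "u2": {"_id": "u2", "name": "jim"},
}

# Embedded data: comments are stored inside the post document itself.
post = {
    "author_id": "u1",  # link: only the reference is stored
    "title": "Yet another blog post",
    "comments": [
        {"author": "jim", "comment": "I disagree"},
        {"author": "nancy", "comment": "Good post"},
    ],
}

# Embedded comments arrive with the post, so no second lookup is needed.
comment_authors = [c["author"] for c in post["comments"]]

# A link has to be resolved by the application with a second lookup.
author_name = users[post["author_id"]]["name"]
```

Embedding suits data that is always read together with its parent, such as comments on a post; linking suits data shared by many documents, such as a user profile.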


Storing irregular data

Example: Different information in user profiles

MongoDB
• Each document can have different information
  doc1 = {name: "Joe", age: "20", interest: "football"}
  doc2 = {name: "Michele"}

Relational database
• Tables with all attributes
• NULL value in columns where data was not provided
Result: special queries are needed to handle NULL values, which is expensive
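The contrast can be made concrete with a short Python sketch that stores the same two profiles both ways. The column tuple and the `to_row` helper are assumptions for illustration, and `None` stands in for SQL NULL.

```python
# Document style: each profile stores only the fields that were provided.
doc1 = {"name": "Joe", "age": "20", "interest": "football"}
doc2 = {"name": "Michele"}

# Fixed-schema style: every row carries every column, with None standing in for NULL.
COLUMNS = ("name", "age", "interest")


def to_row(doc):
    """Flatten a document into a fixed-width row, padding missing fields with None."""
    return tuple(doc.get(col) for col in COLUMNS)


rows = [to_row(doc1), to_row(doc2)]

# Count the NULL-like cells the fixed schema has to store and query around.
null_cells = sum(cell is None for row in rows for cell in row)
```

Every new optional profile field adds another column to `COLUMNS` and another potential NULL per row, while the document form only grows for the users who supply that field.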


Managing large binary data

MongoDB
• Divides a large file among multiple documents (GridFS)
• Includes metadata with large files
• Search files based on their content
• Retrieve only the first N bytes of a video

Relational database
• Uses BLOBs (binary large objects)
• Inefficient at manipulating rich media
• BLOBs cannot be searched or manipulated using standard database commands
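The idea behind GridFS, one metadata document plus a sequence of chunk documents, can be sketched without a database. The code below is a pure-Python illustration and not the GridFS implementation: the four-byte chunk size and both helper functions are assumptions, while the field names (`files_id`, `n`, `data`, `length`, `chunkSize`) mirror the ones GridFS uses.

```python
CHUNK_SIZE = 4  # tiny on purpose; an assumption for the demo, not a GridFS default


def split_into_chunks(file_id, data, chunk_size=CHUNK_SIZE):
    """Split bytes into ordered chunk documents plus one metadata document."""
    chunks = [
        {"files_id": file_id, "n": i, "data": data[start:start + chunk_size]}
        for i, start in enumerate(range(0, len(data), chunk_size))
    ]
    file_doc = {"_id": file_id, "length": len(data), "chunkSize": chunk_size}
    return file_doc, chunks


def read_first_bytes(chunks, n_bytes, chunk_size=CHUNK_SIZE):
    """Read only the chunks needed to return the first n_bytes of the file."""
    needed = -(-n_bytes // chunk_size)  # ceiling division
    ordered = sorted(chunks, key=lambda c: c["n"])[:needed]
    return b"".join(c["data"] for c in ordered)[:n_bytes]


file_doc, chunks = split_into_chunks("video1", b"0123456789")
head = read_first_bytes(chunks, 6)
```

Because each chunk is an ordinary document, fetching the start of a video touches only the first few chunks, and the metadata document can be queried like any other.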


Geospatial Indexes

Queries to find the nearest N points to the current location

MongoDB
• Embedded geospatial features

Relational database
• Spatial extensions
• MySQL implements a subset of the SQL with Geometry Types environment proposed by the Open Geospatial Consortium (OGC)
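A "nearest N points" query can be illustrated with a brute-force scan in Python. This is a sketch of what the query computes, not of how MongoDB or MySQL evaluate it: the haversine distance, the 6371 km Earth radius, the place list and the `nearest` helper are all assumptions.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def nearest(places, here, n):
    """Return the n places closest to here by scanning every candidate."""
    return sorted(places, key=lambda p: haversine_km(p["loc"], here))[:n]


# Hypothetical venues with approximate (latitude, longitude) coordinates.
places = [
    {"name": "Sydney", "loc": (-33.87, 151.21)},
    {"name": "Melbourne", "loc": (-37.81, 144.96)},
    {"name": "Perth", "loc": (-31.95, 115.86)},
]
closest = [p["name"] for p in nearest(places, (-33.78, 151.11), 2)]
```

The scan computes a distance for every place on each query; a geospatial index exists to avoid that full pass as the collection grows.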


Managing huge volumes of data

MongoDB
• High performance
  • No joins and embedding make reads and writes fast
  • Indexes, including indexing of keys from embedded documents and arrays
• Horizontal scalability
  • Automatic sharding (auto-partitioning of data across servers)

Relational database
• Have shown poor performance on certain data-intensive applications and when delivering streaming media
  Case study: Foursquare
• Difficult to scale to multiple servers
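Partitioning data across servers can be sketched as a function from a shard key to a server. The example below is a simplified, hash-based illustration in Python: the shard names, the `user` key field and both helpers are hypothetical, and the chunk splitting and balancing that a real sharded cluster performs are not modelled.

```python
import hashlib

SHARDS = ["shard0", "shard1", "shard2"]  # hypothetical server names


def shard_for(key, shards=SHARDS):
    """Pick a shard from a stable hash of the shard key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]


def partition(docs, key_field):
    """Group documents by the shard that their key hashes to."""
    placement = {name: [] for name in SHARDS}
    for doc in docs:
        placement[shard_for(doc[key_field])].append(doc)
    return placement


docs = [{"user": f"user{i}"} for i in range(12)]
placement = partition(docs, "user")
```

Since the same key always maps to the same shard, a read for one user goes to a single server, which is what allows capacity to grow by adding servers.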


Conclusions

Benefits that MongoDB offers over relational databases:
• Flexible schema
• High performance
• Manipulation of large object files out of the box
• Embedded geospatial features

However,
• MongoDB does not replace relational databases
• MongoDB and relational databases can coexist


Thank You!

Q&A
