MongoDB: Operational Big Data Database
DESCRIPTION
MongoDB is the leading NoSQL database for several reasons: it is an open-source, general-purpose, document-oriented database supported by a large community and an educational platform. Its horizontal scalability features allow it to fit operational big data scenarios where business needs point to real-time analytics and ever-increasing data sets. This talk focuses on using MongoDB for operational big data purposes and why it is ideal for such scenarios. It also covers integration with other notable big data technologies such as Hadoop and BI tools. Norberto Leite - Senior Solutions Architect, @MongoDB. MongoDB presentation during the Pentaho & Big Data Ecosystem - Live Seminar 2013
TRANSCRIPT
MongoDB: Operational Big Data
Senior Solutions Architect, MongoDB
Norberto Leite
@nleite
Agenda
• MongoDB Intro
• Big Data
• MongoDB Operational Big Data(base)
• Use Cases
• Q&A
Ola!
• Norberto Leite
• Solutions Architect – wingman
• Barcelona/Brussels
MongoDB
The leading NoSQL database
Document Database
Open-Source
General Purpose
5,000,000+ MongoDB Downloads
100,000+ Online Education Registrants
20,000+ MongoDB User Group Members
20,000+ MongoDB Days Attendees
20,000+ MongoDB Management Service (MMS) Users
Global Community
MongoDB Overview
300+ employees
600+ customers
Offices in New York, Palo Alto, Washington DC, London, Dublin, Barcelona and Sydney
Over $231 million in funding
MongoDB Overview
Agile Scalable
MongoDB Vision
To provide the best database for how we build and run apps today
Build – New and complex data – Flexible – New languages – Faster development
Run – Big Data scalability – Real-time – Commodity hardware – Cloud
Operational Database Landscape
Document Data Model
Relational vs. MongoDB
{
  first_name: 'Paul',
  surname: 'Miller',
  city: 'London',
  location: [45.123, 47.232],
  cars: [
    { model: 'Bentley', year: 1973, value: 100000, … },
    { model: 'Rolls Royce', year: 1965, value: 330000, … }
  ]
}
MongoDB is full featured
Rich Queries • Find Paul’s cars • Find everybody in London with a car built between 1970 and 1980
Geospatial • Find all of the car owners within 5km of Trafalgar Sq.
Text Search • Find all the cars described as having leather seats
Aggregation • Calculate the average value of Paul’s car collection
Map Reduce • What is the ownership pattern of colors by geography over time? (is purple trending up in China?)
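To make the rich-query and aggregation examples above concrete, here is a minimal sketch in Python. The query documents use standard MongoDB operators (`$elemMatch`, `$gte`, `$lte`, `$match`, `$unwind`, `$group`, `$avg`) written as plain dicts; the variable names and the in-memory `people` list are assumptions for illustration, not part of the talk. The plain-Python equivalents underneath show what each query is meant to return, so the sketch runs without a database.

```python
from statistics import mean

# The sample document from the slide; fields elided with "…" are left out.
paul = {
    "first_name": "Paul",
    "surname": "Miller",
    "city": "London",
    "location": [45.123, 47.232],
    "cars": [
        {"model": "Bentley", "year": 1973, "value": 100000},
        {"model": "Rolls Royce", "year": 1965, "value": 330000},
    ],
}
people = [paul]  # hypothetical in-memory stand-in for a collection

# Query documents in MongoDB's query language, expressed as Python dicts.
find_paul = {"first_name": "Paul"}
london_1970s_cars = {
    "city": "London",
    "cars": {"$elemMatch": {"year": {"$gte": 1970, "$lte": 1980}}},
}
avg_value_pipeline = [
    {"$match": {"first_name": "Paul"}},
    {"$unwind": "$cars"},
    {"$group": {"_id": None, "avg_value": {"$avg": "$cars.value"}}},
]

# Plain-Python equivalents of the three queries above.
pauls_cars = [
    car["model"]
    for person in people
    if person["first_name"] == "Paul"
    for car in person["cars"]
]
london_owners = [
    person["surname"]
    for person in people
    if person["city"] == "London"
    and any(1970 <= car["year"] <= 1980 for car in person["cars"])
]
avg_value = mean(
    car["value"]
    for person in people
    if person["first_name"] == "Paul"
    for car in person["cars"]
)

print(pauls_cars, london_owners, avg_value)
```

With a driver such as PyMongo, the first two dicts could be passed to a collection's `find` method and the pipeline to `aggregate`; that wiring is omitted here because it needs a running server.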
MongoDB
Developers are more productive
Big Data
Best definition so far!
RDBMS Scale = Bigger Computers
“Clients can also opt to run zEC12 without a raised datacenter floor -- a first for high-end IBM mainframes.”
IBM Press Release 28 Aug, 2012
Vertical Scalability
This Was a Problem for Google
Source: http://googleblog.blogspot.com/2010/06/our-new-search-index-caffeine.html
250,000+ MBPs = 4.1 miles
2010 Search Index Size: 100,000,000 GB
New data added per day: 100,000+ GB
Databases they could use: 0
And for Facebook
2010: 13,000,000 queries per second
TPC Top Results
TPC #1 DB: 504,161 tps
Top 10 combined: 1,370,368 tps
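The gap between these figures is easier to see as a ratio. The short calculation below uses only the three numbers quoted on the slides; the variable names are illustrative, and queries per second and TPC transactions per second are not the same unit of work, so the result is an order-of-magnitude comparison rather than a benchmark.

```python
# Figures quoted on the slides.
facebook_qps = 13_000_000      # Facebook, 2010: queries per second
tpc_top_tps = 504_161          # TPC #1 result: transactions per second
tpc_top10_tps = 1_370_368      # TPC top 10 results combined

# How many times larger the Facebook load is than each TPC figure.
ratio_vs_top1 = facebook_qps / tpc_top_tps
ratio_vs_top10 = facebook_qps / tpc_top10_tps

print(f"vs. TPC #1: {ratio_vs_top1:.1f}x")
print(f"vs. TPC top 10 combined: {ratio_vs_top10:.1f}x")
```

On these numbers the Facebook load is roughly 26 times the top single TPC result and more than 9 times the top ten combined.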
Living in the Post-transactional Future
Order-processing systems largely “done” (RDBMS); primary focus on better search and recommendations or adapting prices on the fly (NoSQL)
Vast majority of its engineering is focused on recommending better movies (NoSQL), not processing monthly bills (RDBMS)
Easy part is processing the credit card (RDBMS). Hard part is making it location aware, so it knows where you are and what you’re buying (NoSQL)
Shift in What We’re Computing
How IT/Data Scientists Define Big Data
Source: Silicon Angle, 2012
MongoDB Operational Big Data(base)
Consideration – Online vs. Offline
Online: • Real-time • Low-latency • High availability
Offline: • Long-running • High-latency • Availability is lower priority
MongoDB/NoSQL Is Good for…
• 360° View of the Customer
• Mobile & Social Apps
• Fraud Detection
• User Data Management
• Content Management & Delivery
• Reference Data
• Product Catalogs
• Machine to Machine Apps
• Data Hub
MongoDB and Enterprise IT Stack
Applications: CRM, ERP, Collaboration, Mobile, BI
Data Management: Online Data (RDBMS), Offline Data (EDW, Hadoop)
Infrastructure: OS & Virtualization, Compute, Storage, Network
Across layers: Management & Monitoring, Security & Auditing
Horizontal Scalability
MongoDB Architecture
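As a rough illustration of what horizontal scalability means, the sketch below spreads documents across several servers by hashing a key. It is a toy model under stated assumptions: the shard names, the `shard_for` helper and the modulo placement are hypothetical, and MongoDB's actual sharding assigns chunk ranges of the shard key to shards and rebalances them, which this does not model.

```python
import hashlib
from collections import Counter

SHARDS = ["shard0", "shard1", "shard2"]  # hypothetical shard names


def shard_for(key: str, shards=SHARDS) -> str:
    """Pick a shard by hashing the key (toy placement, not MongoDB's algorithm)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the hash as an integer, then map it to a shard.
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


# Route 9,000 made-up user ids and count how many land on each shard.
user_ids = [f"user{i}" for i in range(9000)]
placement = Counter(shard_for(user_id) for user_id in user_ids)

for shard, count in sorted(placement.items()):
    print(shard, count)
```

Because the same key always hashes to the same shard, reads and writes for one user go to one server, while the total load is split across all of them; adding capacity means adding servers rather than buying a bigger one.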
Use Cases
Leading Organizations Rely on MongoDB
Fortune 500 & Global 500
• 10 of the Top Financial Services Institutions
• 10 of the Top Electronics Companies
• 10 of the Top Media and Entertainment Companies
• 8 of the Top Retailers
• 6 of the Top Telcos
• 5 of the Top Technology Companies
• 4 of the Top Healthcare Companies
MongoDB Solutions
• Big Data
• Content Mgmt & Delivery
• Mobile & Social
• Data Hub
• User Data Management
Customer example: Online Travel
Travel
• Flights, hotels and cars
• Real-time offers
• User profiles, reviews
• User metadata (previous purchases, clicks, views)
Algorithms
• User segmentation
• Offer recommendation engine
• Ad serving engine
• Bundling engine
MongoDB Connector for Hadoop
Machine Learning
Ad-Serving
• Catalogs and products
• User profiles
• Clicks
• Views
• Transactions
Algorithms
• User segmentation
• Recommendation engine
• Prediction engine
MongoDB Connector for Hadoop
Data Hub
Insurance
• Insurance policies
• Demographic data
• Customer web data
• Call center data
• Real-time churn detection
Churn Analysis
• Customer action analysis
• Churn prediction algorithms
MongoDB Connector for Hadoop
Q&A