NoSQL Databases and Analytic Use Cases

Post on 11-May-2015


DESCRIPTION

Koverse CTO Aaron Cordova's (@aaroncordova) talk from the 2014 INFORMS conference "The Business of Big Data".

TRANSCRIPT

NoSQL Databases and Analytic Use Cases

Aaron Cordova INFORMS

NoSQL

• Perhaps a better name is “non-relational”

• A departure from conventional relational databases

• Trades traditional features for simplicity, scalability, and flexibility

Types of NoSQL DBs

Columnar: BigTable, HBase, Accumulo, Cassandra

Graph: Neo4j, OrientDB

Key-Value: Dynamo, Riak, Voldemort, BerkeleyDB

Document: MongoDB, CouchDB, MarkLogic (XML)
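To make the four categories concrete, here is a minimal sketch of one customer record ("Bob") expressed in each data model using plain Python structures. These are illustrative stand-ins, not the APIs of the named products; the keys, column names, and the "BOUGHT" edge label are hypothetical.

```python
import json

# Key-value (e.g. Dynamo, Riak, Voldemort, BerkeleyDB): the store sees an opaque
# value per key; interpreting it (here, as JSON bytes) is the application's job.
key_value = {"customer:R000": b'{"name": "Bob", "age": 43}'}

# Document (e.g. MongoDB, CouchDB): a self-describing, possibly nested record.
document = {"_id": "R000", "name": "Bob", "age": 43,
            "orders": [{"item": "shoes", "price": 50}]}

# Columnar, BigTable-style (e.g. HBase, Accumulo): (row, column) -> value cells.
columnar = {
    ("R000", "Cust.age"): 43,
    ("R000", "Cust.name"): "Bob",
    ("R000", "Orders.shoes"): 50,
}

# Graph (e.g. Neo4j, OrientDB): nodes plus labeled, directed edges.
nodes = {"R000": {"name": "Bob"}, "P001": {"product": "shoes"}}
edges = [("R000", "BOUGHT", "P001")]

def neighbors(node_id, label):
    """Follow outgoing edges with the given label from node_id."""
    return [dst for src, lbl, dst in edges if src == node_id and lbl == label]
```

For example, `json.loads(key_value["customer:R000"])["age"]` recovers the age only after the application decodes the value, while the columnar and document forms expose the field directly.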

Trades

Give up:
• Cross-row transactions
• Relational JOINs
• Type checking
• SQL

Gain:
• Simplicity
• Scalability (distributed)
• Schema flexibility
• Geographic distribution
• Programmatic APIs

NoSQL Distributed

Name    Age  Phone
Bob     43   555-1212
Jenny   32   555-1213
Sally   28   555-1214
Joe     45   555-1215

One logical table spread across many machines, scaling up to petabytes.
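BigTable-family stores (BigTable, HBase, Accumulo) split a table into contiguous, sorted row-key ranges (called tablets or regions) and spread those ranges across servers; Dynamo-style stores hash keys instead. The sketch below shows range partitioning of the table above under hypothetical, fixed split points. Real systems choose and move split points automatically as data grows.

```python
import bisect

# Hypothetical split points: tablet 0 holds row keys < "C",
# tablet 1 holds keys in ["C", "K"), tablet 2 holds keys >= "K".
SPLIT_POINTS = ["C", "K"]

def tablet_for(row_key: str) -> int:
    """Return the index of the tablet whose key range contains row_key."""
    return bisect.bisect_right(SPLIT_POINTS, row_key)

def partition(rows, key="Name"):
    """Group rows by tablet; each tablet could be served by a different machine."""
    tablets = {i: [] for i in range(len(SPLIT_POINTS) + 1)}
    for row in rows:
        tablets[tablet_for(row[key])].append(row)
    return tablets

rows = [
    {"Name": "Bob", "Age": 43, "Phone": "555-1212"},
    {"Name": "Jenny", "Age": 32, "Phone": "555-1213"},
    {"Name": "Sally", "Age": 28, "Phone": "555-1214"},
    {"Name": "Joe", "Age": 45, "Phone": "555-1215"},
]
tablets = partition(rows)
```

Because each tablet covers a sorted key range, a scan over a range of names touches only the machines holding those tablets.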

Consistency

Diagram: three replicas of the same Name/Age/Phone table. In one replica Jenny's phone number has been updated to 867-5309, while the other two still show 555-1213, so reads can disagree until the replicas synchronize. The slide contrasts Multiple Data Centers with a Single Data Center.
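The diverging replicas in the diagram can be simulated with a minimal sketch of eventual consistency. It assumes a last-write-wins policy keyed on a hypothetical integer timestamp; Cassandra resolves conflicts by cell timestamp in a similar spirit, whereas the original Dynamo design used vector clocks, so treat this as one possible policy rather than how every listed system works.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: int  # hypothetical logical clock; real systems differ

def write(replica: dict, key: str, value: str, timestamp: int) -> None:
    replica[key] = Versioned(value, timestamp)

def merge(a: dict, b: dict) -> dict:
    """Last-write-wins merge: for each key keep the version with the larger timestamp."""
    merged = dict(a)
    for key, version in b.items():
        if key not in merged or version.timestamp > merged[key].timestamp:
            merged[key] = version
    return merged

phones = {"Bob": "555-1212", "Jenny": "555-1213", "Sally": "555-1214", "Joe": "555-1215"}
replicas = [{name: Versioned(p, 1) for name, p in phones.items()} for _ in range(3)]

# Jenny's number changes, but only one replica has received the write so far.
write(replicas[1], "Jenny", "867-5309", timestamp=2)
stale_read = replicas[0]["Jenny"].value  # the replicas currently disagree

# Anti-entropy: merge all replicas and share the result so every copy converges.
converged = {}
for r in replicas:
    converged = merge(converged, r)
replicas = [dict(converged) for _ in range(3)]
```

Before the merge, a read from the first replica returns the old number; afterwards all three replicas agree on 867-5309.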

Consistency

Geographically distributed, eventually consistent:
Dynamo, Riak, Voldemort, Cassandra, MongoDB, CouchDB

Single data center, highly consistent:
BigTable, HBase, Accumulo, Cassandra, Neo4j, OrientDB, MongoDB, MarkLogic (XML)
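One reason a system such as Cassandra can appear in both groups is that its consistency level is tunable per request (for example ONE, QUORUM, ALL). In Dynamo-style replication with N replicas, a write waits for W acknowledgements and a read consults R replicas; if R + W > N, every read quorum overlaps every write quorum, so at least one replica consulted holds the latest acknowledged write. The sketch below checks that rule by brute force for small N; the function names are illustrative.

```python
from itertools import combinations

def quorums_always_overlap(n: int, w: int, r: int) -> bool:
    """Brute force: does every write quorum of size w share a replica
    with every read quorum of size r, out of n replicas?"""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )

def overlap_rule(n: int, w: int, r: int) -> bool:
    """The usual rule of thumb for quorum overlap: R + W > N."""
    return r + w > n

# Verify the rule against brute force for clusters of up to 5 replicas.
for n in range(1, 6):
    for w in range(1, n + 1):
        for r in range(1, n + 1):
            assert quorums_always_overlap(n, w, r) == overlap_rule(n, w, r)
```

With N = 3, quorum writes and quorum reads (W = R = 2) satisfy the rule, while W = R = 1 trades that guarantee for lower latency and higher availability.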

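Programmability

Diagram: Objects → SQL → DB versus Objects → DB. With a relational database, application objects are translated to and from SQL; an object or document store persists them more directly.

Example stack: web client (JavaScript) → JSON → Node.js server (JavaScript) → JSON → MongoDB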
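A minimal sketch of the contrast, assuming a JSON payload arriving from a web client: the relational path (using Python's built-in `sqlite3`) must split the object into tables and reassemble it with a join, while a document-style store keeps the object's shape. The `documents` dict below is a hypothetical stand-in for a document database, not MongoDB's API.

```python
import json
import sqlite3

# An object as a web client might send it.
payload = '{"name": "Bob", "age": 43, "phones": ["555-1212"]}'
obj = json.loads(payload)

# Relational path: map the object onto fixed columns; the list needs its own table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.execute("CREATE TABLE phone (customer_id INTEGER, number TEXT)")
cur = conn.execute("INSERT INTO customer (name, age) VALUES (?, ?)", (obj["name"], obj["age"]))
customer_id = cur.lastrowid
conn.executemany("INSERT INTO phone (customer_id, number) VALUES (?, ?)",
                 [(customer_id, p) for p in obj["phones"]])

# Reassembling the object takes a join plus manual mapping back to fields.
rows = conn.execute(
    "SELECT c.name, c.age, p.number FROM customer c JOIN phone p ON p.customer_id = c.id"
).fetchall()
rebuilt = {"name": rows[0][0], "age": rows[0][1], "phones": [row[2] for row in rows]}

# Document path (hypothetical in-memory stand-in): store and fetch the object as-is.
documents = {}
documents["R000"] = json.loads(payload)
fetched = documents["R000"]
```

Both paths return an equal object; the difference is the mapping code the relational path needs in between.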

Analytics

Diagram: Business Activity → several Operational DBs (OLTP: updates, transactions) → ETL → Analytical DB (OLAP: denormalized, aggregations) → Business Intelligence. The ETL step requires schema knowledge, and joins happen there.
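The conventional pipeline's ETL step can be sketched with Python's `sqlite3` standing in for both sides. The two operational tables, their columns, and the `sales_by_region` table are hypothetical; the point is that the join requires knowing both schemas and produces a denormalized, pre-aggregated table for OLAP-style reporting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two hypothetical operational (OLTP) tables.
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Bob', 'West'), (2, 'Sally', 'East'), (3, 'Joe', 'West');
INSERT INTO orders VALUES (1, 1, 50.0), (2, 2, 25.0), (3, 3, 45.0), (4, 1, 35.0);
""")

# ETL: the join happens here, then results are aggregated for the analytical side.
conn.executescript("""
CREATE TABLE sales_by_region AS
SELECT c.region AS region, COUNT(o.id) AS num_orders, SUM(o.amount) AS total
FROM orders o JOIN customers c ON c.id = o.customer_id
GROUP BY c.region;
""")

report = dict(conn.execute("SELECT region, total FROM sales_by_region").fetchall())
order_counts = dict(conn.execute("SELECT region, num_orders FROM sales_by_region").fetchall())
```

If an operational schema changes (say, `region` is renamed), this ETL query breaks, which is one of the pressures the following slides describe.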

Analytics

Diagram: Business Activity → several OLTP databases → NoSQL DB → ? → Business Intelligence

NoSQL and Analytics

• Importing operational data can create a scale problem

• Combining operational data can create sparse data

• Operational schemas may change

NoSQL and Analytics

Scalability, schema flexibility

Full outer join:

Cust.name  Cust.age  Orders.shoes  Facebook.likes  …
Bob        43        $50           -               …
Sarah      32        $25           5/5/14          …
Sally      28        -             4/3/12          …
-          -         $35           11/1/13         …
-          -         -             9/24/12         …
Joe        45        $45           -               …
…          …         …             …               …

Billions of rows, thousands of columns, sparse.
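A minimal sketch of how the table above arises: three source systems are combined with a full outer join, so every key from any source survives and fields a source lacks are simply absent. The join keys ("c1", "anon1", ...) are hypothetical; the slide does not show how records are matched.

```python
# The slide's rows, split back into three hypothetical source systems.
customers = {
    "c1": {"Cust.name": "Bob", "Cust.age": 43},
    "c2": {"Cust.name": "Sarah", "Cust.age": 32},
    "c3": {"Cust.name": "Sally", "Cust.age": 28},
    "c4": {"Cust.name": "Joe", "Cust.age": 45},
}
orders = {"c1": {"Orders.shoes": "$50"}, "c2": {"Orders.shoes": "$25"},
          "anon1": {"Orders.shoes": "$35"}, "c4": {"Orders.shoes": "$45"}}
likes = {"c2": {"Facebook.likes": "5/5/14"}, "c3": {"Facebook.likes": "4/3/12"},
         "anon1": {"Facebook.likes": "11/1/13"}, "anon2": {"Facebook.likes": "9/24/12"}}

def full_outer_join(*sources):
    """Keep every key found in any source; merge whatever fields each source has."""
    keys = sorted(set().union(*sources))
    return {
        key: {col: val for src in sources for col, val in src.get(key, {}).items()}
        for key in keys
    }

joined = full_outer_join(customers, orders, likes)
columns = sorted({col for row in joined.values() for col in row})
filled = sum(len(row) for row in joined.values())  # cells that actually hold data
total = len(joined) * len(columns)                 # cells a dense table would need
```

Here 16 of 24 cells are filled; with thousands of columns from many sources the filled fraction shrinks sharply, which is why storing only present cells matters.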

BigTable Data Model

Row ID  Column          Value
R000    Cust.name       Bob
R000    Cust.age        43
R000    Orders.shoes    $50
R002    Cust.name       Sally
R002    Cust.age        32
R002    Facebook.likes  4/3/12
…       …               …
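A minimal sketch of the BigTable-style layout: sparse rows are flattened into (row ID, column, value) cells, empty fields produce no cell at all, and cells are kept sorted so one row's cells are contiguous and can be found with a binary search. The input mirrors the two rows on the slide; `to_cells` and `scan_row` are illustrative names, not an API of any listed database.

```python
import bisect

# Sparse rows as on the slide; None marks a field the row does not have.
rows = {
    "R000": {"Cust.name": "Bob", "Cust.age": 43, "Orders.shoes": "$50", "Facebook.likes": None},
    "R002": {"Cust.name": "Sally", "Cust.age": 32, "Orders.shoes": None, "Facebook.likes": "4/3/12"},
}

def to_cells(rows):
    """Flatten rows into (row_id, column, value) cells sorted by (row_id, column).
    Missing values are skipped, so empty fields cost no storage."""
    cells = [(rid, col, val) for rid, row in rows.items()
             for col, val in row.items() if val is not None]
    cells.sort(key=lambda cell: (cell[0], cell[1]))
    return cells

def scan_row(cells, row_id):
    """Binary-search to the first cell of row_id, then read until the row changes."""
    start = bisect.bisect_left(cells, (row_id,))
    result = {}
    for rid, col, val in cells[start:]:
        if rid != row_id:
            break
        result[col] = val
    return result

cells = to_cells(rows)
```

The probe `(row_id,)` sorts before every cell of that row because a shorter tuple with an equal prefix compares as smaller, so values of mixed types are never compared.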

MongoDB Data Model

{
  "Cust.name": "Bob",
  "Cust.age": 43,
  "Orders.shoes": "$50"
},
{
  "Cust.name": "Sally",
  "Cust.age": 32,
  "Facebook.likes": "4/3/12"
},
…

NoSQL Data Loading Shift

NoSQL Analytics:
• Composite, sparse schemas
• Scale out
• Aggressive indexing
• Data discovery

Conventional BI:
• Data cleaning
• Regularization
• Denormalization
• Star schema
• Known operational schemas
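"Data discovery" and "aggressive indexing" can be sketched together: instead of declaring a schema up front, count which fields actually occur, and index every (field, value) pair so any field is searchable. The records reuse the slide's joined rows; `discover_schema`, `build_index`, and `find` are hypothetical helper names, and a real system would keep the index in the database rather than in memory.

```python
from collections import Counter, defaultdict

# Sparse records from the full-outer-join slide (absent fields are simply missing).
records = [
    {"Cust.name": "Bob", "Cust.age": 43, "Orders.shoes": "$50"},
    {"Cust.name": "Sarah", "Cust.age": 32, "Orders.shoes": "$25", "Facebook.likes": "5/5/14"},
    {"Cust.name": "Sally", "Cust.age": 28, "Facebook.likes": "4/3/12"},
    {"Orders.shoes": "$35", "Facebook.likes": "11/1/13"},
    {"Facebook.likes": "9/24/12"},
    {"Cust.name": "Joe", "Cust.age": 45, "Orders.shoes": "$45"},
]

def discover_schema(records):
    """Schema discovery: count how many records contain each field."""
    counts = Counter()
    for rec in records:
        counts.update(rec.keys())
    return counts

def build_index(records):
    """Aggressive indexing: map every (field, value) pair to the records containing it."""
    index = defaultdict(set)
    for i, rec in enumerate(records):
        for field, value in rec.items():
            index[(field, value)].add(i)
    return index

def find(index, field, value):
    return index.get((field, value), set())

schema = discover_schema(records)
index = build_index(records)
```

A new source with new fields needs no schema change here: its fields show up in `schema` and become searchable as soon as the records are indexed.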

Analytics

Diagram: Business Activity → several OLTP databases → NoSQL DB → Business Intelligence. Schema discovery now happens in the NoSQL layer, and so do the joins.

NoSQL Analytics Shift

Transformations:
• MapReduce
• Pre-computed
• Large answers
• Simple lookups

Queries:
• SQL
• Computed on the fly
• Small answers
• Roll up
• Drill down
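The "transformations" side can be sketched as a single-process simulation of MapReduce: a map function emits key/value pairs, a shuffle groups them by key, and a reduce function combines each group. The result is precomputed once and then served by simple lookups. This is an illustration of the pattern with hypothetical order data, not Hadoop's API.

```python
from collections import defaultdict

# Hypothetical order records.
orders = [
    {"customer": "Bob", "amount": 50},
    {"customer": "Sarah", "amount": 25},
    {"customer": "Joe", "amount": 45},
    {"customer": "Bob", "amount": 35},
]

def map_fn(order):
    """Emit (customer, amount) for each order."""
    yield order["customer"], order["amount"]

def reduce_fn(key, values):
    """Combine all amounts for one customer."""
    return sum(values)

def run_mapreduce(records, map_fn, reduce_fn):
    groups = defaultdict(list)
    for record in records:                  # map phase
        for key, value in map_fn(record):
            groups[key].append(value)       # shuffle: group values by key
    return {key: reduce_fn(key, values) for key, values in groups.items()}  # reduce phase

totals = run_mapreduce(orders, map_fn, reduce_fn)  # precomputed in a batch job
bob_total = totals["Bob"]                           # later: a simple lookup, 85
```

The trade-off matches the slide: the batch job does the heavy work ahead of time, so answering a question is a key lookup rather than a query computed on the fly.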

Analytics

Diagram: Business Activity → several OLTP databases → NoSQL DB → MapReduce transformations → fast lookups → Business Intelligence

MapReduce Analytics

Supported:
• SQL (Hive)
• Statistical modeling
• Machine learning
• Text analytics
• Feature extraction
• Image processing
• Graph analysis

MapReduce Analytic Workflow

• Reusable transforms
• Searchable collections

Combined-Data Security

Requirements:
• Physically co-located data
• Strong logical access control
• Role-based
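When data from many sources is physically co-located in one table, access control has to be enforced per cell rather than per table. The sketch below uses a simplified, hypothetical scheme in which each cell lists the roles allowed to read it and a user sees a cell if they hold any of those roles. Accumulo, for comparison, attaches a boolean visibility expression to each cell and evaluates it against the user's authorizations; this sketch does not reproduce that syntax.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set

@dataclass(frozen=True)
class Cell:
    row: str
    column: str
    value: object
    allowed_roles: FrozenSet[str]  # hypothetical: holding any one of these roles grants read access

def visible_cells(cells: List[Cell], user_roles: Set[str]) -> List[Cell]:
    """Return only the cells the user may read; the rest are filtered out server-side."""
    return [c for c in cells if c.allowed_roles & user_roles]

# Data from different sources stored side by side in one table.
cells = [
    Cell("R000", "Cust.name", "Bob", frozenset({"sales", "marketing"})),
    Cell("R000", "Orders.shoes", "$50", frozenset({"sales"})),
    Cell("R002", "Facebook.likes", "4/3/12", frozenset({"marketing"})),
]

marketing_view = visible_cells(cells, {"marketing"})
sales_view = visible_cells(cells, {"sales"})
```

A marketing analyst sees names and likes but not purchases, a sales user sees names and purchases, and a user with no roles sees nothing, all from the same co-located table.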

Questions?

Contact Info

Aaron Cordova
1-855-403-1399
www.koverse.com
info@koverse.com
