Making the Leap to BI on Hadoop, by Dave Mariani @ AtScale

Post on 12-Jul-2015

Category:

Data & Analytics

Making the leap to BI on Hadoop

Predictive Analytics & Business Insights 2014

November 19, 2014

David P. Mariani

CEO

AtScale, Inc.

THE TRUTH

ABOUT DATA

“We think only 3% of the potentially useful data is tagged, and even less is analyzed.”

Source: IDC Predictions 2013: Big Data, IDC

“90% of the data in the world today has been created in the last two years”

Source: IBM

The Broken Promise

What We Wanted: Centralized Data Warehouse

What We Got: Data Marts

WHAT WE GOT

ETL + STAR SCHEMAS

Traditional Data Architecture

[Diagram labels: INPUT DATA, ETL, DATA WAREHOUSE, MART / MART / MART, QUERY ENGINE, ANALYSIS TOOLS]

What’s Wrong with this Picture

[Diagram labels: INPUT DATA, ETL, DATA WAREHOUSE, MART / MART / MART, QUERY ENGINE, ANALYSIS TOOLS]

Highly complex

Lots of people & skillsets

Multiple copies of data

Stale data

Rigid schema

Tough to change

Write Many

Structured

Early Transformation

It Takes an Army

BI Engineer: Design Reports/Dashboards

ETL Engineer: Automate Cube Load

BI Engineer: Design Cube

DBA: Automate Data Load

ETL Engineer: Write ETL Code

DBA: Create Tables

Data Warehouse Architect: Design Star Schema

SAN/NAS Engineer: Define Storage Architecture

Star Schema = Unnatural!

WHAT WE WANTED

SCHEMA ON DEMAND

Data Management Approaches

Traditional Approach

[Diagram labels: INPUT DATA, ETL, DATA WAREHOUSE, MART / MART / MART, QUERY ENGINE, ANALYSIS TOOLS]

VS

New Approach

[Diagram labels: INPUT DATA, HADOOP, ANALYSIS TOOLS]

Time for a New Approach

✔ Write Once

✔ Semi-Structured

✔ Late Transformation

Not This, That

New approach:

BI Engineer: Run Queries/Create Reports

Hadoop Engineer: Create EXTERNAL Tables

Hadoop Engineer: Define location to store files

VS

Traditional approach:

BI Engineer: Design Reports/Dashboards

ETL Engineer: Automate Cube Load

BI Engineer: Design Cube

DBA: Automate Data Load

ETL Engineer: Write ETL Code

DBA: Create Tables

Data Warehouse Architect: Design Star Schema

SAN/NAS Engineer: Define Storage Architecture

Example: Key-Values

Example: JSON
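
The slide contents for these two examples did not survive in the transcript, so the following is only a minimal sketch of the idea they name: raw key-value and JSON records are stored untouched and given structure when they are read. The field names, separators, and sample values are assumptions made up for illustration, not data from the talk.

```python
import json


def parse_key_values(line, pair_sep="&", kv_sep="="):
    """Turn a raw 'k=v&k=v' string into a dict at read time."""
    record = {}
    for pair in line.split(pair_sep):
        if not pair:
            continue  # tolerate empty segments such as a trailing '&'
        key, _, value = pair.partition(kv_sep)
        record[key] = value
    return record


# Raw events are kept exactly as they arrived (hypothetical sample data).
raw_kv = "match_id=1001&hero=Axe&gold=4200"
raw_json = '{"match_id": 1001, "hero": "Axe", "items": ["Blink Dagger", "Vanguard"]}'

# Structure is applied only when the data is read.
kv_event = parse_key_values(raw_kv)
json_event = json.loads(raw_json)

print(kv_event["hero"], int(kv_event["gold"]))
print(json_event["hero"], json_event["items"])
```

Key-value fields come back as strings and are cast by the reader, while JSON keeps its nested list of items intact; in both cases nothing had to be decided about the schema at load time.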

DEMO: MOBA Game Analytics

Demo: DOTA 2 – What the User Sees

Key Data Points: 5 vs. 5 players per match. Players choose ‘Heroes’, use ‘Items’ & earn ‘Gold’.

FOR THE DATA SCIENTISTS!

As Easy As 1,2,3

BI Engineer: Run Queries/Create Reports

Hadoop Engineer: Create EXTERNAL Tables

Hadoop Engineer: Define location to store files
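
The three steps above can be sketched roughly as follows. This is not the DDL used in the demo; it is a small helper that builds a Hive-style CREATE EXTERNAL TABLE statement over files already sitting at some location. The table name, columns, tab delimiter, and path are all hypothetical.

```python
def external_table_ddl(table, columns, location):
    """Build a Hive-style CREATE EXTERNAL TABLE statement as a string.

    columns is a list of (name, hive_type) pairs; location is the directory
    where the raw files already live. Nothing is executed here.
    """
    cols = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{location}'"
    )


# Step 1: the location where the files are stored (hypothetical path).
location = "/data/dota2/match_events"

# Step 2: an external table laid over those files (hypothetical schema).
ddl = external_table_ddl(
    "match_events",
    [("match_id", "BIGINT"), ("hero", "STRING"), ("gold", "INT")],
    location,
)

# Step 3: an ordinary SQL query a BI tool could issue against the table.
query = "SELECT hero, COUNT(*) AS picks FROM match_events GROUP BY hero"

print(ddl)
print(query)
```

Because the table is external, dropping it removes only the table definition and leaves the files in place, which is what keeps the load step this short.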

Demo: DOTA 2 – Use Case 1

Question: Who are the most popular heroes?
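
The query shown in the demo is not part of the transcript. As a rough sketch of the question, assuming one record per player per match with a `hero` field (an assumed layout, with made-up sample rows), popularity is a simple count of picks:

```python
from collections import Counter

# Hypothetical records: one per player per match.
picks = [
    {"match_id": 1, "hero": "Axe", "won": True},
    {"match_id": 1, "hero": "Lina", "won": False},
    {"match_id": 2, "hero": "Axe", "won": False},
    {"match_id": 2, "hero": "Pudge", "won": True},
    {"match_id": 3, "hero": "Pudge", "won": True},
    {"match_id": 3, "hero": "Axe", "won": True},
]

# Popularity here means how often a hero was picked.
popularity = Counter(p["hero"] for p in picks)
top_heroes = popularity.most_common(2)

print(top_heroes)  # [('Axe', 3), ('Pudge', 2)]
```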

Demo: DOTA 2 – Use Case 2

Question: Which heroes have the highest win rate?
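
One way to read this question, again as a sketch over made-up records rather than the demo's actual query: win rate is wins divided by games played per hero. The `min_games` cutoff is an added assumption to keep rarely picked heroes from topping the list.

```python
from collections import defaultdict


def win_rates(picks, min_games=1):
    """Return (hero, win_rate) pairs sorted from highest to lowest rate."""
    games = defaultdict(int)
    wins = defaultdict(int)
    for p in picks:
        games[p["hero"]] += 1
        wins[p["hero"]] += int(p["won"])
    rates = {h: wins[h] / games[h] for h in games if games[h] >= min_games}
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical records: one per player per match.
picks = [
    {"hero": "Axe", "won": True},
    {"hero": "Lina", "won": False},
    {"hero": "Axe", "won": False},
    {"hero": "Pudge", "won": True},
    {"hero": "Pudge", "won": True},
    {"hero": "Axe", "won": True},
]

ranked = win_rates(picks)
print(ranked[0])  # ('Pudge', 1.0)
```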

Demo: DOTA 2 – Use Case 3

Question: What are the top 3 items associated with the best win rate?
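
A sketch of this question, assuming each player record carries its items as a nested list (an assumed layout with made-up rows, not the demo's data). Each item's win rate is computed across the players who held it, ties are broken by number of games, and the top three are returned.

```python
from collections import defaultdict


def top_items(players, n=3):
    """Return the n items with the best win rate as (item, rate) pairs."""
    games = defaultdict(int)
    wins = defaultdict(int)
    for p in players:
        for item in set(p["items"]):  # count an item once per player
            games[item] += 1
            wins[item] += int(p["won"])
    # Highest win rate first, then most games, then name for stable output.
    ranked = sorted(games, key=lambda i: (-wins[i] / games[i], -games[i], i))
    return [(i, wins[i] / games[i]) for i in ranked[:n]]


# Hypothetical player records with nested item lists.
players = [
    {"hero": "Axe", "won": True, "items": ["Blink Dagger", "Vanguard"]},
    {"hero": "Lina", "won": False, "items": ["Force Staff", "Boots of Speed"]},
    {"hero": "Pudge", "won": True, "items": ["Blink Dagger", "Boots of Speed"]},
    {"hero": "Axe", "won": False, "items": ["Vanguard", "Boots of Speed"]},
    {"hero": "Pudge", "won": True, "items": ["Blink Dagger", "Black King Bar"]},
]

best = top_items(players)
print(best)
```

The items never leave the player record, so the player-to-item relationship is resolved by iterating a nested list rather than by joining a separate table.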

Practical Applications

Time Server Analysis (session data)

Affinity Analysis

Segmentation Analysis

Many to Many

NO JOINS = HORIZONTAL SCALE
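
A small sketch of why the absence of joins helps with scale, under the assumption that many-to-many data (players and their items) is nested inside each match record. Every partition can then be processed on its own and the partial results merged, with no lookups across partitions. The record layout and the two-way split are illustrative only.

```python
from collections import Counter
from functools import reduce


def count_items(partition):
    """Count item usage within one partition, touching no other data."""
    counts = Counter()
    for match in partition:
        for player in match["players"]:
            counts.update(player["items"])
    return counts


# Hypothetical match records with players and items nested inside.
matches = [
    {"match_id": 1, "players": [
        {"hero": "Axe", "items": ["Blink Dagger", "Vanguard"]},
        {"hero": "Lina", "items": ["Force Staff"]},
    ]},
    {"match_id": 2, "players": [
        {"hero": "Pudge", "items": ["Blink Dagger"]},
    ]},
    {"match_id": 3, "players": [
        {"hero": "Axe", "items": ["Vanguard", "Blink Dagger"]},
    ]},
]

# Split the records as a cluster might, process each part alone, then merge.
partitions = [matches[0::2], matches[1::2]]
partials = [count_items(part) for part in partitions]
total = reduce(lambda a, b: a + b, partials, Counter())

print(total.most_common())
```

The merged result matches what a single pass over all records would give, which is the property that lets the work be spread across more machines.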

FOR THE

ORDINARY HUMAN!

DEMO

Summary: The Do’s & Don’ts

Do: Capture data “as is” / Don’t: Pre-aggregate data

Do: Apply schema on read / Don’t: Force schema on load

Do: Land new data on Hadoop / Don’t: Land new data on relational DBs

Do: Create a data warehouse / Don’t: Create data marts

Do: Leverage open source engines / Don’t: Invest in proprietary databases

Business Intelligence Redefined
