Copyright © 2012, SAS Institute Inc. All rights reserved. SAS FOR BIG DATA PRESENTED BY: BRAD HATHAWAY

Post on 10-May-2018

SAS FOR BIG DATA

PRESENTED BY: BRAD HATHAWAY

SAS AND BIG DATA: SOME KEY TAKEAWAYS FROM THE VIDEO

• Combining Big Data and Analytics
• Hadoop allows capturing unlimited amounts of diverse data – many companies are using this to create a “Data Lake”
• Extracting value from the lake requires analytics which makes SAS a natural complement to Hadoop

One thing the video didn’t mention: the longer the data stays in the Data Lake, the better your performance and overall experience will be. It is critical to have as much processing in Hadoop as possible.

SAS BUSINESS ANALYTICS FRAMEWORK

... GIVES SAS CUSTOMERS THE POWER TO KNOW!

• Each area is a market on its own!
• SAS is ranked as a leader in pretty much all of them!
• Our customers are now shifting their attention to how each of these areas interacts with Hadoop!

AGENDA

• What is Hadoop? (a quick refresher)
• Two Hadoop Approaches
• Data Platforms with Hadoop
• BI & Analytics on Hadoop
• SAS on Hadoop – a taste of technology
• Data Quality Accelerator on Hadoop
• Self-Service DI on Hadoop
• SAS Visual Statistics

WHAT IS HADOOP? A QUICK REFRESHER

WHAT IS HADOOP? DICTIONARY DEFINITION

“Hadoop is one way of using a set of cheap computers to store an enormous amount of data and then to process that data in parallel.”
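As a rough illustration of "process that data in parallel", the sketch below counts words with a map step per block and a single reduce step. It is not Hadoop's API: the threads stand in for cluster nodes, and the `blocks` data and the function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def map_block(block):
    """Map step: count the words in one block of lines (one 'node')."""
    counts = Counter()
    for line in block:
        counts.update(line.lower().split())
    return counts


def reduce_counts(partials):
    """Reduce step: merge the per-block counts into one total."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def word_count(blocks, workers=4):
    # Blocks are independent of each other, so they can be processed
    # in parallel. Threads are only a stand-in for cluster machines.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_block, blocks))
    return reduce_counts(partials)


# Hypothetical data: three blocks, as if stored on three machines.
blocks = [
    ["big data big analytics"],
    ["data lake", "data mart"],
    ["analytics on hadoop"],
]
totals = word_count(blocks)
print(totals["data"], totals["analytics"])
```

The point of the split is that `map_block` only ever needs its own block, so adding machines adds capacity; only the small per-block summaries have to be brought together in `reduce_counts`.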

WHAT IS HADOOP? MAKING HADOOP EASY AND ENTERPRISE READY…

WHAT IS HADOOP? AS A DATA PLATFORM, STORAGE COSTS ARE MUCH LOWER…

[Chart: Total Cost (axis from $0 to $18,000,000) against Number of Gigabytes (axis ticks at 1, 10, 100, 1000) for Hadoop, Teradata Warehouse Appliance, Oracle Exadata and IBM Netezza]

WHAT IS HADOOP? PROJECTS FOR THE HADOOP STACK

TWO HADOOP APPROACHES

TWO STARTING POINTS: NOT MUTUALLY EXCLUSIVE… BUT OFTEN NOT SEEN TOGETHER!

[Diagram stages: Identify / Formulate Problem; Data Preparation; Data Exploration; Transform & Select; Build Model; Validate Model; Deploy Model; Evaluate / Monitor Results]

• Hadoop as a Data Platform (standalone or as part of a broader ecosystem)
• Hadoop as a core component of the next generation of BI and Analytics

.. to support innovative business usage
.. to support an IT Transformation

DATA PLATFORMS WITH HADOOP

WHERE ARE WE TODAY? SETTING THE SCENE

• Operational Data Sources:
  • Traditional sources include ERP, CRM and financial systems, amongst others.
  • Evolving sources include unstructured data from places like Twitter, LinkedIn etc. and streaming data from the Internet of Things (sensors etc.)

[Diagram labels: Operational Data Sources]

WHERE ARE WE TODAY? SETTING THE SCENE

[Diagram labels: Operational Data Sources; EDW; Data Mart (×2); Analytic Mart (×2); BI and Analytics]

Unstructured, semi-structured and streaming data (e.g. sensor data) are often handled outside the Warehouse flow

WHERE DOES HADOOP FIT? HADOOP AS A “NEW DATA” STORE

[Diagram labels: Operational Data Sources; EDW; Data Mart (×2); Analytic Mart (×2); BI and Analytics]

WHERE DOES HADOOP FIT? HADOOP AS AN ADDITIONAL INPUT TO THE EDW

[Diagram labels: Operational Data Sources; EDW; Data Mart (×3); Analytic Mart (×3); BI and Analytics]

WHERE DOES HADOOP FIT? HADOOP DATA PLATFORM AS A BASIS FOR BI AND ANALYTICS

[Diagram labels: Operational Data Sources; EDW; Data Mart (×3); Analytic Mart (×3); BI and Analytics]

WHERE DOES HADOOP FIT? HADOOP DATA PLATFORM AS A “STAGING LAYER” AS PART OF A “DATA LAKE” – Downstream stores could be Hadoop, data appliances or an RDBMS

[Diagram labels: Operational Data Sources; EDW; Data Mart (×2); Analytic Mart (×2); BI and Analytics]

HIGH LEVEL VIEW: WHAT YOU CAN DO WITH SAS AND HADOOP WHEN IT COMES TO USING “HADOOP AS A DATA PLATFORM”

Today...

• BASE SAS: Map Reduce + Pig Scripting + HDFS Commands
• SAS/Access to Hadoop: Hive, Hive2 + Direct file access
• SAS/Access to Impala (Cloudera only)
• SAS Data Integration Studio (Transforms) in Data Management Standard / Advanced:
  • Read/Write HDFS files
  • Submit HiveQL code
  • Execute Map/Reduce code
  • Submit Pig Latin
  • Transfer data to/from Hadoop using Hadoop utilities
  • SQL transforms pushed down with Access to Hadoop engine
• SAS Federation Server: Virtual and secure access to Hadoop and more traditional sources
• SAS Event Stream Processing Engine: To bring streaming data from Sensors into Hadoop

Coming very soon

Everything we have today plus...

• SAS Data Quality Accelerator for Hadoop - Execute selected DQ routines in Hadoop
• SAS Code Accelerator for Hadoop - Execute SAS DS2 code in Hadoop
• New Web Based Business User Interface
  • Point and click data management routines where data stays in Hadoop
  • HTML 5 Web based interface
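To make the idea of "pushed down" SQL concrete, here is a minimal sketch contrasting the two ways of getting a summary: pulling every row to the client, or sending the aggregation to the data store. It uses Python's built-in `sqlite3` purely as a stand-in for a Hive table; the table name `sensor_readings`, its rows and both function names are hypothetical, and nothing here is SAS or Hive code.

```python
import sqlite3

# sqlite3 is only a stand-in for a remote Hive table in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sensor_readings (sensor_id TEXT, reading REAL)")
rows = [("s1", 10.0), ("s1", 14.0), ("s2", 3.0), ("s2", 5.0), ("s2", 7.0)]
conn.executemany("INSERT INTO sensor_readings VALUES (?, ?)", rows)


def summarize_client_side(conn):
    """Pull every row out of the store, then aggregate locally."""
    fetched = conn.execute(
        "SELECT sensor_id, reading FROM sensor_readings"
    ).fetchall()
    sums, counts = {}, {}
    for sensor_id, reading in fetched:
        sums[sensor_id] = sums.get(sensor_id, 0.0) + reading
        counts[sensor_id] = counts.get(sensor_id, 0) + 1
    means = {key: sums[key] / counts[key] for key in sums}
    return means, len(fetched)


def summarize_pushed_down(conn):
    """Send the aggregation to the store; only summary rows come back."""
    fetched = conn.execute(
        "SELECT sensor_id, AVG(reading) FROM sensor_readings GROUP BY sensor_id"
    ).fetchall()
    return dict(fetched), len(fetched)


client_means, client_rows_moved = summarize_client_side(conn)
pushed_means, pushed_rows_moved = summarize_pushed_down(conn)
print(client_rows_moved, pushed_rows_moved)
```

Both functions return the same means, but the pushed-down version moves one row per sensor instead of one row per reading. On a real cluster that gap grows with the data volume.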

BI & ANALYTICS ON HADOOP

WHEN IT COMES TO BI / REPORTING: TWO SIMPLE THINGS TO REMEMBER

A: More or less business as usual
• Data for data visualization and reporting sourced from Hadoop, but the actual visualization / reporting is not running on Hadoop
• SAS/Access just like we do with an RDBMS

B: Transformational
• Hadoop cluster processors used for data visualization, exploration and reporting
• In-Memory

WHEN IT COMES TO BI / REPORTING: WHAT YOU CAN DO WITH SAS AND HADOOP WHEN IT COMES TO USING “HADOOP” AS PART OF BI

A: Data for data visualization and reporting sourced from Hadoop, but the actual visualization / reporting is not running on Hadoop

Any SAS BI Product:
• SAS Visual Analytics
• SAS Office Analytics
• SAS Enterprise Guide
• SAS BI/EBI Server
• SAS Stored Processes and batch programs for reporting

B: Hadoop cluster processors used for data visualization, exploration and reporting

In-Memory Exploration, Visualization & Reporting
• SAS Visual Analytics

WHEN IT COMES TO ANALYTICS: THREE SIMPLE THINGS TO REMEMBER

C: More or less business as usual
• Data for Analytics sourced from Hadoop but no Analytics running on Hadoop
• Think SAS/Access just like we do with an RDBMS

D: Transformational
• Hadoop cluster processors used for Analytical Computation
• Think In-Memory Analytics

E: Operational
• Analytics deployed for batch execution in Hadoop
• Think In-Database just like with an RDBMS

WHEN IT COMES TO ANALYTICS: WHAT YOU CAN DO WITH SAS AND HADOOP WHEN IT COMES TO USING “HADOOP” AS PART OF ANALYTICS

C: Data for Analytics sourced from Hadoop but no Analytics running on Hadoop

Any SAS Analytics Product:
• SAS Enterprise Miner
• SAS Forecast Server
• SAS/STAT etc.

D: Hadoop cluster processors used for Analytical Computation

In-Memory Interactive Analytics
• SAS Visual Statistics
• SAS In-Memory Statistics for Hadoop

E: Analytics deployed for batch execution on Hadoop

Operational Analytics
• SAS Scoring Accelerator for Hadoop
• SAS Code Accelerator for Hadoop

THE ANALYTICS LIFECYCLE: STRATEGY: ENABLE THE ENTIRE LIFECYCLE ON HADOOP

[Diagram stages: Identify / Formulate Problem; Data Preparation; Data Exploration; Transform & Select; Build Model; Validate Model; Deploy Model; Evaluate / Monitor Results]

Callouts on the diagram:
• SAS Visual Analytics, SAS Visual Statistics, SAS In-Memory Statistics for Hadoop
• Done using either the Data Preparation, Data Exploration or Build Model Tools
• SAS High Performance Analytics Offerings supported by relevant clients like SAS Enterprise Miner, SAS/STAT etc.
• Done using the Build Model Tools and other checks
• SAS Scoring Accelerator for Hadoop, SAS Code Accelerator for Hadoop
• SAS Visual Analytics

SAS ON HADOOP: A TASTE OF TECHNOLOGY

SAS DI STUDIO: FLOW INCLUDING HADOOP DATA

[Screenshot callouts: Access Hadoop; Combine with other data; Transform & Load]

[Diagram labels: Oracle; DB2; SAS; SAP; Hadoop; Teradata; SAS Federation Server]

SAS DI STUDIO: MANAGE DATA IN HADOOP STANDALONE

[Screenshot callouts: Access data in Hadoop; Transform data inside Hadoop using HiveQL; Creating new data in Hadoop]

DATA MANAGEMENT FOR HADOOP: HIGH PERFORMANCE IN-HADOOP DATA PROCESSING

Harness the power of the Hadoop distributed platform, big data, and SAS data management capabilities

High performance in-database processing

Native capabilities (HiveQL, Pig, MR) + Value-Added capabilities
• SAS Code Accelerator
• SAS Data Quality Accelerator

Embedded into Hadoop

[Diagram labels: SAS Servers; Hadoop Cluster; HDFS / Raw Files; MapReduce, Pig, HiveQL; SAS Code Accelerator; SAS Data Quality Accelerator]

SAS LEVERAGES HADOOP FOR MAXIMUM PERFORMANCE

DATA MANAGEMENT FOR HADOOP: SELF-SERVICE DATA QUERY AND TRANSFORMATION

New SAS Web-Based Business User Interface

Users are able to manage big data
• Query, Select, Filter, Summarize & Transform data
• Use data quality
• Load data into SAS LASR

[Diagram labels: Hadoop Cluster; SAS Data Quality Accelerator; SAS Code Accelerator; HiveQL, Pig, MapReduce]

Feature preview: https://www.youtube.com/watch?v=6-9zcKQjCUs

SAS DATA DIRECTOR

[Screenshots: eight slides titled SAS DATA DIRECTOR]

SAS® VISUAL STATISTICS 6.4: EXTENDING SAS VISUAL ANALYTICS FOR MORE ANALYTIC CONTROL AND TARGETED ACTIONS

Advanced Modeling Techniques

IN SUMMARY: SAS BUSINESS ANALYTICS FRAMEWORK

... GIVES OUR CUSTOMERS THE POWER TO KNOW!

SAS does this with support of Hadoop in all core areas – this is unique to SAS!

Any business use case you can think of will need all of these!

THANK YOU!