Company Confidential - For Internal Use Only
Copyright © 2014, SAS Institute Inc. All rights reserved.
SAS ON HADOOP
HENRIK SLETTENE, SAS NORDIC
WHAT IS HADOOP? DICTIONARY DEFINITION
“Hadoop is one way of using a set of cheap computers to store an enormous amount of data and then to process that data in parallel.”
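The "store an enormous amount of data on cheap computers" half of that definition can be made concrete with a small sketch. Hadoop's storage layer (HDFS) cuts files into blocks and keeps several copies of each block on different machines. The code below is only a toy illustration of that idea: the function names, the node names, the tiny block size and the round-robin placement are all assumptions made for the example, and real HDFS placement is rack-aware and considerably more involved.

```python
from typing import Dict, List


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks: int, nodes: List[str], replication: int) -> Dict[int, List[str]]:
    """Assign each block to `replication` distinct nodes, round-robin.

    Toy policy for illustration only; it is not the placement policy HDFS uses.
    """
    if not 1 <= replication <= len(nodes):
        raise ValueError("replication must be between 1 and the number of nodes")
    placement: Dict[int, List[str]] = {}
    for block_id in range(num_blocks):
        # Consecutive nodes starting at block_id, wrapping around the node list.
        placement[block_id] = [
            nodes[(block_id + r) % len(nodes)] for r in range(replication)
        ]
    return placement


if __name__ == "__main__":
    # Hypothetical cluster of four cheap machines and a 1000-byte "file".
    blocks = split_into_blocks(b"x" * 1000, block_size=256)
    layout = place_blocks(len(blocks), ["node1", "node2", "node3", "node4"], replication=3)
    for block_id, replicas in layout.items():
        print(block_id, len(blocks[block_id]), replicas)
```

Because every block lives on more than one machine, losing a single cheap node does not lose data, and each node can work on the blocks it already holds.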
WHAT IS HADOOP? AS A DATA PLATFORM
STORAGE COSTS ARE MUCH LOWER…
[Chart: Total Cost ($0 to $18,000,000) against Number of Gigabytes (1, 10, 100, 1000), with one series each for Hadoop, Teradata Warehouse Appliance, Oracle Exadata and IBM Netezza]
WHY SHOULD YOU CARE?
THE TREND IS FOR INCREASING USAGE!
IT IS “THE” TECHNOLOGICAL MODERNIZATION TOPIC!
Source: SandHill Group, Do You Hadoop? A Survey of Big Data Practitioners, October 29, 2013
Hadoop is muscling into the traditional data warehouse market
MAJOR ADOPTIONS OF HADOOP
NOT MUTUALLY EXCLUSIVE… BUT OFTEN NOT SEEN TOGETHER!
[Diagram: analytics lifecycle with the stages IDENTIFY / FORMULATE PROBLEM, DATA PREPARATION, DATA EXPLORATION, TRANSFORM & SELECT, BUILD MODEL, VALIDATE MODEL, DEPLOY MODEL, EVALUATE / MONITOR RESULTS]
Hadoop as a Data Platform (standalone or as part of a broader ecosystem)
Hadoop as a core component of the next generation of BI and Analytics
.. to support innovative business usage
.. to support an IT Transformation
WHERE ARE WE TODAY? SETTING THE SCENE
[Diagram: Operational Data Sources, EDW, Data Marts, Analytic Marts, BI and Analytics]
Unstructured, semi-structured and streaming data (e.g. sensor data) is often handled outside the warehouse flow
WHERE DOES HADOOP FIT? HADOOP AS A “NEW DATA” STORE
[Diagram: Operational Data Sources, EDW, Data Marts, Analytic Marts, BI and Analytics]
WHERE DOES HADOOP FIT? HADOOP AS AN ADDITIONAL INPUT TO THE EDW
[Diagram: Operational Data Sources, EDW, Data Marts, Analytic Marts, BI and Analytics]
WHERE DOES HADOOP FIT? HADOOP DATA PLATFORM AS A BASIS FOR BI AND ANALYTICS
[Diagram: Operational Data Sources, EDW, Data Marts, Analytic Marts, BI and Analytics]
WHERE DOES HADOOP FIT? HADOOP DATA PLATFORM AS A “STAGING LAYER” AS PART OF A “DATA LAKE” – Downstream stores could be Hadoop, data appliances or an RDBMS
[Diagram: Operational Data Sources, EDW, Data Marts, Analytic Marts, BI and Analytics]
BIG DATA ANALYTICS
SAS/Access to Hadoop or Impala - Push some of SAS’ processing to Hadoop
[Diagram: SAS SERVER, Hive QL, HADOOP]
SAS/Access to Hadoop
SAS/Access to Cloudera Impala
SAS/ACCESS FOR HADOOP
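To illustrate what "pushing processing" buys, here is a minimal sketch of the general push-down idea, not of SAS/ACCESS itself. It uses Python's built-in sqlite3 module as a stand-in for Hive, and the `sales` table, its columns and the function names are hypothetical. The first function pulls every row to the client and aggregates there; the second sends the aggregation as SQL so that only the summarized rows travel back.

```python
import sqlite3
from typing import Dict, Tuple


def make_demo_db() -> sqlite3.Connection:
    """Build a tiny in-memory table that stands in for data held in Hadoop."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("north", 10.0), ("north", 5.0), ("south", 7.5), ("south", 2.5), ("east", 1.0)],
    )
    conn.commit()
    return conn


def totals_client_side(conn: sqlite3.Connection) -> Tuple[Dict[str, float], int]:
    """Pull every row to the client, then aggregate locally."""
    totals: Dict[str, float] = {}
    rows_moved = 0
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        totals[region] = totals.get(region, 0.0) + amount
        rows_moved += 1
    return totals, rows_moved


def totals_pushed_down(conn: sqlite3.Connection) -> Tuple[Dict[str, float], int]:
    """Send the aggregation as SQL; only one row per region comes back."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"
    ).fetchall()
    return dict(rows), len(rows)


if __name__ == "__main__":
    conn = make_demo_db()
    print(totals_client_side(conn))
    print(totals_pushed_down(conn))
```

Both functions return the same totals, but the pushed-down version moves three rows instead of five. On tables with billions of rows that difference is the reason to let the cluster do the work.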
SAS Embedded Process - Push SAS processing to Hadoop with MapReduce
BIG DATA ANALYTICS
[Diagram: SAS SERVER, SAS Data Step & DS2, HADOOP]
SAS Scoring Accelerator for Hadoop
SAS Code Accelerator for Hadoop (Nov 2014)
SAS Data Quality Accelerator for Hadoop (Nov 2014)
SAS Data Loader for Hadoop (Nov 2014)
proc ds2;
  /* thread ~ equivalent to a mapper */
  thread map_program;
    method run();
      set dbmslib.intab;
      /* program statements */
    end;
  endthread;
  run;

  /* program wrapper */
  data hdf.data_reduced;
    dcl thread map_program map_pgm;
    method run();
      set from map_pgm threads=N;  /* replace N with the number of threads */
      /* reduce steps */
    end;
  enddata;
  run;
quit;
SAS EMBEDDED PROCESS
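For readers less familiar with DS2, the shape of the program above (a thread that acts like a mapper, wrapped by a data program that performs the reduce steps) can be mirrored in a short Python sketch. This is an analogy only: the per-partition logic (a sum of squares) is a hypothetical stand-in for the "program statements", the function names are chosen to echo the DS2 names, and the thread pool says nothing about how the SAS Embedded Process actually schedules work inside Hadoop.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def map_program(partition: Sequence[int]) -> int:
    """Plays the role of the DS2 thread: runs once per partition of the input."""
    # "program statements": a per-partition sum of squares (hypothetical logic).
    return sum(x * x for x in partition)


def run_wrapper(partitions: List[Sequence[int]], threads: int) -> int:
    """Plays the role of the data program: gathers thread output, then reduces."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # One map_program call per partition, with results kept in input order.
        partials = list(pool.map(map_program, partitions))
    # "reduce steps": combine the partial results into one value.
    return sum(partials)


if __name__ == "__main__":
    # Three partitions stand in for blocks of dbmslib.intab.
    print(run_wrapper([[1, 2], [3], [4, 5]], threads=2))
```

The point of the split is that the expensive per-row work happens next to each partition of the data, and only the small partial results are brought together in the final step.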
SAS/High Performance Analytics – HP Enabled SAS Procedures
SAS / HIGH PERFORMANCE ANALYTICS
[Diagram: SAS SERVER, SAS HPA Procedures, HADOOP]
SAS High-Performance Statistics
SAS High-Performance Data Mining
SAS High-Performance Text Mining
SAS High-Performance Econometrics
SAS High-Performance Forecasting
SAS High-Performance Optimization
BIG DATA ANALYTICS
IN-MEMORY ANALYTICS
[Diagram: WEB CLIENTS, APPLICATIONS, SAS® LASR ANALYTIC SERVER (five SAS® IN-MEMORY nodes), HADOOP. Data sources: ERP, SCM, CRM, Images, Audio and Video, Machine Logs, Text, Web and Social]
In-Memory Analytics – Process in Memory, use Hadoop for Storage persistence and commodity computing
SAS ANALYTIC HADOOP ENVIRONMENT
Visual Analytics
Visual Statistics
Visual Scenario Designer
In-Memory Statistics
Data Director*
BIG DATA ANALYTICS
OUR STRATEGY IS TO ENABLE THE ENTIRE LIFECYCLE AROUND HADOOP
IDENTIFY / FORMULATE PROBLEM
DATA PREPARATION: Base SAS, SAS/Access, SAS Data Management, SAS DI Studio, SAS Data Loader for Hadoop*
DATA EXPLORATION: SAS Visual Analytics, SAS Visual Statistics, SAS High Performance Analytics Offerings, SAS In-Memory Statistics for Hadoop
TRANSFORM & SELECT: Done using either the Data Preparation, Data Exploration or Build Model Tools
BUILD MODEL: SAS High Performance Analytics Offerings, SAS In-Memory Statistics for Hadoop, SAS Visual Statistics
VALIDATE MODEL: Done using the Build Model Tools and other checks
DEPLOY MODEL: SAS Scoring Accelerator for Hadoop, SAS Code Accelerator for Hadoop
EVALUATE / MONITOR RESULTS: SAS Visual Analytics
BIG DATA ANALYTICS
HADOOP SCENARIOS WE SEE MOVING NOW
Exploratory Analytics:
Data Discovery Platform
Next generation Analytical Platform
A need to analyze ALL data including unstructured data
EDW cost take out / modernization:
Current EDW too expensive
Data growing at high rate
A need to store all data for a longer period
Agility
Digital data:
A need to gain insight into digital data
Store and analyze large amounts of digital data to understand consumers in a new digital era
R&D and innovation programs:
Customer business programs and investments around R&D and innovations
Faster time to market by using more data sources for analytics
sas.com
THANK YOU!