EMC at the Chief Data Officer Forum Melbourne, August 2015


Upload: corinium-coriniumglobal

Post on 14-Apr-2017

TRANSCRIPT

Page 1: EMC at the Chief Data Officer Forum Melbourne, August 2015

© Copyright 2015 EMC Corporation. All rights reserved.

ENTERPRISE DATA LAKE TRANSFORM YOUR ANALYTICS CHARLES SEVIOR, CTO, EMERGING TECHNOLOGY DIVISION

CDO LEADERSHIP FORUM MELBOURNE 2015

Page 2: EMC at the Chief Data Officer Forum Melbourne, August 2015

BIG DATA NEEDS BIG STORAGE

ANALYTICS TODAY IS ABOUT BIG DATA

DATA GROWTH

74%

Percentage of unstructured data in 2015 and growing

TOP IT INITIATIVES

Big data analytics is one of the top IT initiatives for most organisations

Source: IDC 2014 CIO Sentiment Survey, Nov 2013. N=156; Senior IT Execs in NA, AP & EMEA

Page 3: EMC at the Chief Data Officer Forum Melbourne, August 2015

See More Completely

On Demand Access to All Data

Analyze More Deeply

Act More Precisely

Deeper Insights Better Business Outcomes

DATA NIRVANA

Page 4: EMC at the Chief Data Officer Forum Melbourne, August 2015

• Traditional EDW cost models do not scale to “Big Data”

• Very little of the data generated is used due to the expense and complexity of storing and processing ALL data

• Challenges accessing “Atomic” level raw data

• Time-consuming ETL data transformation limits agility and exploration

DRIVERS FOR BUSINESS DATA LAKES

NEW DATA SOURCES

are emerging that do not meet traditional storage paradigms

Page 5: EMC at the Chief Data Officer Forum Melbourne, August 2015

• Single Unified Data Pool = “Single Source of Truth”

• Supports Multiple Access Points & Methods

• Enterprise-grade Data Governance & Protection

• Data Migration Immune – “never migrate again!”

WHAT IS A BUSINESS DATA LAKE?

See IDC’s Insight: “Enterprise Data Lake Platforms: Deep Storage for Big Data & Analytics” July 2014

Page 6: EMC at the Chief Data Officer Forum Melbourne, August 2015

EVERY BIG DATA JOURNEY IS UNIQUE

Big Data 1.0: Decentralized Data Warehousing

Siloed approach is inefficient & complex

Lack of cross-LOB collaboration

Focused on ‘rearview’ mirror of business

Reporting What Happened

Big Data 2.0: Analytics for Mixed Data Sets

Complementary to EDW

Integration of new data types (unstructured, dark, emerging)

Mainly LOB-oriented

Understand Why It Happened

Big Data 3.0: Federated Big Data Lake

Collect and store data

Bring the analytical tools to the data

Agile service-oriented (aaS) model & architecture

Determine What Will Happen

Your Business Needs Define Your Big Data Journey

Page 7: EMC at the Chief Data Officer Forum Melbourne, August 2015

CURRENT STATE ANALYTICS

Existing Enterprise Data Warehouse

$$$$

(Highly Summarized / Processed Data)

ERP

HR

SFDC

Traditional Data Sources

Load

New Data Sources/Formats

Machine

ETL

Backup Storage

Trash

BI / Analytical Tools

This data doesn’t look right – where’s the detail?

I really need data I know we have, but it’s not accessible

I can’t afford to keep buying more EDWs at this growth!

Business Users

Page 8: EMC at the Chief Data Officer Forum Melbourne, August 2015

THE BUSINESS DATA LAKE APPROACH

Analytic Sandbox

Ad Hoc Analytic Environment

Structured BI Reporting Environment

Data Preparation and Enrichment

Via Hadoop

ALL data fed into Business Data Lake

EDW ETL

Business Data Lake

Offload EDW to Hadoop

Page 9: EMC at the Chief Data Officer Forum Melbourne, August 2015

Page 10: EMC at the Chief Data Officer Forum Melbourne, August 2015

HADOOP ENABLES THE DATA LAKE

An ecosystem for storing and processing any data type

Large community of users and developers

Easily extended with new interfaces and tools

Not limited to single data type – can access any data

Store, process, and analyze any size data sets

Page 11: EMC at the Chief Data Officer Forum Melbourne, August 2015

Wheels Some Odd Looking Cylindrical Things

Building a model from LEGO

Page 12: EMC at the Chief Data Officer Forum Melbourne, August 2015

Building a model from LEGO

Page 13: EMC at the Chief Data Officer Forum Melbourne, August 2015

SEEMS CHEAP, BUT REINTRODUCES OLD IT CHALLENGES

SHOULD I USE STANDARD HADOOP?

Direct-attached storage

Stand-alone Servers

Single purpose

All commodity environment

Typical Hadoop

Support at scale

Rapid deployment

“What Now” factor

Intensive Learning Curve

Typical Challenges

Page 14: EMC at the Chief Data Officer Forum Melbourne, August 2015

ANALYTICS MOBILE

ANALYTICS EFFICIENCY & SCALE ISILON HADOOP SOLUTION INNOVATION

Major reduction in complexity and storage management

TCO gains >50%

Unmatched storage efficiency

In-place Analytics – Data migration not needed

Continuous availability

Native HDFS Support for multiple distributions from the same data set simultaneously

Distributed NameNode

[Diagram labels: name node (×4), data node]

Page 15: EMC at the Chief Data Officer Forum Melbourne, August 2015

ERP, CRM, RDBMS, Machines Files, Images, Video, Logs, Clickstreams External Data Sources

EDWs SANs Search Servers LTO Libraries NAS

Slow Results

Silos Inconsistent Security

Access

THE CURRENT STATE SITUATION

Page 16: EMC at the Chief Data Officer Forum Melbourne, August 2015

Faster Time to Insights

Enterprise Security

Multi-protocol Access

Shared Storage

Reporting

Mobile Analytics

Files

Archive

Web

THE EMC DATA LAKE APPROACH

Page 17: EMC at the Chief Data Officer Forum Melbourne, August 2015

HDFS

SMB, NFS, HTTP, FTP, HDFS

Scale-Out Data Lake

Shared Data Set FTP

SMB

NFS

Apache

SUPPORT MULTIPLE APPLICATIONS AND HADOOP DISTRIBUTIONS

[Diagram labels: name node (×4), data node]

Data Feed FASP

NFS
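The shared-data-set idea on this slide can be illustrated with a small sketch: one stored file, reachable at a different address over each protocol the slide lists (HDFS, NFS, SMB, HTTP, FTP). The host name `datalake.example.com`, the mount root `/mnt/datalake`, the share name `datalake`, and port 8020 are illustrative assumptions, not values from the presentation.

```python
from posixpath import join as pjoin

# All names below are hypothetical placeholders for illustration only.
HOST = "datalake.example.com"
NFS_MOUNT = "/mnt/datalake"   # where a client might mount the NFS export
SMB_SHARE = "datalake"        # assumed SMB share name
HDFS_PORT = 8020              # assumed HDFS RPC port


def access_points(logical_path):
    """Return one address per protocol for the same stored file."""
    rel = logical_path.lstrip("/")
    return {
        "hdfs": f"hdfs://{HOST}:{HDFS_PORT}/{rel}",
        "nfs": pjoin(NFS_MOUNT, rel),
        "smb": "\\\\" + HOST + "\\" + SMB_SHARE + "\\" + rel.replace("/", "\\"),
        "http": f"http://{HOST}/{rel}",
        "ftp": f"ftp://{HOST}/{rel}",
    }


if __name__ == "__main__":
    for proto, addr in access_points("/logs/2015/08/clicks.csv").items():
        print(f"{proto:5s} {addr}")
```

The point of the sketch is that only the address changes per protocol; the data itself is stored once and is not copied between silos.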

Page 18: EMC at the Chief Data Officer Forum Melbourne, August 2015

EMC SCALE-OUT DATA LAKE

ISILON

HTTP NDMP

HDFS SMB

ECS

HCFS

HTTP

OBJECT FILE

Data Lake

Page 19: EMC at the Chief Data Officer Forum Melbourne, August 2015

EMC SCALE-OUT OBJECT STORAGE

ECS

HCFS

HTTP

Site 1 Site 2

Site 3 Site N

• Software Defined Storage Platform

• Cloud Infrastructure Solution

• Object Oriented Workloads

• Hyperscale

• Geo Parity Efficiency

• Geo-Scale Big Data Analytics

• Global Content Delivery

• Modern Application Platforms

• Internet of Things

• Object-based Archive

OBJECT

Page 20: EMC at the Chief Data Officer Forum Melbourne, August 2015

General use zone

Performance zone

EMC SCALE-OUT FILE STORAGE

• Software Defined Storage Platform

• Datacenter Consolidation

• Continuous Availability

• Enterprise Security & Encryption

• Massive Performance & Scalability

• Quota Management & Provisioning

• Automated Tiering

• Global Cache – RAM & Flash

• Supports Multiple Hadoop Distributions from Shared Data

ISILON

HTTP NDMP

HDFS SMB

FILE

Windows

UNIX/LINUX

MAC

Clients

Page 21: EMC at the Chief Data Officer Forum Melbourne, August 2015

Content Subscribers

US News Best Hospitals

400+PBs

Pharmaceutical Companies

8 of the Top 10

Video Surveillance Cameras

Global Retail Banks

7 of the Top 10 2B+

1M+

11 of the Top 18

Hadoop analytics

EMC ISILON MOMENTUM

Capabilities and Market LEADER

Page 22: EMC at the Chief Data Officer Forum Melbourne, August 2015

Faster Time to Insights

Shared Storage

Consistent Security

Multi-protocol Access

“Isilon Data Lake Platform provides vastly improved Hadoop Workload Performance over a standard DAS configuration”

IDC EMC Hadoop Performance Validation

EMC Data Lake Foundation vs. Direct Attached Storage (DAS)

[Chart: Time to Complete Operation, in seconds (axis 0 to 2500), for TeraGen, TeraSort, and TeraValidate; series: Data Lake Foundation and DAS]

http://www.emc.com/collateral/analyst-reports/isd707-ar-idc-isilon-scale-out-datalakefoundation.pdf
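TeraGen, TeraSort, and TeraValidate, the three operations named in the chart, are example jobs shipped in Hadoop's MapReduce examples jar. As a rough sketch of how such a run could be scripted, the code below builds the three command lines and provides a small timing helper. The jar file name, the `/benchmarks` directory, and the 1 TB data size are assumptions for illustration; the presentation does not state the configuration IDC used.

```python
import time

ROW_BYTES = 100  # TeraGen writes fixed-size 100-byte rows

# Hypothetical jar location; the real path depends on the Hadoop distribution.
EXAMPLES_JAR = "hadoop-mapreduce-examples.jar"


def rows_for(size_bytes):
    """Number of TeraGen rows needed to produce roughly size_bytes of data."""
    return size_bytes // ROW_BYTES


def benchmark_commands(size_bytes, base_dir, jar=EXAMPLES_JAR):
    """Build the argv list for each of the three benchmark stages."""
    gen_dir = f"{base_dir}/teragen"
    sort_dir = f"{base_dir}/terasort"
    report_dir = f"{base_dir}/teravalidate"
    prefix = ["hadoop", "jar", jar]
    return {
        "TeraGen": prefix + ["teragen", str(rows_for(size_bytes)), gen_dir],
        "TeraSort": prefix + ["terasort", gen_dir, sort_dir],
        "TeraValidate": prefix + ["teravalidate", sort_dir, report_dir],
    }


def timed(run, argv):
    """Call run(argv) and return the elapsed wall-clock seconds."""
    start = time.monotonic()
    run(argv)
    return time.monotonic() - start


if __name__ == "__main__":
    # Print the commands only; nothing is executed here.
    for stage, argv in benchmark_commands(10**12, "/benchmarks").items():
        print(stage, " ".join(argv))
```

On a host with a Hadoop client installed, passing `subprocess.check_call` as `run` to `timed` would execute a stage and return its duration in seconds, which is the quantity the chart plots.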

Page 23: EMC at the Chief Data Officer Forum Melbourne, August 2015

Large Telecom Provider Lays Foundation for Geo-targeted and Personalized Advertising, Monetization of Data

Challenge:

• Desire to leverage set-top box data to better understand customer viewing habits and deliver targeted advertising

• Desire to deliver geo-targeted advertising based on mobile browser app visits and search strings

• 1 TB of data per day and over 500 TB of total data volume made commodity servers expensive to scale

Solution:

• Delivered a cost-effective Hadoop solution with enterprise capability not provided by traditional Hadoop setups

• Leveraged Hadoop to convert unstructured data to structured data to feed into the QlikView interface

• Built a foundation to further monetize data by selling it to retailers and vendors
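The scaling pressure in the challenge (1 TB per day on top of 500 TB) can be made concrete with some back-of-the-envelope arithmetic. The sketch below multiplies logical data by a protection overhead factor to estimate raw capacity. The factor 3.0 reflects HDFS's default replication of three copies on direct-attached storage; the factor 1.25 and the 2,000 TB cluster size are purely hypothetical placeholders, not figures from the presentation or from any product specification.

```python
HDFS_DEFAULT_REPLICATION = 3.0  # HDFS keeps three copies of each block by default


def raw_capacity_tb(logical_tb, overhead):
    """Raw storage needed to hold logical_tb at a given overhead factor."""
    return logical_tb * overhead


def days_until_full(raw_total_tb, current_logical_tb, daily_logical_tb, overhead):
    """Days of ingest remaining before the raw capacity is exhausted."""
    free_tb = raw_total_tb - raw_capacity_tb(current_logical_tb, overhead)
    return max(0.0, free_tb / (daily_logical_tb * overhead))


if __name__ == "__main__":
    # 500 TB total and 1 TB/day come from the slide; everything else is assumed.
    for label, overhead in (("3x replication", HDFS_DEFAULT_REPLICATION),
                            ("hypothetical 1.25x overhead", 1.25)):
        raw = raw_capacity_tb(500, overhead)
        days = days_until_full(2000, 500, 1, overhead)
        print(f"{label}: {raw:.0f} TB raw today, {days:.0f} days until 2000 TB is full")
```

Under three-way replication, every logical terabyte ingested consumes three raw terabytes, which is one reason growth of this size is costly on commodity servers with direct-attached disks.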

ISILON AND PIVOTAL HD

Page 24: EMC at the Chief Data Officer Forum Melbourne, August 2015