EMC at the Chief Data Officer Forum, Melbourne, August 2015
TRANSCRIPT
© Copyright 2015 EMC Corporation. All rights reserved.
ENTERPRISE DATA LAKE: TRANSFORM YOUR ANALYTICS
CHARLES SEVIOR, CTO, EMERGING TECHNOLOGY DIVISION
CDO LEADERSHIP FORUM MELBOURNE 2015
BIG DATA NEEDS BIG STORAGE
ANALYTICS TODAY IS ABOUT BIG DATA
DATA GROWTH
74%: share of data that is unstructured in 2015, and growing
TOP IT INITIATIVES
Big data analytics is one of the top IT initiatives for most organisations
Source: IDC 2014 CIO Sentiment Survey, Nov 2013. N=156; Senior IT Execs in NA, AP & EMEA
DATA NIRVANA
See More Completely
On-Demand Access to All Data
Analyze More Deeply
Act More Precisely
Deeper Insights, Better Business Outcomes
DRIVERS FOR BUSINESS DATA LAKES
• Traditional EDW cost models do not scale to “Big Data”
• Very little of the data generated is used, due to the expense and complexity of storing and processing ALL data
• Challenges accessing “atomic”-level raw data
• Time-consuming ETL data transformation limits agility and exploration
NEW DATA SOURCES are emerging that do not fit traditional storage paradigms
WHAT IS A BUSINESS DATA LAKE?
• Single Unified Data Pool = “Single Source of Truth”
• Supports Multiple Access Points & Methods
• Enterprise-grade Data Governance & Protection
• Data Migration Immune – “never migrate again!”
See IDC’s Insight: “Enterprise Data Lake Platforms: Deep Storage for Big Data & Analytics”, July 2014
EVERY BIG DATA JOURNEY IS UNIQUE
Big Data 1.0: Decentralized Data Warehousing
• Siloed approach is inefficient & complex
• Lack of cross-LOB collaboration
• Focused on the ‘rear-view mirror’ of the business
Reporting What Happened
Big Data 2.0: Analytics for Mixed Data Sets
• Complementary to EDW
• Integration of new data types (unstructured, dark, emerging)
• Mainly LOB-oriented
Understand Why It Happened
Big Data 3.0: Federated Big Data Lake
• Collect and store data
• Bring the analytical tools to the data
• Agile service-oriented (aaS) model & architecture
Determine What Will Happen
…Your Business Needs Define Your Big Data Journey
CURRENT STATE ANALYTICS
Existing Enterprise Data Warehouse
$$$$
(Highly Summarized / Processed Data)
ERP
HR
SFDC
Traditional Data Sources
Load
New Data Sources/Formats
Machine
ETL
Backup Storage
Trash
BI / Analytical Tools
“This data doesn’t look right – where’s the detail?”
“I really need data I know we have, but it’s not accessible.”
“I can’t afford to keep buying more EDWs at this rate of growth!”
Business Users
THE BUSINESS DATA LAKE APPROACH
Analytic Sandbox
Ad Hoc Analytic Environment
Structured BI Reporting Environment
Data Preparation and Enrichment via Hadoop
ALL data fed into Business Data Lake
EDW ETL
Business Data Lake
Offload EDW to Hadoop
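The approach on this slide (feed ALL raw data into the lake, then prepare and enrich it for each consumer) is commonly called schema-on-read. Below is a minimal Python sketch of that idea, assuming hypothetical newline-delimited JSON sales records; the field names and the helpers `read_with_schema` and `summarize_revenue_by_store` are illustrative only. The raw atomic records are never altered, and an EDW-style summary is derived at read time, so the detail a business user asks for is still available.

```python
import json
from collections import defaultdict

# Hypothetical raw records, kept exactly as received (one JSON document per line).
# In a real data lake these would sit in HDFS or object storage; a list stands in here.
RAW_LAKE = [
    '{"store": "MEL01", "sku": "A1", "qty": 2, "price": 9.5}',
    '{"store": "MEL01", "sku": "B7", "qty": 1, "price": 20.0}',
    '{"store": "SYD02", "sku": "A1", "qty": 5, "price": 9.5, "promo": "WINTER"}',
]


def read_with_schema(raw_lines, fields):
    """Schema-on-read: parse raw records and project only the requested fields.
    A field missing from a record comes back as None instead of failing the load."""
    for line in raw_lines:
        record = json.loads(line)
        yield {f: record.get(f) for f in fields}


def summarize_revenue_by_store(raw_lines):
    """An EDW-style summary computed on demand; the raw detail is left untouched."""
    totals = defaultdict(float)
    for row in read_with_schema(raw_lines, ["store", "qty", "price"]):
        totals[row["store"]] += row["qty"] * row["price"]
    return dict(totals)


if __name__ == "__main__":
    print(summarize_revenue_by_store(RAW_LAKE))
    # The atomic detail, including the optional "promo" field, remains available for ad hoc analysis.
    print(list(read_with_schema(RAW_LAKE, ["sku", "promo"])))
```

The design choice is that no transformation happens on the way in: a new question (for example, promotion uplift) becomes a new read-time projection rather than a new ETL job and another EDW load.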
HADOOP ENABLES THE DATA LAKE
An ecosystem for storing and processing any data type
Large community of users and developers
Easily extended with new interfaces and tools
Not limited to single data type – can access any data
Store, process, and analyze data sets of any size
Wheels, Some Odd-Looking Cylindrical Things
Building a model from LEGO
SHOULD I USE STANDARD HADOOP?
SEEMS CHEAP, BUT REINTRODUCES OLD IT CHALLENGES
Typical Hadoop
• Direct-attached storage
• Stand-alone servers
• Single purpose
• All-commodity environment
Typical Challenges
• Support at scale
• Rapid deployment
• “What Now” factor
• Intensive learning curve
ISILON HADOOP SOLUTION INNOVATION: ANALYTICS EFFICIENCY & SCALE
• Major reduction in complexity and storage management
• TCO gains >50%
• Unmatched storage efficiency
• In-place analytics – data migration not needed
• Continuous availability
• Native HDFS support for multiple distributions from the same data set simultaneously
• Distributed NameNode
[Diagram: name node and data node roles distributed across the cluster]
THE CURRENT STATE SITUATION
ERP, CRM, RDBMS, Machines; Files, Images, Video, Logs, Clickstreams; External Data Sources
EDWs, SANs, Search Servers, LTO Libraries, NAS
Slow Results, Silos, Inconsistent Security, Access
THE EMC DATA LAKE APPROACH
• Faster Time to Insights
• Enterprise Security
• Multi-protocol Access
• Shared Storage
Reporting, Mobile Analytics, Files, Archive, Web
SUPPORT MULTIPLE APPLICATIONS AND HADOOP DISTRIBUTIONS
[Diagram: a Scale-Out Data Lake holding one Shared Data Set, accessed via HDFS (Apache Hadoop and other distributions) and via SMB, NFS, HTTP and FTP; data feed via FASP and NFS; name node and data node roles distributed across the cluster]
EMC SCALE-OUT DATA LAKE
• ISILON (FILE): HDFS, SMB, HTTP, NDMP
• ECS (OBJECT): HCFS, HTTP
EMC SCALE-OUT OBJECT STORAGE
ECS (OBJECT): HCFS, HTTP
Site 1, Site 2, Site 3 … Site N
• Software Defined Storage Platform
• Cloud Infrastructure Solution
• Object-Oriented Workloads
• Hyperscale
• Geo-Parity Efficiency
Geo-Scale Big Data Analytics, Global Content Delivery, Modern Application Platforms, Internet of Things, Object-based Archive
EMC SCALE-OUT FILE STORAGE
ISILON (FILE): HDFS, SMB, HTTP, NDMP
Clients: Windows, UNIX/Linux, Mac
General use zone, Performance zone
• Software Defined Storage Platform
• Datacenter Consolidation
• Continuous Availability
• Enterprise Security & Encryption
• Massive Performance & Scalability
• Quota Management & Provisioning
• Automated Tiering
• Global Cache – RAM & Flash
• Supports Multiple Hadoop Distributions from Shared Data
EMC ISILON MOMENTUM
Capabilities and Market LEADER
Content Subscribers, US News Best Hospitals, Pharmaceutical Companies, Video Surveillance Cameras, Global Retail Banks, Hadoop analytics
400+ PBs, 8 of the Top 10, 7 of the Top 10, 2B+, 1M+, 11 of the Top 18, 21
Faster Time to Insights
Shared Storage
Consistent Security
Multi-protocol Access
“Isilon Data Lake Platform provides vastly improved Hadoop Workload Performance over a standard DAS configuration”
IDC EMC Hadoop Performance Validation: EMC Data Lake Foundation vs. Direct Attached Storage (DAS)
[Chart: Time to Complete Operation (seconds) for TeraGen, TeraSort and TeraValidate; Data Lake Foundation vs. DAS]
http://www.emc.com/collateral/analyst-reports/isd707-ar-idc-isilon-scale-out-datalakefoundation.pdf
Large Telecom Provider Lays Foundation for Geo-targeted and Personalized Advertising, Monetization of Data
Challenge:
• Desire to leverage set-top box data to better understand customer viewing habits and deliver targeted advertising
• Desire to deliver geo-targeted advertising based on mobile browser app visits and search strings
• 1 TB of data per day and over 500 TB of total data volume made commodity servers expensive to scale
Solution:
• Delivered a cost-effective Hadoop solution with enterprise capabilities not provided by traditional Hadoop setups
• Leveraged Hadoop to convert unstructured data to structured data to feed into a QlikView interface
• Built a foundation to further monetize data by selling it to retailers and vendors
ISILON AND PIVOTAL HD
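The case study's solution step, using Hadoop to turn unstructured data into structured records for a BI tool, typically follows the map, shuffle and reduce pattern. Below is a minimal single-process Python sketch of that pattern, assuming a hypothetical set-top box log format `timestamp|box_id|channel|seconds_watched`; the sample lines and the function names `map_viewing`, `shuffle`, `reduce_viewing` and `run_job` are illustrative, not taken from the customer's system. A real Hadoop job would run the same map and reduce logic in parallel across the cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical raw set-top box log lines: timestamp|box_id|channel|seconds_watched
RAW_LOG = [
    "2015-08-01T19:00:05|STB-001|NEWS24|1800",
    "2015-08-01T19:30:10|STB-001|SPORT1|3600",
    "garbled line from a faulty box",
    "2015-08-01T20:00:00|STB-002|SPORT1|1200",
]


def map_viewing(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit (channel, seconds) for each well-formed line; skip malformed ones."""
    parts = line.split("|")
    if len(parts) != 4 or not parts[3].isdigit():
        return
    _, _, channel, seconds = parts
    yield channel, int(seconds)


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group emitted values by key, as Hadoop does between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_viewing(channel: str, seconds: List[int]) -> Tuple[str, int]:
    """Reduce step: total viewing seconds per channel."""
    return channel, sum(seconds)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Chain map, shuffle and reduce into one structured result."""
    pairs = (pair for line in lines for pair in map_viewing(line))
    return dict(reduce_viewing(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    # Structured rows a BI tool could load; prints {'NEWS24': 1800, 'SPORT1': 4800}
    print(run_job(RAW_LOG))
```

Skipping malformed lines in the map step, rather than rejecting the whole file, mirrors the data lake idea of landing raw data first and deciding how to interpret it at processing time.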