Post on 25-Jun-2018
© Copyright 2014 EMC Corporation. All rights reserved.
Delivering Big Data Workloads as a Service to your Organisation
Charles Sevior, CTO | Ryan Tassotti, NAS SE
Why Hadoop?
• Oil Exploration
• Medical Imaging
• Video Surveillance
• Mobile Sensors
• Smart Grids
• Social Media
• Internet of Things
• Dark Data
Fast and Cheap Way For Exploiting Massive Amounts of New Data Sources
Unstructured Data Growth
Total Capacity Shipped, Worldwide, and Unstructured Data Share
Year    Capacity shipped    Unstructured data
2013    37 EB               67%
2015    71 EB               74%
2017    133 EB              80%
Source: IDC
Why a Data Lake?
• Eliminate inefficient islands of storage
• Simplify management and reduce costs
• Enable better information sharing
• Increase data protection and security
• Accelerate data analytics to gain new insight
• Support data-driven decision making
Bring Compute to Data – Efficiently!
[Diagram: separate storage islands: NAS, SAN, CLOUD, TAPE, DAS, OBJECT]
NEXT-GEN WORKLOADS (3rd Platform)
TRADITIONAL WORKLOADS (2nd Platform)
HPC/EDW
Backup/Archive
Analytics
Mobile
File Shares
Cloud Apps
Isilon Scale-Out Data Lake
[Diagram: TAPE, NAS, DAS, CLOUD, SAN, OBJECT]
NEXT-GEN WORKLOADS (3rd Platform)
TRADITIONAL WORKLOADS (2nd Platform)
Backup/Archive
Analytics
Mobile
File Shares
Cloud Apps
HPC/EDW
Next-Gen Access Methods
Backup/Archive
Analytics
Mobile
File Shares
Cloud Apps
HPC/EDW
Enterprise-Grade Features
IOPS, MBPS, $/GB
DATA PROTECTION
DATA SECURITY
PERFORMANCE MANAGEMENT
DATA MANAGEMENT
Isilon OneFS: Scale-Out Architecture
• Single Volume / File System
• Unmatched Efficiency
• Simplicity & Ease of Use
• Linear Scalability
• Easy Growth
• High Performance
EMC Isilon Scale-Out NAS Architecture
Client/Application Layer: Clients and Applications
RESTful API: GET, PUT, POST, DELETE
Ethernet Front-End: Gig-E and 10 Gig-E network
Protocols (multi-protocol, file & object): SMB, NFS, FTP, HTTP, HDFS for Hadoop, REST for Object
Storage Nodes: Isilon OneFS
InfiniBand Back-End
Isilon Product Releases
Platforms: S210 (IOPS Performance), X410 (Throughput Performance)
SmartFlash: Flash as Cache, Accelerated Performance, Up to 1PB Globally Coherent Cache
Access Methods: SMB Multichannel, HDFS 2.3*, OpenStack SWIFT*
Solutions: VCE Converged Infrastructure, Hadoop Big Data Analytics
* Available by EOY
The Third Platform Journey
I. LOB Siloed Data Sets: LOB1 Data, LOB2 Data, LOB3 Data
II. Removing Silos of Data: LOB1, LOB2, LOB3, LOBn; Data Lake (Hadoop)
III. Early-Stage Predictive Analytics: DS1, DS2, DS3, DSn; Big Data Analytics; Data Lake (Hadoop)
IV. Predictive Enterprise: DD A1, DD A2, DD A3, DD An; Big Data Analytics; Data Lake (Hadoop)
V. Business Analytics as a Service: DD A1, DD A2, DD A3, DD An; Hybrid Cloud; Big Data Analytics; Data Lake (Hadoop)
EMC CONFIDENTIAL - INTERNAL USE ONLY
Large Telco Reduces Response Time for Regulatory Reports from 1 Week to 1 Day
ISILON AND PIVOTAL HD
Challenge
• Distributed and heterogeneous data infrastructure made it difficult to respond to regulatory report requests
• Data volumes prevented analysis of information across broad timescales
Solution
• Reports that required more than one week to create can now be turned around the same day
• 1PB of system storage capacity allows analysis of all data
• A platform combining PHD and Isilon allows storage infrastructure to scale without having to worry about compute
Hadoop Overview
Hadoop is an open-source framework from Apache that allows for parallel batch processing of very large data sets
MapReduce is the Hadoop process that divides the workload so multiple devices can process it
HDFS is the file system for the data. It provides data protection and locality by keeping multiple copies of the data (usually three)
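The way MapReduce divides a workload can be sketched in a few lines of plain Python. This is only an illustration of the map, shuffle and reduce steps on a single machine, not Hadoop's API; the function names and the word-count task are assumptions chosen for the example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(chunks):
    """Run map over every chunk independently, then shuffle and reduce."""
    # In a real cluster each chunk would be mapped on a different node;
    # here the chunks are simply processed one after another.
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    # Two hypothetical input chunks standing in for blocks of a large file.
    print(word_count(["big data big lake", "data lake data"]))
```

Because each chunk is mapped independently, the map step is the part that can be spread across many devices; only the shuffle needs to bring matching keys together.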
Hadoop: Concerns for the Enterprise
• I want to use my existing infrastructure, not buy new hardware
• I want to leverage the tools I already have
• I want a low-risk way of trying Hadoop
• My data is in shared storage; do I have to move it?
VMware Big Data Extensions
Operational Simplicity | Maximise Resource Utilisation | Architect Scalable Platform
• Rapid deployment
• Self-service tools
• Performance
• True multi-tenancy
• Elastic scaling
• Avoid dedicated hardware
• VM-based isolation
• Increase resource utilisation
• Deployment choice
• Maintain management flexibility at scale
• Control costs
• Leverage toolsets
• Security
BDE: Deploy Hadoop Clusters in Minutes
From a manual process … To fully automated, using the GUI
Elastic, Multi-Tenant Virtualised Hadoop
Combined Compute and Storage (Hadoop Node)
• Unmodified Hadoop node in a VM
• VM lifecycle determined by Datanode
• Limited elasticity
Separate Compute from Storage
• Separate compute from data
• Stateless compute
• Elastic compute
Separate Virtual Compute Clusters per Tenant (Tenant 1, Tenant 2)
• Separate virtual compute
• Compute cluster per tenant
• Stronger VM-grade security and resource isolation
Why Shared Storage For Hadoop?
Hadoop Bare-Metal Deployment: Hadoop DAS Environment
1 Dedicated Storage Infrastructure: one-off for Hadoop only
2 Lacking Enterprise Data Protection: no snapshots, replication, backup
3 Poor Storage Efficiency: 3X mirroring
4 Fixed Scalability: rigid compute-to-storage ratio
5 Manual Import/Export: no protocol support
[Diagram: NameNode and data nodes holding 1x, 2x and 3x copies of each block]
Hadoop On EMC Isilon Scale Out NAS
1 Scale-Out Storage Platform: multiple applications & workflows
2 End-to-End Data Protection: SnapshotIQ, SyncIQ, NDMP Backup
3 Industry-Leading Storage Efficiency: >80% storage utilisation
4 Independent Scalability: add compute & storage separately
5 Multi-Protocol: industry-standard protocols (NFS, CIFS, FTP, HTTP, HDFS)
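The storage-efficiency contrast between 3X mirroring and >80% utilisation can be made concrete with a small calculation. The sketch below assumes a hypothetical 100 TB data set and treats 80% as a flat utilisation figure taken from the slide; actual efficiency depends on cluster size and protection settings, and the function names are illustrative only.

```python
def raw_for_replication(usable_tb, copies):
    """Raw capacity needed when every block is stored `copies` times."""
    return usable_tb * copies


def raw_for_utilisation(usable_tb, utilisation):
    """Raw capacity needed when a given fraction of raw space is usable."""
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in (0, 1]")
    return usable_tb / utilisation


if __name__ == "__main__":
    usable_tb = 100  # hypothetical data set size, not from the slides

    das_raw = raw_for_replication(usable_tb, copies=3)               # 3X mirroring
    shared_raw = raw_for_utilisation(usable_tb, utilisation=0.80)    # 80% assumed

    print(f"DAS with 3X mirroring: {das_raw:.0f} TB raw")
    print(f"Shared storage at 80%: {shared_raw:.0f} TB raw")
    print(f"Ratio: {das_raw / shared_raw:.1f}x")
```

Under these assumptions the mirrored layout needs 300 TB of raw disk for 100 TB of data, while the 80% case needs about 125 TB, roughly 2.4 times less.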
EMC Hadoop Starter Kit
Consolidate and Virtualize Hadoop with EMC Isilon and VMware
• Support for major Hadoop distributions
• Quickly deploy, manage, and scale Hadoop clusters
• GUI simplifies management tasks
• Elastic scaling optimizes cluster performance and resource utilisation
[Diagram: Apache Hadoop, HDFS, name node, data node]
https://community.emc.com/docs/DOC-26892
www.emc.com/getisilon
Example Deployment With Pivotal HD
• Prerequisites
– Isilon OneFS version 6.5.5 or higher
– VMware vSphere 5.0 (or later) Enterprise or Enterprise Plus
• Download VMware Big Data Extensions (Free)
• Configure Isilon cluster for HDFS (Free license)
• Configure Big Data Extensions to use Pivotal HD
• Deploy Hadoop Cluster
• Run a simple program to test
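Before starting the steps above, the two version prerequisites can be checked with a short script. This is a minimal sketch: the minimum versions come from the slide, but the dictionary keys, function names and the example installed versions are hypothetical, and the vSphere edition (Enterprise or Enterprise Plus) is not checked.

```python
# Minimum versions as stated in the prerequisites list.
MINIMUMS = {"onefs": "6.5.5", "vsphere": "5.0"}


def parse_version(text):
    """Turn a dotted version string such as '6.5.5' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed, minimum):
    """Return True if `installed` is at least `minimum`."""
    a, b = parse_version(installed), parse_version(minimum)
    # Pad the shorter version with zeros so '5' compares equal to '5.0'.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_prerequisites(installed):
    """Map each component name to whether its installed version is sufficient."""
    return {
        name: meets_minimum(installed[name], minimum)
        for name, minimum in MINIMUMS.items()
    }


if __name__ == "__main__":
    # Hypothetical environment; replace with the versions actually installed.
    print(check_prerequisites({"onefs": "7.1.1", "vsphere": "5.5"}))
```

Comparing tuples of integers avoids the string-comparison trap where "10.0" would sort before "6.5.5".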
Big Data Analytics Solution
Data Lake Hadoop Bundle For New Customers
FEATURES
• Pre-tested and configured Big Data analytics solution
• Isilon X410 cluster with native HDFS
• Free Pivotal HD licenses for 20 compute nodes
• HAWQ parallel SQL licenses for 20 compute nodes
BENEFITS
• Gain powerful analytics capabilities quickly and easily
• Reduce costs with highly efficient scale-out storage platform
• Ability to leverage expert training and consulting services
• Global 24x7 service and support
Free Hadoop For Existing Customers
FEATURES
• Free HDFS license
• Free community trial editions of Pivotal HD or Cloudera CDH
• Free step-by-step Hadoop Starter Kit with simple directions
• Free personalized TCO tool for Hadoop on Isilon vs. DAS
BENEFITS
• Simple, easy way to unlock value of unstructured data
• Jump start Hadoop analytics initiatives quickly
• Highly informative and easy-to-use tools
• Understand TCO between alternative infrastructure strategies
Splunk Enterprise
“Collect and index any machine-generated data from virtually any source or location in real time. Just point Splunk Enterprise at your data and it will immediately start collecting and indexing--so you can start searching and analysing”
www.splunk.com
EMC Splunk Scale Out Simply
Splunk Servers: scale out based on CPU requirements
• Clustered or distributed design
• Scale out by blade
XtremIO: HOT/WARM buckets, scale out based on ingestion rates
• Up to 20TB Xbrick
• Scale out by Xbrick
Isilon: COLD buckets, scale out based on long-term retention
• Up to 3 x 144TB nodes
• Scale out by node
http://www.emc.com/campaign/isilon-hadoop/index.htm