2012 SNIA Analytics and Big Data Summit. © DataDirect Networks, Inc. (DDN). All Rights Reserved.
Creating an Enterprise-class Hadoop Platform
Joey Jablonski, Practice Director, Analytic Services
DataDirect Networks, Inc. (DDN)
Who am I?
Practice Director, Analytic Services at DataDirect Networks, Inc.
3+ years with Hadoop, 12+ with HPC
Contact Details
• @jrjablo
• [email protected]/[email protected]
• www.linkedin.com/in/joeyjablonski
Why Hadoop?
• Scalable – Performance & Capacity
• Growing Ecosystem (Flexibility)
• Established APIs & Interfaces
• Location on the adoption curve
• Proven base to create Analytical Platforms
What is Enterprise Class?
• Scalable – OPEX & CAPEX
• Manageable
• Integration with existing tools
• Flexible
• Workflow – Process Integration
• No Rip & Replace
• Metrics to manage towards
• Business Driven, Technological Capabilities
The Big Data Challenge
The Big Data Equation: Volume + Velocity + Variety
• Volume: Petabytes of Data, Trillions of Objects
• Velocity: GB/s, TB/s, Millions of IO/s, Object Operations
• Variety: Structured, Unstructured, Streams & Batches
Analytics | Looking for Actionable Information
Billions of Data Points to Consider
• Consumer purchasing trends
• Product perception
• Drug Discovery
• Genomics
• Surveillance
• Financial Analysis
Data Gravity
[Diagram labels: DATA, Services, Applications]
Why is data analytics so hard?
[Diagram labels: Hacking Skills, Substantive Expertise, Math & Statistics Knowledge, Traditional Research, Data Science, Business Acumen, Curiosity, Communications, Analytics, Poor Decisioning, Technical, Business]
What is Hadoop missing today?
• Active-Active high-availability
• Established management tools
• Enterprise integration mindset
• Enterprise class hardware
• Consistent version-compatibility & deployment
• Efficient CAPEX & OPEX scaling
• Resource management/SLAs/QoS
• Security
Hadoop Operational Considerations
[Diagram labels: Deploy, Manage, Monitor, Respond, Upgrade; Software Platform, Hardware Platform]
Today's Enterprise Picture
The Cloud
Getting there…
[Diagram labels: Insight, Modify Behavior, Improved Results]
Hadoop Architectural Considerations
Planning for Growth
[Chart labels: Adoption, Higher is Better, Capacity, Goal for Human Costs, Performance, Scalability, User Growth]
Shared v. Commodity
Shared Component Approach
• Lower Operational Costs
• Efficient operational resource scaling
• Shared resources with other IT platforms
• Efficiency in computing, connectivity & service placement
Commodity Server Approach
• Lower Entry Costs
• Shorter MTBF
• Inefficient scaling of tools and processes
• Mismatch with traditional IT operations models
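The "Shorter MTBF" point can be made concrete with a little arithmetic. The following is a minimal sketch in Python, assuming independent nodes with a constant failure rate. The function names, node counts and per-node MTBF values are hypothetical, chosen only for illustration, and do not come from the presentation.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap years


def expected_failures_per_year(node_count: int, mtbf_hours: float) -> float:
    """Expected node failures per year across a fleet.

    Assumes independent nodes and a constant failure rate of
    1 / mtbf_hours per node (a simplification of real hardware).
    """
    if node_count < 0 or mtbf_hours <= 0:
        raise ValueError("node_count must be >= 0 and mtbf_hours must be > 0")
    return node_count * HOURS_PER_YEAR / mtbf_hours


def fleet_mtbf_hours(node_count: int, mtbf_hours: float) -> float:
    """Mean time between failures anywhere in the fleet, same assumptions."""
    if node_count <= 0 or mtbf_hours <= 0:
        raise ValueError("node_count and mtbf_hours must be > 0")
    return mtbf_hours / node_count


if __name__ == "__main__":
    # Hypothetical fleets: many low-MTBF nodes vs. fewer high-MTBF nodes.
    for label, nodes, mtbf in [
        ("commodity (hypothetical)", 200, 50_000),
        ("shared (hypothetical)", 20, 200_000),
    ]:
        per_year = expected_failures_per_year(nodes, mtbf)
        gap = fleet_mtbf_hours(nodes, mtbf)
        print(f"{label}: {per_year:.2f} failures/year, "
              f"one failure every {gap:.0f} hours on average")
```

The point of the sketch is that fleet-level failure frequency grows with node count even when each node's MTBF is unchanged, which is one reason operational tools and processes need to scale with the cluster.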
Ethernet v. InfiniBand
InfiniBand
• 100% Storage Management Offload
• End-to-End InfiniBand Networking with RDMA Acceleration
• Real-Time Data Delivery to Provide MapReduce Process Consistency
• Smaller Compute, Compact Storage to Minimize Data Center Impact
Ethernet
• Compatibility, ensured connectivity
• Limitations in traffic types and bandwidth availability
• High CPU/Overhead cost
• Minimal options for offloading with Linux environments
Analytic User Types
• Empowered Users
• Aware Users
• Enabled Users
Hadoop Enterprise Integration
[Diagram labels: Extract, Transform, Load; Data, Information, Insight, Results; APIs; Integration; Monitoring & Response]
And finally, Hadoop is…
…more than just hardware; it is about an ecosystem of hardware & software.
…about integrating with existing systems.
…a toolkit to build Analytical Platforms.
…a component of the larger corporate processes and mandates.
…a component of the wider business KPIs.
Q&A