2012 SNIA Analytics and Big Data Summit. © DataDirect Networks, Inc. (DDN). All Rights Reserved.

Creating an Enterprise-class Hadoop Platform

Joey Jablonski, Practice Director, Analytic Services

DataDirect Networks, Inc. (DDN)

Who am I?

Practice Director, Analytic Services at DataDirect Networks, Inc.

3+ years with Hadoop, 12+ with HPC

Contact Details
• @jrjablo
• jjablonski@ddn.com/jrjablo@gmail.com
• www.linkedin.com/in/joeyjablonski

Why Hadoop?

• Scalable – Performance & Capacity
• Growing Ecosystem (Flexibility)
• Established APIs & Interfaces
• Location on the adoption curve
• Proven base to create Analytical Platforms

What is Enterprise Class?

• Scalable – OPEX & CAPEX
• Manageable
• Integration with existing tools
• Flexible
• Workflow – Process Integration
• No Rip & Replace
• Metrics to manage towards
• Business Driven, Technological Capabilities

The Big Data Challenge

The Big Data Equation: Volume + Velocity + Variety

• Volume: Petabytes of Data, Trillions of Objects
• Velocity: GB/s, TB/s, Millions of IO/s, Object Operations
• Variety: Structured, Unstructured, Streams & Batches

Analytics | Looking for Actionable Information

Billions of Data Points to Consider

• Consumer purchasing trends
• Product perception
• Drug Discovery
• Genomics
• Surveillance
• Financial Analysis

Data Gravity

[Figure: Data Gravity diagram. Labels: Data, Services, Applications]

Why is data Analytics so hard?

[Figure: Venn diagram. Labels: Hacking Skills, Substantive Expertise, Math & Statistics Knowledge, Traditional Research, Data Science]

[Figure: second diagram. Labels: Business Acumen, Curiosity, Communications, Analytics, Poor Decisioning, Technical, Business]

What is Hadoop missing today?

• Active-Active high-availability
• Established management tools
• Enterprise integration mindset
• Enterprise class hardware
• Consistent version-compatibility & deployment
• Efficient CAPEX & OPEX scaling
• Resource management/SLAs/QoS
• Security

Hadoop Operational Considerations

• Stages: Deploy, Manage, Monitor, Respond, Upgrade
• Layers: Software Platform, Hardware Platform

Today's Enterprise Picture

The Cloud

Getting there…

[Figure labels: Insight, Modify Behavior, Improved Results]

Hadoop Architectural Considerations

Planning for Growth

[Figure: growth chart. Labels: Adoption, Higher is Better, Capacity, Goal for Human Costs, Performance, Scalability, User Growth]

Shared v. Commodity

Shared Component Approach
• Lower Operational Costs
• Efficient operational resource scaling
• Shared resources with other IT platforms
• Efficiency in computing, connectivity & service placement

Commodity Server Approach
• Lower Entry Costs
• Shorter MTBF
• Inefficient scaling of tools and processes
• Mismatch with traditional IT operations models
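
The "Shorter MTBF" point can be made concrete with a little arithmetic. The sketch below is illustrative and not from the original slides: it assumes independent nodes with a constant failure rate, and the function name, fleet size, and MTBF figures are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365


def expected_failures_per_year(node_count: int, mtbf_hours: float) -> float:
    """Expected node failures per year for a fleet.

    Assumes independent nodes with a constant failure rate (1 / MTBF),
    which is a simplification, not a measured property of any platform.
    """
    if node_count < 0:
        raise ValueError("node_count must be non-negative")
    if mtbf_hours <= 0:
        raise ValueError("mtbf_hours must be positive")
    return node_count * HOURS_PER_YEAR / mtbf_hours


if __name__ == "__main__":
    # Hypothetical figures chosen only to show how MTBF scales with fleet size.
    for label, mtbf in [("shorter MTBF", 100_000), ("longer MTBF", 400_000)]:
        failures = expected_failures_per_year(1_000, mtbf)
        print(f"{label}: about {failures:.1f} failures per year across 1,000 nodes")
```

Under these assumptions, expected failures grow linearly with node count and inversely with MTBF, so a large fleet has to plan for failure handling as a routine operational task.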

Ethernet v. InfiniBand

InfiniBand
• 100% Storage Management Offload
• End-End InfiniBand Networking with RDMA Acceleration
• Real-Time Data Delivery to Provide MapReduce Process Consistency
• Smaller Compute, Compact Storage to Minimize Data Center Impact

Ethernet
• Compatibility, ensured connectivity
• Limitations in traffic types and bandwidth availability
• High CPU/Overhead cost
• Minimal options for offloading with Linux environments

Analytic User Types

• Empowered Users
• Aware Users
• Enabled Users

Hadoop Enterprise Integration

Extract → Transform → Load

Data → Information → Insight → Results

APIs

Integration

Monitoring & Response

And finally, Hadoop is…

…more than just hardware; it is about an ecosystem of hardware & software.
…about integrating with existing systems.
…a toolkit to build Analytical Platforms.
…a component of the larger corporate processes and mandates.
…a component of the wider business KPIs.

Q&A
