Deutsche Telekom on Big Data

Deutsche Telekom Perspective on HADOOP and Big Data Technologies. Gregory Smith, VP Solution Design and Emerging Technologies and Architectures, T-Systems North America. [email protected]


DESCRIPTION

Extracting value from Big Data is not easy. The field of technologies and vendors is fragmented and rapidly evolving. End-to-end, general purpose solutions that work out of the box don’t exist yet, and Hadoop is no exception. And most companies lack Big Data specialists. The key to unlocking real value lies with thinking smart and hard about the business requirements for a Big Data solution. There is a long list of crucial questions to think about. Is Hadoop really the best solution for all Big Data needs? Should companies run a Hadoop cluster on expensive enterprise-grade storage, or use cheap commodity servers? Should the chosen infrastructure be bare metal or virtualized? The picture becomes even more confusing at the analysis and visualization layer. The answer to Big Data ROI lies somewhere between the herd and nerd mentality. Thinking hard and being smart about each use case as early as possible avoids costly mistakes in choosing hardware and software. This talk will illustrate how Deutsche Telekom follows this segmentation approach to make sure every individual use case drives architecture design and the selection of technologies and vendors.

TRANSCRIPT

Page 1: Deutsche Telekom on Big Data

Deutsche Telekom Perspective on HADOOP and Big Data Technologies. Gregory Smith, VP Solution Design and Emerging Technologies and Architectures, T-Systems North America. [email protected]

Page 2: Deutsche Telekom on Big Data

Deutsche Telekom and T-Systems Key Stats

Deutsche Telekom is Europe’s largest telecom service provider
– Revenue: $75 billion
– Employees: 232,342

T-Systems is the enterprise division of Deutsche Telekom
– Revenue: $13 billion
– Employees: 52,742
– Services: data center, end user computing, networking, systems integration, cloud and big data

Page 3: Deutsche Telekom on Big Data

Overwhelmed by new data types?

Big Data: transactions, interactions, observations
– Sentiment data
– Call detail records (CDRs)
– Sensor- / machine-based data
– Clickstream data

Page 4: Deutsche Telekom on Big Data

80% of new data in 2015 will land on Hadoop!

Hadoop is like a data warehouse, but it can store more data, more kinds of data, and perform more flexible analyses.

Hadoop is open source and runs on industry-standard hardware, so it is 1-2 orders of magnitude more economical than conventional data warehouse solutions.

Hadoop provides more cost-effective storage, processing, and analysis. Some existing workloads run faster, cheaper, better.

Hadoop can deliver a foundation for profitable growth: gain value from all your data by asking bigger questions.

Page 5: Deutsche Telekom on Big Data

Reference architecture view of Hadoop

[Reference architecture diagram. Layers: Infrastructure (virtualization; compute / storage / network); Data Management, the Hadoop core (distributed storage: HDFS; distributed processing: MapReduce; non-relational DB; structured in-memory); Data Integration (real-time ingestion, batch ingestion, data connectors, metadata services); Data Processing (batch processing, real-time / stream processing, search and indexing); Application (analytics apps, transactional apps, analytics middleware); Presentation (data visualization and reporting clients). Cross-cutting concerns: security, operations, workflow and scheduling, management and monitoring, data isolation, access management, data encryption. Components are grouped into Hadoop core, Hadoop projects, and adjacent categories.]

Page 6: Deutsche Telekom on Big Data

Example application landscape

[Application landscape diagram, components:]
– ETL (Informatica, Talend, Spring Integration)
– Real-time streams (social, sensors)
– Structured and unstructured data (HDFS, MapR)
– Real-time database (Shark, GemFire, HBase, Cassandra)
– Interactive analytics (Impala, Greenplum, Aster Data, Netezza, …)
– Batch processing (MapReduce)
– Real-time processing (S4, Storm, Spark)
– HIVE
– Machine learning (Mahout, etc.)
– Data visualization (Excel, Tableau)
– Cloud infrastructure: compute, storage, networking

Source: VMware

Page 7: Deutsche Telekom on Big Data

Disruptive innovations in Big Data

Traditional database / MPP analytics data warehouse vs. Hadoop / NoSQL database:

– Schema: pre-defined, fixed, required on write vs. required on read ("store first, ask questions later")
– Processing: no or limited data processing vs. processing coupled with the data, parallel processing / scale-out
– Data types: structured only vs. any, including unstructured
– Physical infrastructure: enterprise grade, mission critical vs. commodity is an option, much cheaper storage
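Schema-on-read is the heart of "store first, ask questions later": raw files land in Hadoop unchanged, and a structure is applied only when a question is asked. Below is a minimal sketch of the idea using PySpark; the path, field names and query are illustrative assumptions, not taken from the deck.

# Sketch: schema-on-read over raw CDR files already sitting in HDFS.
# Path, field names and types are hypothetical examples.
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, LongType

spark = SparkSession.builder.appName("schema-on-read-sketch").getOrCreate()

# Store first: the raw CSV files were written to HDFS as-is, with no upfront modeling.
cdr_schema = StructType([
    StructField("caller",     StringType(), True),
    StructField("callee",     StringType(), True),
    StructField("start_ts",   LongType(),   True),
    StructField("duration_s", LongType(),   True),
])

# Ask questions later: the schema is applied only at read time.
cdrs = spark.read.schema(cdr_schema).csv("hdfs:///landing/cdrs/2015/")

# A question nobody anticipated when the data was stored.
cdrs.groupBy("caller").sum("duration_s").show(10)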

Page 8: Deutsche Telekom on Big Data

Legacy BI
– Business problem: backward-looking analysis, using data out of business applications
– Technology solution / selected vendors: SAP BusinessObjects, IBM Cognos, MicroStrategy
– Data type / scalability: structured, limited (2-3 TB in RAM)

High-performance BI
– Business problem: quasi-real-time analysis, using data out of business applications
– Technology solution / selected vendors: Oracle Exadata, SAP HANA
– Data type / scalability: structured, limited (2-8 TB in RAM)

"Hadoop" ecosystem
– Business problem: forward-looking predictive analysis; questions defined in the moment, using data from many sources
– Technology solution: Hadoop distributions; no ACID transactions; limited SQL set (joins)
– Data type / scalability: structured or unstructured, unlimited (20-30 PB)

The Hadoop ecosystem represents "true" big data; legacy BI and high-performance BI reflect the legacy vendor definition of big data.

Innovations: Hadoop is 100x cheaper per TB than in-memory appliances like HANA and handles unstructured data as well.

Page 9: Deutsche Telekom on Big Data

Innovations: Store first, ask questions later

Illustrative acquisition cost per GB of storage:

– SAN storage: 3-5 €/GB (based on HDS SAN storage)
– NAS filers: 1-3 €/GB (based on NetApp FAS series)
– White-box DAS 1): 0.50-1.00 €/GB (hardware can be self-assembled)
– Data Cloud 1): 0.10-0.30 €/GB (based on large-scale object storage interfaces)
– Enterprise-class Hadoop storage: ??? €/GB (based on NetApp E-Series (NOSH))

1) Hadoop offers storage + compute (incl. search). Data Cloud offers Amazon S3 and native storage functions.

Much cheaper storage, but not just storage…
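To make the per-GB figures concrete, here is a small arithmetic sketch: the prices are midpoints of the illustrative ranges above, and the 500 TB volume is an assumed example, not a Deutsche Telekom number.

# Rough acquisition-cost comparison for an assumed 500 TB of raw data,
# using midpoints of the illustrative EUR/GB ranges shown above.
TB = 1000  # GB per TB (decimal, as storage pricing is usually quoted)
volume_gb = 500 * TB

price_per_gb = {        # EUR per GB, midpoint of each range
    "SAN storage":   4.00,
    "NAS filers":    2.00,
    "White-box DAS": 0.75,
    "Data Cloud":    0.20,
}

for tier, price in price_per_gb.items():
    print(f"{tier:14s} ~ {volume_gb * price / 1e6:5.2f} M EUR")

# Expected output, showing the spread between classic SAN and commodity tiers:
#   SAN storage    ~  2.00 M EUR
#   NAS filers     ~  1.00 M EUR
#   White-box DAS  ~  0.38 M EUR
#   Data Cloud     ~  0.10 M EUR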

Page 10: Deutsche Telekom on Big Data

Target use cases

[Use cases arranged from shorter time to value / lower potential value ("cost-effective storage, processing, and analysis") to longer time to value / higher potential value ("foundation for profitable growth"), mapped to stakeholders: IT Infrastructure & Operations, Business Intelligence & Data Warehousing, Line of Business & Business Analysts, CXO.]

Use cases:
– Lower-cost storage
– Enterprise data lake
– Enterprise data warehouse offload
– Enterprise data warehouse archive
– ETL offload
– Capacity planning & utilization
– Customer profiling & revenue analytics
– Targeted advertising analytics
– Service renewal implementation
– CDR-based data analytics
– Fraud management
– New business models

Page 11: Deutsche Telekom on Big Data

Enterprise data warehouse offload use case

The challenge
– Many EDWs are at capacity
– Running out of budget before running out of relevant data
– Older data archived "in the dark", not available for exploration

The solution
– Hadoop for data storage and processing: parse, cleanse, apply structure and transform
– Free the EDW for valuable queries
– Retain all data for analysis!

[Before: data warehouse workload split across Operational (44%), ETL processing (42%), Analytics (11%). After: storage and processing move to Hadoop at roughly 1/10th the cost, and the data warehouse serves Operational (50%) and Analytics (50%).]
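The "parse, cleanse, apply structure and transform" step can be pushed down to Hadoop before anything is loaded into the warehouse. A minimal Hadoop Streaming mapper sketch in Python follows; the pipe-delimited CDR layout and the reject rules are assumptions for illustration, not the actual Deutsche Telekom format.

#!/usr/bin/env python
# Sketch: Hadoop Streaming mapper that cleanses raw CDR lines inside Hadoop,
# so the EDW only ever sees structured, valid records. Field layout is hypothetical.
import sys

for line in sys.stdin:
    fields = line.rstrip("\n").split("|")
    if len(fields) != 4:            # malformed record: drop it here, not in the EDW
        continue
    caller, callee, start_ts, duration = fields
    if not duration.isdigit():      # cleanse: reject non-numeric call durations
        continue
    # Emit a normalized, tab-separated record for downstream Hive or EDW loads.
    print("\t".join([caller.strip(), callee.strip(), start_ts.strip(), duration]))

Submitted as a map-only streaming job, the cleansed output stays in HDFS, where Hive can apply a schema on read and only the high-value aggregates move on to the data warehouse.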

Page 12: Deutsche Telekom on Big Data

From data puddles and ponds to lakes and oceans

GOAL: a platform that natively supports mixed workloads as a shared service
AVOID: systems separated by workload type due to contention

[Diagram: separate big data silos per business unit (BU1, BU2, BU3) versus one shared platform for all big data (transactions, interactions, observations), supporting refine / explore / enrich workloads in batch, interactive, and online modes.]

Page 13: Deutsche Telekom on Big Data

Questions to ask in designing a solution for a particular business use case

– Which distribution is right for your needs today vs. tomorrow?
– Which distribution will ensure you stay on the main path of open source innovation, vs. trap you in proprietary forks?

[Reference architecture thumbnail: security, operations, infrastructure, data integration, data processing, application, presentation, data management]

Note: Distributions include more than just the data management layer but are discussed at this point in the presentation. Not shown: Intel, Fujitsu and other distributions.

Distributions compared:
– Widely adopted, mature distribution; GTM partners include Oracle, HP, Dell, IBM
– Fully open source distribution (incl. management tools); reputation for cost-effective licensing; strong developer ecosystem momentum; GTM partners include Microsoft, Teradata, Informatica, Talend
– More proprietary distribution with features that appeal to some business-critical use cases; GTM partner AWS (M3 and M5 versions only)
– Just announced by EMC, very early stage; differentiator is HAWQ, which claims manifold query speed improvement and a full SQL instruction set

Page 14: Deutsche Telekom on Big Data

Common objections to Hadoop

– We don’t have big data problems
– We don’t have petabytes of data
– We can’t justify the budget for a new project
– We don’t have the skills
– We’re not sure Hadoop is mature/secure/enterprise-ready
– We already have a scale-out strategy for our EDW/ETL

Page 15: Deutsche Telekom on Big Data

MYTH: Big Data means "petabytes"
– Not just volume; remember variety and velocity
– Plenty of issues at smaller scales: data processing, unstructured data
– Often warehouse volumes are small because the technology is expensive, not because there is no relevant data
– Scalability is about growing with the business, affordably and predictably
– Every organization has data problems! Hadoop can help…

MYTH: Big Data means data science
– Hadoop solves existing problems faster, better, cheaper than conventional technology, e.g. a landing zone for capturing and refining multi-structured data types with unknown future value (see the sketch below), and a cost-effective platform for retaining lots of data for long periods of time
– Walk before you run
– Big Data is a state of mind
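A landing zone is usually nothing more exotic than raw files copied into a dated HDFS directory so they can be refined later. Here is a small sketch that drives the standard hdfs dfs commands from Python; the directory convention and local export path are assumed examples.

# Sketch: land today's raw export files in a dated HDFS directory, unchanged.
# Directory layout and local path are illustrative assumptions.
import datetime
import glob
import subprocess

today = datetime.date.today().strftime("%Y/%m/%d")
landing_dir = f"/landing/clickstream/{today}"

# Create the target directory (including parents) in HDFS.
subprocess.run(["hdfs", "dfs", "-mkdir", "-p", landing_dir], check=True)

# Copy the raw files as-is; refinement happens later, inside Hadoop.
raw_files = glob.glob("/data/export/clickstream/*.gz")
if raw_files:
    subprocess.run(["hdfs", "dfs", "-put", *raw_files, landing_dir], check=True)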

Page 16: Deutsche Telekom on Big Data

Waves of adoption – crossing the chasm

Wave 1: Batch orientation
– Adoption today*: mainstream, 70% of organizations
– Example use case: refine (archival and transformation)
– Response time: hour(s)
– Data characteristic: volume
– Architectural characteristic: EDW / RDBMS talk to Hadoop
– Example technologies: MapReduce, Pig, Hive

Wave 2: Interactive orientation
– Adoption today*: early adopters, 20% of organizations
– Example use case: explore (query and visualization)
– Response time: minutes
– Architectural characteristic: analytic apps talk directly to Hadoop
– Example technologies: ODBC/JDBC, Hive

Wave 3: Real-time orientation
– Adoption today*: bleeding edge, 10% of organizations
– Example use case: enrich (real-time decisions)
– Response time: seconds
– Data characteristic: velocity
– Architectural characteristic: derived data also stored in Hadoop
– Example technologies: HBase, NoSQL, SQL

* Among organizations using Hadoop
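In wave 2 the analytic application queries Hadoop directly, typically over Hive's JDBC/ODBC-style interface rather than through EDW extracts. A hedged sketch follows, assuming the PyHive client and a reachable HiveServer2; host, table and query are illustrative only.

# Sketch: an analytic app asking an interactive question directly of Hadoop.
# Assumes the PyHive package and a HiveServer2 endpoint on the default port.
from pyhive import hive

conn = hive.Connection(host="hadoop-edge01.example.com", port=10000,
                       username="analyst", database="default")
cursor = conn.cursor()

cursor.execute("""
    SELECT caller, SUM(duration_s) AS total_seconds
    FROM cdrs_cleansed
    GROUP BY caller
    ORDER BY total_seconds DESC
    LIMIT 10
""")
for caller, total_seconds in cursor.fetchall():
    print(caller, total_seconds)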

Page 17: Deutsche Telekom on Big Data

Hadoop in a nutshell

The Hadoop open source ecosystem delivers powerful innovation in storage, databases and business intelligence, promising unprecedented price / performance compared  to existing technologies. 

Hadoop is becoming an enterprise-wide landing zone for large amounts of data.  Increasingly it is also used to transform data. 

Large enterprises have realized substantial cost reductions by offloading some enterprise data warehouse, ETL and archiving workloads to a Hadoop cluster. 


Page 18: Deutsche Telekom on Big Data

Challenges in the Enterprise

– Use-case identification and cost justification
– Cooperation and coordination from independent business units
– As Hadoop increases its footprint in business-critical areas, the business will demand mature enterprise capabilities, e.g. DR, snapshots, etc.
– Hadoop’s disruptive approach challenges entrenched legacy EDW people, processes and technologies
– Data harmonization is often a significant challenge
– Fear of forking (think UNIX)
– Proprietary absorption (getting "Borged")
– Audience: Hadoop addresses business problems, not IT problems
– Fear of data complexity ("I hated statistics class!")


Page 19: Deutsche Telekom on Big Data

Questions?

[email protected]