Expect More from Hadoop

Upload: mapr-technologies

Post on 13-Jan-2015

DESCRIPTION

MapR Technologies Chief Marketing Officer, Jack Norris, talks about the advantages of Hadoop. He elaborates on multiple use cases and explains how MapR Technologies is the best Hadoop distribution.

TRANSCRIPT

Page 1: Expect More from Hadoop

1©MapR Technologies

Expect More from Hadoop

Jack Norris, MapR Technologies

Page 2: Expect More from Hadoop

Hadoop Growth

Page 3: Expect More from Hadoop

Important Drivers for Hadoop

Data on compute

You don’t need to know what questions to ask beforehand

Simple algorithms on Big Data

Analysis of unstructured data

Page 4: Expect More from Hadoop

The Cost of Enterprise Storage

SAN Storage: $2 - $10/Gigabyte
$1M gets: 0.5 Petabytes, 200,000 IOPS, 1 Gbyte/sec

NAS Filers: $1 - $5/Gigabyte
$1M gets: 1 Petabyte, 400,000 IOPS, 2 Gbytes/sec

Local Storage: $0.02/Gigabyte
$1M gets: 50 Petabytes, 10,000,000 IOPS, 800 Gbytes/sec

1/100 to 1/20 the cost
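
The "$1M gets" capacity figures on this slide follow from dividing the budget by the price per gigabyte. The sketch below reproduces that arithmetic. It assumes decimal units (1 Petabyte = 1,000,000 Gigabytes) and uses the low end of each quoted price range; the function and variable names are illustrative and do not come from the talk.

```python
# Capacity that a fixed budget buys at a given price per gigabyte.
# Assumption: decimal units (1 PB = 1,000,000 GB); the slide does not say
# which convention it uses.

GB_PER_PB = 1_000_000


def petabytes_for_budget(budget_usd: float, usd_per_gb: float) -> float:
    """Return the capacity in petabytes that budget_usd buys at usd_per_gb."""
    return budget_usd / usd_per_gb / GB_PER_PB


# Price points from the slide (low end of each quoted range).
tiers = {
    "SAN Storage": 2.00,
    "NAS Filers": 1.00,
    "Local Storage": 0.02,
}

for name, usd_per_gb in tiers.items():
    capacity = petabytes_for_budget(1_000_000, usd_per_gb)
    print(f"{name}: {capacity:g} PB per $1M")
```

The IOPS and throughput figures depend on the number and type of drives purchased, which the slide does not itemize, so they are not modeled here.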

Page 5: Expect More from Hadoop

MapReduce: A Paradigm Shift

Distributed, scalable computing platform
– Data/Compute framework
– Commodity hardware

Pioneered at Google

Commercially available as Hadoop
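
For readers new to the model, here is a minimal single-process sketch of the MapReduce pattern, using the customary word-count example. It is not Hadoop code and is not taken from the talk: the function names are illustrative, and a real cluster would run the map and reduce steps in parallel on the nodes that hold the data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["data on compute", "compute on commodity hardware"]))
```

Because each map call sees only one line and each reduce call sees only one key, the work can be split across many commodity machines without the steps needing to coordinate with each other.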

Page 6: Expect More from Hadoop

MapR Distribution for Apache Hadoop

Complete Hadoop distribution

Comprehensive management suite

Industry-standard interfaces

Enterprise-grade dependability

Higher performance

[Diagram labels: Pig, Hive, HBase, Mahout, Oozie, Whirr, Avro, Cascading, Nagios, Ganglia, Flume, Sqoop, HCatalog, Zookeeper, Drill, MapReduce; MapR Control System; MapR Data Platform]

Page 7: Expect More from Hadoop

How do you Benefit?

Page 8: Expect More from Hadoop

Expanding data for existing applications

Page 9: Expect More from Hadoop

Use Case #1

Major telecom vendor

Key step in billing pipeline handled by data warehouse (EDW)

EDW at maximum capacity

Multiple rounds of software optimization already done

Revenue limiting (= career limiting) bottleneck

Page 10: Expect More from Hadoop

Original Flow

[Diagram labels: CDR billing records; Extract and Load; Transformation; Data Warehouse; Billing reports; Customer bills]

Page 11: Expect More from Hadoop

Problem Analysis

70% of EDW load is related to call detail record (CDR) normalization
– < 10% of total lines of code
– CDR normalization difficult within the EDW
– Binary extraction and conversion

Data rates are too high for upstream transform
– Requires high volume joins

Page 12: Expect More from Hadoop

With ETL Offload

[Diagram labels: CDR billing records; ETL; Hadoop Cluster; Data Warehouse; Billing reports; Customer billing]

Page 13: Expect More from Hadoop

Simplified Analysis

70% of EDW consumed by ETL processing – Offload frees capacity

EDW direct hardware cost is approximately $30 million vs. Hadoop cluster at 1/50 the cost

Additional EDW only increases capacity by 50% due to poor division of labor
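
The figures on this slide can be combined in a short back-of-the-envelope calculation. The sketch below is one plausible reading and not something the slides spell out: it assumes the EDW is fully loaded, that the 70% ETL share moves entirely to Hadoop, and that the remaining work can then grow into the freed capacity. The names are illustrative.

```python
def headroom_after_offload(offloaded_fraction: float) -> float:
    """Factor by which the remaining workload can grow once the given
    fraction of a fully loaded system is moved elsewhere."""
    if not 0 <= offloaded_fraction < 1:
        raise ValueError("offloaded_fraction must be in [0, 1)")
    return 1 / (1 - offloaded_fraction)


# Figures quoted on the slide.
ETL_SHARE = 0.70                 # share of EDW load consumed by ETL
EDW_HARDWARE_COST = 30_000_000   # approximate direct hardware cost, USD
HADOOP_COST_DIVISOR = 50         # "Hadoop cluster at 1/50 the cost"

headroom = headroom_after_offload(ETL_SHARE)
hadoop_cluster_cost = EDW_HARDWARE_COST / HADOOP_COST_DIVISOR

print(f"Headroom for non-ETL work after offload: {headroom:.1f}x")
print(f"Implied Hadoop cluster hardware cost: ${hadoop_cluster_cost:,.0f}")
```

Under these assumptions, offloading frees roughly three times the capacity for the remaining EDW work, compared with the 50% gain the slide attributes to buying a second EDW.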

Page 14: Expect More from Hadoop

The Results

EDW strategy
– 1.5x performance
– $30 million

MapR strategy
– 3x faster
– 20x cost/performance advantage for MapR strategy
– With High Availability and data protection

Page 15: Expect More from Hadoop

Use Case #2

Combine Many Different Data Sources

Page 16: Expect More from Hadoop

Use Case #2 – Customer Example

Global Credit Card Issuer

Launching a New Location Based Service

Benefits both Merchants and Consumers

Page 17: Expect More from Hadoop

Combining different feeds on one platform

Hadoop and HBase Storage and Processing

Real-time data feed from social network

Stored in Hadoop

Historical Purchase Information

Predictive Analytics from Historical data combined with NoSQL querying on real-time social networking data

Billing Data

Page 18: Expect More from Hadoop

Results

New Service Rolled out in 1 quarter

Processing time cut from 20 hours per day to 3

Recommendation engine load time decreased from 8 hours to 3 minutes

Includes data versioning support for easier development and updating of models

Page 19: Expect More from Hadoop

Use Case #3

New Application from New Data Source

Page 20: Expect More from Hadoop

Ancestry.com – Family Tree

Page 21: Expect More from Hadoop

Overview and Requirements

Collect and Collate information from disparate sources (Text files, Images, etc.)

Leverage new data source: Spit

Machine learning techniques and DNA Matching Algorithms

Page 22: Expect More from Hadoop

The Results

Storage Infrastructure for billions of small and large files

Blob Store for large images through NoSQL solutions

Multi-tenant capability for data-mining and machine-learning algorithm development

One highly available, efficient platform

Page 23: Expect More from Hadoop

MapR M7: Making HBase Enterprise Grade

[Diagram labels, Other Distributions: Disks, ext3, JVM, DFS, JVM, HBase]

[Diagram labels, MapR M7: Disks, Unified]

Easy: No RegionServers; Seamless splits; Automatic merges; In-memory column families

Dependable: No compactions; Instant recovery from node failure; Snapshots; Mirroring

Fast: Consistent low latency; Real-time in-memory configuration; Disk and network compression; Reduced I/O to disk

Page 24: Expect More from Hadoop

Use Case

New Analytics on Existing Data

Page 25: Expect More from Hadoop

Analytic Flexibility

MapReduce enabled Machine learning algorithms

Enhanced Search

Real-time event processing

No need to sample the data

Fraud Detection, Target Marketing, Consumer Behavior Analysis, …

Page 26: Expect More from Hadoop

Hadoop Expands Analytics

“Simple algorithms and lots of data trump complex models”

Halevy, Norvig, and Pereira, Google, IEEE Intelligent Systems

Page 27: Expect More from Hadoop

Use Case #4

Combine All Three

Page 28: Expect More from Hadoop

Where do you Start?

Page 29: Expect More from Hadoop

One Platform for Big Data

[Diagram labels: Batch, Interactive, Realtime]

[Diagram labels: 99.999% HA; Data Protection; Disaster Recovery; Scalability & Performance; Enterprise Integration; Multi-tenancy]

[Diagram labels: Batch Processing; File-Based Applications; SQL; Database; Search; Stream Processing]

Page 30: Expect More from Hadoop

World Record Performance

Why is MapR faster and more efficient?
– C/C++ vs. Java
– Distributed metadata
– Optimized shuffle

New Minute Sort World Record

1.5 TB in 1 minute

2103 nodes
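
To put the record in perspective, the sketch below converts the quoted figures into aggregate and per-node sort rates. It assumes decimal units (1 TB = 10^12 bytes), a run of exactly 60 seconds, and data spread evenly across the nodes; none of these details are stated on the slide, and the names are illustrative.

```python
from typing import Tuple

BYTES_PER_TB = 10**12  # assumption: decimal terabytes


def sort_throughput(total_bytes: float, seconds: float, nodes: int) -> Tuple[float, float]:
    """Return (aggregate bytes/sec, bytes/sec per node) for a sort run,
    assuming the data is spread evenly across the nodes."""
    aggregate = total_bytes / seconds
    return aggregate, aggregate / nodes


# Figures quoted on the slide: 1.5 TB sorted in 1 minute on 2103 nodes.
aggregate, per_node = sort_throughput(1.5 * BYTES_PER_TB, 60, 2103)

print(f"Aggregate: {aggregate / 10**9:.1f} GB/sec")
print(f"Per node:  {per_node / 10**6:.1f} MB/sec")
```

Each node handles only a modest share of the total, which is the point of the slide: the record comes from spreading the work across many commodity machines.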

Page 31: Expect More from Hadoop

Thank You
