Key Considerations for Putting Hadoop in Production

© 2015 MapR Technologies 1 © 2015 MapR Technologies

Upload: mapr-technologies

Post on 24-Jul-2015

© 2015 MapR Technologies 2

Topics

• The most common use cases for Hadoop
• The top considerations before "going live" with Hadoop
• Product Demo – multiple workloads in the Data Lake

© 2015 MapR Technologies 3

State of Big Data Adoption

Source: Gartner. Sept. 2014. Survey Analysis: Big Data Investment Grows but Deployments Remain Scarce in 2014

© 2015 MapR Technologies 4 © 2015 MapR Technologies

Top Hadoop Use Cases

© 2015 MapR Technologies 5

Speeding The Journey To Value

[Figure: Hadoop use cases arranged by maturity (big data novice to mature) and workload type (batch to operational). Stages: Get fast value, Empower BI users, Create Data Capital. Use cases: Mine Logs, Recommendation Engine, Data Hub, Ad Targeting, 360 View, Anomaly detection, Fraud prevention, Operational Applications.]

© 2015 MapR Technologies 6

The As-it-happens Business


© 2015 MapR Technologies 7

Common Use Cases: Taking Advantage of Hadoop

ENTERPRISE DATA HUB
• Multi-structured data staging & archive
• ETL / DW optimization
• Mainframe optimization
• Data exploration

MARKETING OPTIMIZATION
• Recommendation engines & targeting
• Customer 360
• Click-stream analysis
• Social media analysis
• Ad optimization

RISK & SECURITY OPTIMIZATION
• Network security monitoring
• Security information & event management
• Fraudulent behavioral analysis

OPERATIONAL INTELLIGENCE
• Supply chain & logistics
• System log analysis
• Manufacturing quality assurance
• Preventative maintenance
• Smart meter analysis

© 2015 MapR Technologies 8

Hadoop Use Cases by Industry

ADVERTISING, MEDIA & ENTERTAINMENT
• Improved ad targeting, analysis, forecasting and optimization
• Personalized recommendations
• Superior analytics capability
• Enhanced game player engagement

FINANCIAL SERVICES
• Fraud Detection
• Customer Segmentation Analysis
• Customer Sentiment Analysis
• Risk Aggregation
• Counterparty Risk Analytics
• New Products and Services for Consumer Card Holders
• Credit Risk Assessment
• 360-Degree Customer Service

GOVERNMENT
• Cybersecurity, Intelligence
• Crime Prediction and Prevention
• Defense, National Security
• Pharmaceutical Drug Evaluation
• Scientific Research
• Weather Forecasting
• Fraud Detection
• Emergency Communications/Response
• Traffic Optimization

HEALTHCARE & LIFE SCIENCES
• Personalized Treatment Planning
• Assisted Diagnosis
• Fraud Detection
• Monitor Patient Vital Signs

MANUFACTURING
• Assembly Line Quality Assurance
• Preventive Maintenance
• Supply Chain and Logistics
• Monitoring Product Quality through Telemetry Data
• Real-time Parts Flow Monitoring
• Product Configuration Planning
• Market Pricing and Planning

OIL & GAS
• Oil Exploration and Discovery
• New oil prospect identification
• Seismic trace identification
• Oil Production
• Equipment Maintenance
• Reservoir Engineering
• Safety and Environment
• Security

RETAIL
• Up-Sell/Cross-Sell Recommendations
• Social Media Analysis
• Dynamic Pricing Across Multiple Channels
• Fraud Detection
• Clickstream Analysis
• Loyalty Program Benefits
• 360° Customer View

TELECOM
• Operational Intelligence
• Customer Churn Analysis
• Fraud Detection
• Clickstream Analysis
• Recommendations
• Product Development
• Network Management/Optimization

© 2015 MapR Technologies 9

FIN SERVICES

GOAL: Offer Serving, Credit Risk & Fraud

Largest deployment in financial services

• 1700+ MapR Hadoop nodes
• $100M saved for cardholders
• 100M+ cards
• 10 years of data stored
• 900B worldwide bills ($)
• 45s TeraSort
• 1.65TB MinuteSort

© 2015 MapR Technologies 10

Operations + Analytics = Real-time, Personalized Services

[Figure: architecture diagram built on the MapR Distribution including Hadoop.
Real-time Operational Applications: Online transactions, Fraud detection, Personalized offers.
Analytics: Clickstream analysis, Fraud investigation tool.
Data on the cluster: Fraud model, Recommendations table.
Users: Fraud investigator, Interactive marketer, Customer Support.]

© 2015 MapR Technologies 11

Hadoop + Data Warehouse Architecture
Improve data services to customers without increasing enterprise architecture costs

FORTUNE 500 TELCO

OBJECTIVES
• Provide cloud, security, managed services, data center, & comms
• Report on customer usage, profiles, billing, and sales metrics
• Improve service: Measure service quality and repair metrics
• Reduce customer churn – identify and address IP network hotspots

CHALLENGES
• Cost of ETL & DW storage for growing IP and clickstream data; >3 months
• Reliability & cost of Hadoop alternatives limited ETL & storage offload

SOLUTION
• MapR for data staging, ETL, and storage at 1/10th the cost
• MapR provided smallest datacenter footprint with best DR solution
• Enterprise-grade: NFS file management, consistent snapshots & mirroring
• Data warehouse for mission-critical reporting and analysis

Business Impact
Hadoop + Data Warehouse = New, Deeper Insights for the Business
• Increased scale to handle network IP and clickstream data
• Freed up processing on DW to maintain reporting SLAs to business
• Unlocked new insights into network usage and customer preferences

© 2015 MapR Technologies 12

MapR Optimized Data Architecture

[Figure: Optimized Data Architecture diagram.
Sources: relational, SaaS, mainframe; documents, emails; log files, clickstreams; sensors; blogs, tweets, link data.
Data Movement: Data Transformation, Enrichment and Integration.
MapR Distribution for Hadoop: Streaming (Spark Streaming, Storm); NoSQL ODBMS (HBase, Accumulo, …); Batch/Search (MR, Spark, Hive, Pig); Interactive (Impala, Drill); MapR Data Platform (MapR-DB, MapR-FS).
Data Access: Analytics; Search; Schema-less data exploration; BI, reporting; Ad-hoc integrated analytics; Data Warehouse.
Operational Apps: Recommendations, Fraud Detection, Logistics.
Other label: Machine Learning.]

© 2015 MapR Technologies 13

Security Log Analysis & Enterprise Data Vault
F100 bank accelerates log analytics to meet investigation and compliance mandates

LARGE FINANCIAL SERVICES INSTITUTION

OBJECTIVES
• Meet compliance requirements to minimize lawsuits and fines
• Complete IT audits more quickly

CHALLENGES
• Prior system (flat files on Unix) was difficult for the operations team to maintain
• HA and data protection issues in HDFS put critical data at risk
• File volume (300K files/day) was straining the system

SOLUTION
• Seamless Hadoop file movement & management: MapR NFS
• MapReduce enables archival of data for historical search and analysis
• Data is indexed into Elasticsearch from MapR for real-time search
• Customizable user interface and dashboard: Kibana (ELK stack)

Business Impact
• Bullet-proof data vault that meets SEC and FINRA requirements
• 46x cost savings over legacy system
• Efficiency of MapR cluster that can store the Elasticsearch index for real-time search
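To put the 300K files/day figure from this case study in perspective, here is a rough back-of-the-envelope sketch. It assumes a steady arrival rate and that no files are ever deleted or compacted, neither of which the slide states. The 50 to 200 million file range is the per-NameNode limit quoted later in this deck, and the function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def files_per_second(files_per_day: int) -> float:
    """Average ingest rate, assuming files arrive evenly over the day."""
    return files_per_day / SECONDS_PER_DAY


def days_to_reach(file_limit: int, files_per_day: int) -> float:
    """Days until file_limit files exist, assuming nothing is ever deleted."""
    return file_limit / files_per_day


if __name__ == "__main__":
    daily_files = 300_000  # figure from the case study
    print(f"average rate: {files_per_second(daily_files):.2f} files/s")
    # Per-NameNode range quoted elsewhere in this deck.
    for limit in (50_000_000, 200_000_000):
        days = days_to_reach(limit, daily_files)
        print(f"{limit:,} files reached after about {days:.0f} days")
```

Under these assumptions the archive would pass the lower end of that range in well under a year, which is one way to read the "straining" remark.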

© 2015 MapR Technologies 14 © 2015 MapR Technologies

Planning for Production Success with Hadoop

© 2015 MapR Technologies 15

Key Questions for

Big Data Planning

Source: Gartner. Jan 2015. Answering Big Data's 10 Biggest Planning and Implementation Questions

© 2015 MapR Technologies 16

Big Data is Overwhelming Traditional Systems

TREND

[Figure: Enterprise Data Architecture. Labels: OUTSIDE SOURCES, ENTERPRISE USERS, OPERATIONAL SYSTEMS, ANALYTICAL SYSTEMS.]

PRODUCTION REQUIREMENTS (OPERATIONAL SYSTEMS)
• Mission-critical reliability
• Transaction guarantees
• Deep security
• Real-time performance
• Backup and recovery

PRODUCTION REQUIREMENTS (ANALYTICAL SYSTEMS)
• Interactive SQL
• Rich analytics
• Workload management
• Data governance
• Backup and recovery

© 2015 MapR Technologies 17

Hadoop Relieves the Pressure from Enterprise Systems

REALITY

[Figure: Hadoop alongside OPERATIONAL SYSTEMS, ANALYTICAL SYSTEMS and ENTERPRISE USERS.]

• Data staging
• Archive
• Data transformation
• Data exploration
• Streaming, interactions

Keys for Production Success

1 Business continuity
2 Interoperability
3 High performance
4 Multi-tenancy

© 2015 MapR Technologies 18

Key Reasons for Selecting the MapR Distribution including Hadoop

Respondents who have had prior experience with another Hadoop distribution*

* Apache Hadoop, Cloudera or Hortonworks

© 2015 MapR Technologies 19

Business Continuity

High Availability

Data Protection

Disaster Recovery

What are your requirements?

What do you have for your enterprise storage, databases and data warehouses?

© 2015 MapR Technologies 20

Seamless Integration with Direct Access NFS

• POSIX compliant

– Random reads/writes

– Simultaneous reading and writing to a file

– Compression is automatic and transparent

• Industry-standard NFS interface (in addition to HDFS API)

– Stream data into the cluster

– Leverage thousands of tools and applications

– Easier to use non-Java programming languages

– No need for most proprietary Hadoop connectors

• Compression/parallel access/security from edge nodes to MapR cluster
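The practical consequence of a POSIX-style NFS interface is that ordinary file I/O works without a Hadoop client library. The following is a minimal sketch in Python (a non-Java language, as the slide mentions). It uses a temporary directory as a stand-in for the cluster's NFS mount point, which is an assumption; the file name and helper functions are hypothetical.

```python
import os
import tempfile


def append_record(path: str, line: str) -> None:
    """Stream one record onto the end of a file with plain file I/O."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(line + "\n")


def overwrite_at(path: str, offset: int, data: bytes) -> None:
    """Random write: replace bytes in the middle of an existing file."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)


def read_all(path: str) -> str:
    """Read the whole file back as text."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


if __name__ == "__main__":
    # Assumption: on a real cluster this would be a directory under the
    # NFS mount; a temporary directory stands in for it here.
    mount_dir = tempfile.mkdtemp()
    log_path = os.path.join(mount_dir, "events.log")

    append_record(log_path, "status=NEW id=1")
    append_record(log_path, "status=NEW id=2")
    overwrite_at(log_path, 7, b"OLD")  # in-place update of the first record
    print(read_all(log_path))
```

The in-place update is the part that append-only HDFS semantics would not allow. The sketch shows only the programming model and says nothing about performance or locking behavior over NFS.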

© 2015 MapR Technologies 21

Narrow Foundations – Big and Fast are Separate

[Figure: separate stacks. Labels: HDFS; Map/Reduce, HBase, Spark / Storm, Hive; RDBMS, NAS; Sequential File Processing, OLAP, Data Mining; WEB SERVICES.]

Big Data is heavy and expensive to move

© 2015 MapR Technologies 22

Unify Big & Fast on One Platform

[Figure: the same components on a single platform. Labels: HDFS; Map/Reduce, HBase, Spark / Storm, Hive; RDBMS, NAS; Sequential File Processing, OLAP, Data Mining; WEB SERVICES; NEXT GENERATION DISTRIBUTION; HADOOP APIs; NFS.]

© 2015 MapR Technologies 23 © 2015 MapR Technologies

What Makes MapR Different

© 2015 MapR Technologies 24

MapR: Best Solution for Customer Success

Premier Investors

High Growth

2X Growth In Direct Customers

90% Subscription Licenses

Software Margins

140% Dollar-based Net Expansion

700+ Customers

2X Growth In Annual Subscriptions (ACV)

Best Product

Apache Open Source

© 2015 MapR Technologies 25

The Power of the Open Source Community

APACHE HADOOP AND OSS ECOSYSTEM

EXECUTION ENGINES
• Batch: MapReduce v1 & v2, Spark, Pig, Cascading, Tez
• Streaming: Spark Streaming, Storm
• NoSQL & Search: HBase, Solr
• ML, Graph: Mahout, MLLib, GraphX
• SQL: Hive, Impala, Spark SQL, Drill
• YARN

DATA GOVERNANCE AND OPERATIONS
• Workflow & Data Governance: Sentry, Oozie
• Provisioning & Coordination: Juju, Sahara, ZooKeeper
• Data Integration & Access: Sqoop, Flume, HttpFS, Hue

Security

Management

Data Platform: MapR-FS, MapR-DB

© 2015 MapR Technologies 26

The MapR Distribution including Apache Hadoop


Data Hub Enterprise Grade Operational

© 2015 MapR Technologies 27

MapR Distribution including Hadoop

Theme: Enterprise Grade
Product: MapR Enterprise Edition
Requirements
• Uptime service levels
• Site to site DR
• Backup/recovery
• Security
• High velocity data ingress
Features
• HW/SW HA
• Mirroring
• Snapshots
• Authorization, Kerberos
• 2X-5X performance

Theme: Data Hub
Product: MapR Enterprise Edition
Requirements
• Hadoop
• Traditional applications
• Data of record
• Batch and interactive
Features
• HDFS
• POSIX
• Strong consistency
• MapReduce and SQL

Theme: Operational
Product: MapR Enterprise Database Edition
Requirements
• Real time
• NoSQL
• Operational analytics
Features
• HBase
• Update in place
• Concurrent read/write

MapR Patent Pending – “Table Format for Map Reduce”, “Map Reduce Ready Distributed File System”

© 2015 MapR Technologies 28

Achievements: Triple Crown Of Analyst Ranking

© 2015 MapR Technologies 29

Apache Hadoop NameNode High Availability

HDFS-based Distributions
[Figure: a single NameNode holding entries A to F, above a set of DataNodes.]
• Single point of failure
• Limited to 50-200 million files
• Performance bottleneck
• Metadata must fit in memory

HDFS HA
[Figure: a Primary NameNode and a Standby NameNode, each holding entries A to F.]
• Only one active NameNode
• Limited to 50-200 million files
• Performance bottleneck
• Metadata must fit in memory
• Double the block reports

HDFS Federation
[Figure: NameNodes partitioned as A B, C D and E F, shown twice.]
• Multiple single points of failure w/o HA
• Needs 20 NameNodes for 1 Billion files
• Performance bottleneck
• Metadata must fit in memory
• Double the block reports
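The "20 NameNodes for 1 Billion files" figure follows from the per-NameNode limit quoted on the same slide. A small sketch of that arithmetic follows; the function name is hypothetical, and the doubling for standbys assumes one standby per active NameNode, which the slide does not state.

```python
def namenodes_needed(total_files: int, files_per_namenode: int) -> int:
    """Smallest number of NameNodes whose combined capacity covers total_files."""
    return -(-total_files // files_per_namenode)  # integer ceiling division


ONE_BILLION = 1_000_000_000
LOW_LIMIT = 50_000_000    # lower end of the range quoted on the slide
HIGH_LIMIT = 200_000_000  # upper end of the range quoted on the slide

if __name__ == "__main__":
    for limit in (LOW_LIMIT, HIGH_LIMIT):
        active = namenodes_needed(ONE_BILLION, limit)
        # Assumption: one standby per active NameNode if HA is layered on top.
        print(f"{limit:,} files per NameNode: {active} active, "
              f"{2 * active} including standbys")
```

The slide's figure of 20 corresponds to the lower end of the range (50 million files per NameNode); at the upper end the same calculation gives 5.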

© 2015 MapR Technologies 30

No-NameNode Architecture

[Figure: the NameNode's metadata entries A to F shown distributed and replicated across the DataNodes.]

• Up to 1T files (> 5000x advantage)
• Significantly less hardware & OpEx
• Higher performance
• No special config to enable HA
• Automatic failover & re-replication
• Metadata is persisted to disk
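The "> 5000x" figure can be checked against the 50 to 200 million file range the deck quotes for a single NameNode. This is a sketch of the ratio only, with a hypothetical function name; it takes the slide's "1T files" claim at face value and does not verify it.

```python
def scale_advantage(claimed_files: int, baseline_files: int) -> float:
    """How many times more files the claimed limit allows than the baseline."""
    return claimed_files / baseline_files


ONE_TRILLION = 10**12  # "Up to 1T files", as claimed on the slide

if __name__ == "__main__":
    # Baselines: the per-NameNode range quoted elsewhere in this deck.
    for baseline in (200_000_000, 50_000_000):
        ratio = scale_advantage(ONE_TRILLION, baseline)
        print(f"versus {baseline:,} files: {ratio:,.0f}x")
```

Measured against the most generous baseline (200 million files) the ratio is 5000; against the lower baseline it is 20000, so 5000x is the conservative end of the comparison.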

© 2015 MapR Technologies 31

© 2015 MapR Technologies 33

MapR: Fast and Dependable with Lowest TCO

Cost comparison for a 500 TB cluster vs HDFS-based distros

TCO: mapr.com/tco

© 2015 MapR Technologies 34 © 2015 MapR Technologies

Product Demo: Multi-tenancy

© 2015 MapR Technologies 35

Committed to our Customers’ Success

Educational Services
Instructor-led courses & Free On-Demand training for Hadoop cluster administration, HBase & MapReduce programming and more

Professional Services
• Core Hadoop Services
• Data Engineering
• Advanced Analytics
• M7/HBase Practice
• Data Science

Customer Support
Hadoop engineering experts provide 24x7x365 global coverage

© 2015 MapR Technologies 36

WORLDWIDE PRESENCE & CUSTOMER SUPPORT

© 2015 MapR Technologies 37

Key MapR Advantage Partners Business Services

INFRASTRUCTURE & CLOUD

ANALYTICS & BUSINESS INTELLIGENCE

APPLICATIONS & OS

CONSULTANTS & INTEGRATORS

DATA WAREHOUSE & INTEGRATION

© 2015 MapR Technologies 38

Q & A

@mapr maprtech

[email protected]

Engage with us!

MapR

maprtech

mapr-technologies

GET STARTED NOW! mapr.com/sandbox