Why Spark on Hadoop Matters

Post on 24-Feb-2016

© 2014 MapR Technologies

Why Spark on Hadoop Matters

MC Srivas, CTO and Founder, MapR Technologies

Apache Spark Summit - July 1, 2014

MapR Overview

Top Ranked • Exponential Growth • 500+ Customers • Cloud Leaders

• 3X bookings, Q1 ’13 – Q1 ’14
• 80% of accounts expand 3X
• 90% software licenses
• < 1% lifetime churn
• > $1B in incremental revenue generated by 1 customer

Rapidly Evolving Landscape

APACHE HADOOP AND OSS ECOSYSTEM (on the MapR Data Platform)

EXECUTION ENGINES
• Batch: Spark, Pig, Cascading, MR v1 & v2
• Streaming: Spark Streaming, Storm*
• NoSQL & Search: HBase, Solr, Accumulo*
• ML, Graph: Mahout, MLLib, GraphX
• SQL: Hive, Impala, Shark, Drill*
• Also shown: YARN, Tez*

DATA GOVERNANCE AND OPERATIONS
• Provision: Juju, Savannah*, Whirr
• Workflow & Data Gov.: Oozie, Falcon*
• Data Integration & Access: Sqoop, Flume, HttpFS, Hue, Knox*
• Security: Sentry*
• Management
• Also shown: ZooKeeper

* 2014 TIMELINE

The Complete Spark Stack on Hadoop

[Diagram: the same Apache Hadoop and OSS ecosystem stack on the MapR Data Platform as on the "Rapidly Evolving Landscape" slide. The Spark components in it are Spark (batch), Spark Streaming (streaming), MLLib and GraphX (ML, graph), and Shark (SQL). Items marked * are on the 2014 timeline.]

A Winning Combination

Spark Advantages:

IN-MEMORY PERFORMANCE
• RDDs
• DAGs Unify Processing

EASE OF DEVELOPMENT
• Easier APIs
• Python, Scala, Java

COMBINE WORKFLOWS
• Shark, ML, Streaming, GraphX
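The "RDDs" and "DAGs" bullets can be illustrated with a toy model. The sketch below is plain Python, not Spark's API: `ToyRDD` and all of its methods are hypothetical names invented for this illustration. It only shows the general idea that transformations are recorded as a lineage and run when an action is called, and that a cached intermediate result can be reused from memory.

```python
class ToyRDD:
    """Toy stand-in for an RDD (hypothetical, not Spark): work is deferred."""

    def __init__(self, compute, lineage):
        self._compute = compute        # zero-argument function returning an iterator
        self.lineage = lineage         # names of the steps that produce this dataset
        self._keep_in_memory = False
        self._materialized = None

    @classmethod
    def from_list(cls, data):
        return cls(lambda: iter(data), ["source"])

    def map(self, fn):
        # Records a new step; fn is not called until an action runs.
        return ToyRDD(lambda: (fn(x) for x in self._iterate()),
                      self.lineage + ["map"])

    def filter(self, pred):
        return ToyRDD(lambda: (x for x in self._iterate() if pred(x)),
                      self.lineage + ["filter"])

    def cache(self):
        # Ask for the computed values to be kept in memory once produced.
        self._keep_in_memory = True
        return self

    def _iterate(self):
        if self._materialized is not None:
            return iter(self._materialized)
        if self._keep_in_memory:
            self._materialized = list(self._compute())
            return iter(self._materialized)
        return self._compute()

    def collect(self):
        # The "action": this is the point where the recorded steps execute.
        return list(self._iterate())


calls = []

def square(x):
    calls.append(x)                    # track how often the function really runs
    return x * x

squares = ToyRDD.from_list([1, 2, 3, 4]).map(square).cache()
evens = squares.filter(lambda x: x % 2 == 0)
calls_before_action = len(calls)       # still 0: nothing has executed yet
first = evens.collect()                # runs source -> map -> filter
second = squares.collect()             # served from the cached values
```

In this toy, `square` runs four times in total even though two results are collected, because the second `collect()` reads the cached intermediate dataset instead of recomputing it.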

Hadoop Advantages:

UNLIMITED SCALE
• Multiple data sources
• Multiple applications
• Multiple users

WIDE RANGE OF APPLICATIONS
• Files
• Databases
• Semi-structured

ENTERPRISE PLATFORM
• Reliability
• Multi-tenancy
• Security

The Combination of Spark on Hadoop

IN-MEMORY PERFORMANCE

EASE OF DEVELOPMENT

COMBINE WORKFLOWS

UNLIMITED SCALE

WIDE RANGE OF APPLICATIONS

ENTERPRISE PLATFORM

Operational Applications Augmented by In-Memory Performance

Case Studies

Industry Leading Ad-Targeting Platform

• High performance analytics over MapR M7 NoSQL

• Load from M7 table into RDD to augment scoring in real-time

• Results fed back to M7 for other applications
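The load, augment, write-back loop in these bullets can be sketched in plain Python. This is an illustration under stated assumptions, not the customer's implementation and not the M7 or Spark API: the dictionary `profile_table` stands in for a NoSQL table, and the column names, the scoring rule, and every function name are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a NoSQL table: row key -> column values.
profile_table: Dict[str, Dict[str, float]] = {
    "user-1": {"clicks_7d": 12.0, "purchases_30d": 1.0},
    "user-2": {"clicks_7d": 0.0, "purchases_30d": 0.0},
}


def load_profiles(table: Dict[str, Dict[str, float]]) -> List[Tuple[str, Dict[str, float]]]:
    """Step 1: load table rows into an in-memory collection of (key, row) pairs."""
    return [(key, dict(row)) for key, row in table.items()]


def augment_score(base_score: float, features: Dict[str, float]) -> float:
    """Step 2: adjust a base score with profile features (made-up rule)."""
    boost = (0.01 * features.get("clicks_7d", 0.0)
             + 0.1 * features.get("purchases_30d", 0.0))
    return min(1.0, base_score + boost)


def score_requests(requests: List[Tuple[str, float]],
                   profiles: List[Tuple[str, Dict[str, float]]]) -> List[Tuple[str, float]]:
    """Join incoming requests with the loaded profiles and score each one."""
    lookup = dict(profiles)
    return [(user, augment_score(base, lookup.get(user, {})))
            for user, base in requests]


def write_back(table: Dict[str, Dict[str, float]],
               scored: List[Tuple[str, float]]) -> None:
    """Step 3: feed results back to the table for other applications to read."""
    for user, score in scored:
        table.setdefault(user, {})["last_score"] = score


profiles = load_profiles(profile_table)
scored = score_requests([("user-1", 0.30), ("user-3", 0.50)], profiles)
write_back(profile_table, scored)
```

A request from a user with no stored profile keeps its base score, and the write-back step creates a row for that user so that other readers of the table can see the result.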

Leading Pharma Company: NextGen Genomics

Existing process takes several weeks to align chemical compounds with genes

ADAM on Spark allows realignment in a few hours

Geneticists can minimize engineering dependency

Cisco: Security Intelligence Operations

Sensor data lands in M7

Spark Streaming on M7 for first check on known threats

Data next processed on GraphX and Mahout

Results queried using SQL via Shark and Impala
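The "first check on known threats" stage of this pipeline can be pictured as a filter applied to small batches of incoming events. The sketch below is plain Python rather than Spark Streaming, and everything in it is an assumption made for illustration: the event fields, the batch size, the function names, and the threat list (which uses reserved documentation IP addresses).

```python
from typing import Dict, Iterable, Iterator, List, Set, Tuple

Event = Dict[str, str]

# Hypothetical list of known-bad source addresses.
KNOWN_THREAT_IPS: Set[str] = {"203.0.113.7", "198.51.100.23"}


def micro_batches(events: Iterable[Event], batch_size: int) -> Iterator[List[Event]]:
    """Group a stream of events into small batches, in arrival order."""
    batch: List[Event] = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch                    # emit the final, possibly shorter, batch


def first_check(batch: List[Event], known: Set[str]) -> Tuple[List[Event], List[Event]]:
    """Split a batch into events matching a known threat and everything else."""
    flagged = [e for e in batch if e["src_ip"] in known]
    remaining = [e for e in batch if e["src_ip"] not in known]
    return flagged, remaining


events = [
    {"sensor": "s1", "src_ip": "192.0.2.10"},
    {"sensor": "s1", "src_ip": "203.0.113.7"},
    {"sensor": "s2", "src_ip": "192.0.2.11"},
    {"sensor": "s2", "src_ip": "198.51.100.23"},
    {"sensor": "s3", "src_ip": "192.0.2.12"},
]

alerts: List[Event] = []               # matched a known threat in the first check
forwarded: List[Event] = []            # passed on to the later, deeper analysis
batch_count = 0
for batch in micro_batches(events, batch_size=2):
    batch_count += 1
    flagged, remaining = first_check(batch, KNOWN_THREAT_IPS)
    alerts.extend(flagged)
    forwarded.extend(remaining)
```

Events that do not match the known list are not discarded. They are the input to the next stage, which on the slide is the graph and machine-learning processing.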

Insurance Giant: Addressing Health Care Regulations

Patient information in M7 combined with clinical records to compute re-admittance probability

Process uses Spark with transactional data in M7

Insurance options decided in real-time on online portals

In Summary

Spark on Hadoop gains traction for real-time applications

Pick the Right Tool for the Job

MapR is Unbiased Open Source (a la Linux)

• Open source distribution is about providing choice
  – Linux includes MySQL, PostgreSQL and SQLite
  – Linux includes Apache httpd, nginx and Lighttpd

                | MapR Distribution for Hadoop                                     | Distribution C      | Distribution H
Spark           | Spark (all of it) and Shark                                      | Spark only          | No
Interactive SQL | Shark, Impala, Drill, Hive/Tez                                   | One option (Impala) | One option (Hive/Tez)
Versions        | Hive 0.10, 0.11, 0.12, 0.13; Pig 0.11, 0.12; HBase 0.94, 0.98    | One version         | One version

@mapr maprtech

srivas@mapr.com

Engage with us!

MapR

maprtech

mapr-technologies

Thank you
