© Copyright 2015 EMC Corporation. All rights reserved.

Upload: doandung

Post on 08-Mar-2018


EMC REDEFINING BIG DATA

ALEXANDER ERMAKOV - PIVOTAL


Traditional Data Architecture

[Diagram labels: Front End (x5); DBMS (x5, …); ETL; DWH; BI, Data Mining, OLAP, …; latency markers along the path: 100ms, 3 sec, 1 day, 3-4 days]

The path from end users to business decisions takes 1 day minimum and 3-4 days typically


Advanced Data Architecture – ELT

[Diagram, ETL variant, labels: DBMS sources (…); ETL; DDS; Data Marts; Reports; Aggregates; OLAP]

[Diagram, ELT variant, labels: DBMS sources (…); ODS (…); ELT; DDS; Data Marts; Reports; Aggregates; OLAP]

ELT arose about 10 years ago

Driven by
– Storage cost reduction
– Introduction of MPP
– Pushdown optimization in ETL tools
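The difference between ETL and ELT can be illustrated with a minimal sketch. It assumes an in-memory SQLite database as a stand-in for the MPP warehouse; the table names (`stg_orders`, `mart_daily_revenue`) and the sample rows are hypothetical and not taken from the slides. Raw data is loaded untransformed first, and the transformation then runs inside the database as SQL, which is the idea behind pushdown.

```python
import sqlite3

# Hypothetical source rows; in practice these would come from operational DBMSs.
raw_orders = [
    ("2015-03-01", "alice", "19.90"),
    ("2015-03-01", "bob", "5.10"),
    ("2015-03-02", "alice", "7.00"),
]

db = sqlite3.connect(":memory:")  # assumption: stands in for the MPP warehouse

# Extract + Load: land the data as-is in a staging table (amounts still text).
db.execute("CREATE TABLE stg_orders (order_date TEXT, customer TEXT, amount TEXT)")
db.executemany("INSERT INTO stg_orders VALUES (?, ?, ?)", raw_orders)

# Transform: the work is pushed down to the database engine as SQL.
db.execute(
    """
    CREATE TABLE mart_daily_revenue AS
    SELECT order_date, SUM(CAST(amount AS REAL)) AS revenue
    FROM stg_orders
    GROUP BY order_date
    """
)

daily_revenue = db.execute(
    "SELECT order_date, revenue FROM mart_daily_revenue ORDER BY order_date"
).fetchall()
print(daily_revenue)
```

Because the staging table keeps the untransformed rows, a changed transformation can be re-run inside the database without extracting from the sources again.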


Modern Data Architecture – Data Lake

The concept was introduced 4 years ago by James Dixon

Data Lake idea: integrate a Hadoop solution into a typical enterprise architecture to improve customer analytics capabilities

A Data Lake usually combines the following approaches:
– Using Hadoop to store and process unstructured data
– Using Hadoop as a staging platform for all input data and as the store for all source data loaded into the customer platform
– Offloading historical data to Hadoop and using it as cold data storage


Modern Data Architecture – Data Lake

[Diagram labels: DBMS sources (…); CDC; ODS (…); ELT; DWH with DDS, OLAP, Data Marts, Aggregates, Reports; Hadoop with ODS, UDS, Analytical Archives; consumers: BI, Data Mining, OLAP, SQL-on-Hadoop, Data Mining At Scale]


Modern Data Architecture – Lambda

Lambda Architecture was introduced by Nathan Marz 2 years ago

The goal is to build a robust, scalable, fault-tolerant data processing architecture that is easily extensible and requires minimal maintenance

Combines near-real-time data processing and batch processing into a single data processing approach

Based on the functional approach:

query = function(all data)
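As an illustration of this functional view, the sketch below treats a query as a pure function over an append-only dataset. The event shape and the name `page_view_counts` are hypothetical; the point is only that new facts are appended and the same function is re-applied to all data.

```python
from collections import Counter

# Hypothetical append-only dataset of raw, immutable page-view events.
all_data = [
    {"user": "alice", "page": "/home"},
    {"user": "bob", "page": "/home"},
    {"user": "alice", "page": "/pricing"},
]


def page_view_counts(events):
    """A query expressed as a pure function of the complete dataset."""
    return Counter(event["page"] for event in events)


before = page_view_counts(all_data)

# New facts are appended, never updated in place; the query is simply re-run.
all_data.append({"user": "carol", "page": "/home"})
after = page_view_counts(all_data)

print(before["/home"], after["/home"])
```

Recomputing over all data on every query is too slow at scale, which is why the architecture precomputes views instead of evaluating the function on demand.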


Modern Data Architecture – Lambda

Source data is loaded into both the Speed and Batch layers

The Master Dataset is maintained in the Batch Layer; it contains all the raw input data and is the basis for any recalculation needed in the system

The Speed Layer handles only a small part of the latest data, discarding all older data entries

A query merges the results from both the Batch and Speed layers

[Diagram: Source Data feeds the Speed Layer (Real-time Views) and the Batch Layer (Master Dataset); the Serving Layer holds the Batch Views; Queries read both Batch Views and Real-time Views]
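The interplay of the layers can be sketched in a few lines. This is a simplified single-process illustration, not a reference implementation: the names `ingest`, `run_batch` and `query` are hypothetical, and clearing the real-time view at the end of a batch run assumes that no events arrive while the batch is running.

```python
from collections import Counter

master_dataset = []        # batch layer: every raw event, append-only
batch_view = Counter()     # serving layer: result of the last batch run
realtime_view = Counter()  # speed layer: counts since the last batch run


def ingest(page):
    """Source data is loaded into both the batch and the speed layer."""
    master_dataset.append(page)
    realtime_view[page] += 1


def run_batch():
    """Recompute the batch view from all raw data, then discard old speed-layer entries."""
    global batch_view
    batch_view = Counter(master_dataset)
    realtime_view.clear()  # simplification: assumes no ingest during the batch run


def query(page):
    """Merge the results of the batch and speed layers."""
    return batch_view[page] + realtime_view[page]


for page in ["/home", "/pricing", "/home"]:
    ingest(page)
run_batch()

ingest("/home")                 # arrives after the batch run
home_before = query("/home")    # batch view plus real-time view
run_batch()
home_after = query("/home")     # now served from the batch view alone
print(home_before, home_after)
```

The query result is the same before and after the second batch run; only the layer that supplies the most recent event changes.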


Modern Data Architecture – Streaming

The design is based on event stream processing

Uses a message queue as the main data hub

Grew out of the introduction of reactive programming; emerged with the introduction of Spark Streaming, Storm and Samza

Not to be confused with “real-time processing”:
– Not just a web service and RPC: no “response” exists in this design
– Not necessarily real-time: save the stream and reprocess it on demand
– Event stream processing instead of batch extraction of the data
– Using the same event stream for both OLTP and OLAP systems
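The points above can be illustrated with a minimal in-memory sketch. `EventLog` is a hypothetical stand-in for a message queue that retains its events; it is not the API of any real broker. Producers publish without receiving a response, and two independent consumers read the same stream, one of them replaying it from the beginning.

```python
class EventLog:
    """Hypothetical in-memory stand-in for a message queue that retains events."""

    def __init__(self):
        self._events = []

    def publish(self, event):
        # Fire-and-forget: nothing is returned to the producer.
        self._events.append(event)

    def read_from(self, offset):
        # Consumers track their own position and may re-read older events.
        return self._events[offset:]


log = EventLog()
for amount in (10, 25, 5):
    log.publish({"type": "payment", "amount": amount})

# OLTP-style consumer: maintains current state from the stream.
balance = 0
oltp_offset = 0
for event in log.read_from(oltp_offset):
    balance += event["amount"]
    oltp_offset += 1

# OLAP-style consumer added later: replays the same stream on demand.
large_payments = [e["amount"] for e in log.read_from(0) if e["amount"] >= 10]

print(balance, large_payments)
```

Because the stream is saved, the analytical consumer needs no separate batch extraction from the OLTP system.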


Modern Data Architecture – Streaming

[Diagram labels: FE (App, App, App, …); HTTP; BE (Srv, Srv, Srv, …); SOAP; OLTP (SP, Table, Log); JDBC; CDC; copy; Parse; Batch ETL; cp; Batch ETL load; DWH (ODS, DDS, Data Mart); JDBC; BI]


Modern Data Architecture – Streaming

Introducing Queue

[Diagram, before, labels: FE (App, App, App, …); HTTP; BE (Srv, Srv, Srv, …); SOAP; OLTP (SP, Table, Log); JDBC; CDC; copy; Parse; Batch ETL; cp; Batch ETL load; DWH (ODS, DDS, Data Mart); JDBC; BI]

[Diagram, after, labels: FE (App, App, App, …); HTTP; BE (Srv, Srv, Srv, …); SOAP; OLTP (SP, Table); JDBC; ETL; ETL; DWH (ODS, DDS, Data Mart); BI; Queue; Batch App; STG; Batch App; Hadoop (HDFS, SQL On Hadoop); RTI App; ES]


Pivotal and Modern Data Architecture

[Diagram labels: BI; Pivotal Cloud Foundry (HTTP; FE App, App, App; Queue; BE App, App, App); Pivotal GemFire (App); Spring XD (Streaming); Streaming Data; Pivotal HD; Pivotal HAWQ; ES; DDS; Data Mart; Pivotal Greenplum; Data Mart; OLTP (SP, Table); ODS; ETL; ETL]


Pivotal and Modern Data Architecture

[Diagram: Pivotal architecture overview, repeated]

Pivotal Labs – agile software development for next-generation applications

Pivotal Cloud Foundry – PaaS for customer applications

RabbitMQ – distributed message queue service on top of PCF

Spring IO – foundation platform for modern applications


Pivotal and Modern Data Architecture

[Diagram: Pivotal architecture overview, repeated]

Pivotal GemFire – in-memory data grid enabling real-time data processing and real-time decision making for enterprises


Pivotal and Modern Data Architecture

[Diagram: Pivotal architecture overview, repeated]

Spring XD – unified, distributed and extensible framework for data pipelining: ingesting, batching, processing and exporting


Pivotal and Modern Data Architecture

[Diagram: Pivotal architecture overview, repeated]

Pivotal HD – leading Hadoop distribution based on ODP

Pivotal HAWQ – bringing the power of MPP to the Hadoop cluster, best-in-class SQL-on-Hadoop solution

Apache Spark – component of the Pivotal HD distribution, modern framework for distributed data processing


Pivotal and Modern Data Architecture

[Diagram: Pivotal architecture overview, repeated]

Pivotal Greenplum – leading analytical MPP database, foundation for enterprise data warehousing systems and advanced analytics


Pivotal and Modern Data Architecture

[Diagram: Pivotal architecture overview, repeated, with the callout “Data Lake”]


Pivotal and Modern Data Architecture

[Diagram: Pivotal architecture overview, repeated, with the callout “Lambda Architecture”]


Pivotal and Modern Data Architecture

[Diagram: Pivotal architecture overview, repeated, with the callout “Streaming”]


Pivotal and Modern Data Architecture

[Diagram: Pivotal architecture overview, repeated]
