dredge overview

Post on 15-Aug-2015


HEMAL GANDHI, DIRECTOR OF DATA ENGINEERING

DATA ENGINEERING AT ONE KINGS LANE

Powering business decisions through understanding customer behavior.

Observations on Data Platforms

DREAM

You start with a simple design...

REALITY

...but you end up with a complex design.

DREAM

You start with full speed...

REALITY

...but you end up being slow.

DREAM

You start with the latest technology...

REALITY

...but you end up with an old stack before going live.

DREAM

You dream of a low cost platform...

REALITY

...but you end up shelling out a lot of $$.

WHAT IS OUR GOAL

To build a scalable, loosely coupled big data platform.

DESIGN

Some design questions we need to answer:

Which technologies to choose? How to keep the stack current?

How to keep up with evolving business needs?

How to make your investment count?

It’s like building a city.

Technology

Process

People

High Level Architecture

COLLECTION

- Apache Flume

- Sqoop

FLOW

- Kafka

- Spark

STORAGE

- HBase

- Hive

PROCESSING

- Pig

- Spark

DELIVERY

- Visualization

- Email / FTP

DATA PLATFORM

SCHEDULING & CLUSTER MONITORING

SECURITY

APPLICATIONS & VISUALIZATION TOOLS

DATA ACCESS ABSTRACTION API

DATA QUALITY SERVICE

DREDGE

WHAT IS DREDGE

A declarative abstraction layer for integrating big data tools, enabling a loosely coupled big data platform.
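
The slides do not show what a declarative DREDGE definition looks like, so the following is only a minimal sketch of the idea: a flow is described as data (source, tasks, target) and checked before anything runs. `FlowSpec`, `validate`, the field names, and the task names are all hypothetical; only the reader and writer kinds are taken from the architecture slides.

```python
from dataclasses import dataclass

# Reader and writer kinds named on the slides; the spec format itself is an assumption.
KNOWN_SOURCES = {"logs", "rdbms", "unstructured", "custom"}
KNOWN_TARGETS = {"hive", "hbase", "rdbms", "custom"}


@dataclass(frozen=True)
class FlowSpec:
    """Hypothetical declarative description of one data flow."""
    name: str
    source: dict       # e.g. {"type": "logs", "mode": "stream"}
    target: dict       # e.g. {"type": "hive", "mode": "direct"}
    tasks: tuple = ()  # ordered names of the processing steps


def validate(spec):
    """Return a list of problems found in the spec; an empty list means it is usable."""
    errors = []
    if spec.source.get("type") not in KNOWN_SOURCES:
        errors.append(f"{spec.name}: unknown source type {spec.source.get('type')!r}")
    if spec.target.get("type") not in KNOWN_TARGETS:
        errors.append(f"{spec.name}: unknown target type {spec.target.get('type')!r}")
    return errors


# The flow says *what* should happen; it never names a cluster or a client library.
clickstream = FlowSpec(
    name="clickstream_to_hive",
    source={"type": "logs", "mode": "stream"},
    tasks=("filter_bots", "aggregate_by_session"),
    target={"type": "hive", "mode": "direct"},
)
print(validate(clickstream))  # prints []
```

Because the flow is plain data, swapping the target from Hive to HBase would be a one-field change in the spec rather than a code change, which is the loose coupling the definition describes.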

DREDGE LOGICAL VIEW

SOURCE ENDPOINTS

SOURCE READERS

STREAM/DIRECT

TASKS

HADOOP CLUSTER

TARGET WRITERS

STREAM/DIRECT

TARGET ENDPOINTS

LOG STREAMING

EVENTS MANAGEMENT

CONFIGURATION

ABSTRACTION

DREDGE REPOSITORY – HBASE
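
The logical view moves data from source readers, through tasks, to target writers. A minimal sketch of that shape is below, using Python generators so records stream through one at a time. All function names and the sample records are made up for illustration; a real reader would pull from logs or an RDBMS, and a real writer would load Hive or HBase.

```python
from typing import Callable, Iterable, Iterator

Record = dict


def list_reader(rows: Iterable[Record]) -> Iterator[Record]:
    """Stand-in source reader that yields copies of in-memory rows."""
    for row in rows:
        yield dict(row)


def drop_missing_user(records: Iterator[Record]) -> Iterator[Record]:
    """Task: skip records that have no user id."""
    return (r for r in records if r.get("user_id") is not None)


def add_revenue(records: Iterator[Record]) -> Iterator[Record]:
    """Task: derive a revenue field from price and quantity."""
    for r in records:
        yield {**r, "revenue": r["price"] * r["qty"]}


def run_flow(reader: Iterator[Record],
             tasks: list,
             writer: Callable[[Record], None]) -> int:
    """Chain the tasks over the reader and hand each result to the writer."""
    stream = reader
    for task in tasks:
        stream = task(stream)
    written = 0
    for record in stream:
        writer(record)
        written += 1
    return written


# Illustrative data only.
rows = [
    {"user_id": 1, "price": 20, "qty": 2},
    {"user_id": None, "price": 5, "qty": 1},
    {"user_id": 2, "price": 7, "qty": 3},
]
sink = []  # stand-in target endpoint
written = run_flow(list_reader(rows), [drop_missing_user, add_revenue], sink.append)
print(written, [r["revenue"] for r in sink])  # prints 2 [40, 21]
```

The reader, the tasks, and the writer only share the record shape, so any one of them can be replaced without touching the others.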

DREDGE ARCHITECTURE

DREDGE UI

- Declarative configuration

- Logical Flows

- Data Lineage

- Runtime Logs

- Admin

DREDGE DATA SERVICES

- RANK, SORTER

- AGGREGATOR

- UDFS

- SET OPERATIONS

- COMBINERS, ROUTERS, ...

- FILTERS/PATTERNS ANALYSIS

- PLUGIN (JAVA/SHELL, PIG, SQL)

DREDGE RUNTIME

- SOURCE READERS (LOGS, RDBMS, UNSTRUCTURED DATA, CUSTOM), DIRECT/STREAM

- TARGET WRITERS (HIVE, HBASE, RDBMS, CUSTOM), DIRECT/STREAM

- ABSTRACTION BUILDER (KAFKA, FLUME, PIG, CUSTOM)

- EVENT MANAGEMENT

- LOGGER STREAM

- TEMP STORE - HDFS

DREDGE REPOSITORY – HBASE

LAMBDA ARCHITECTURE: HDFS, HIVE, HBASE, PIG, FLUME, KAFKA, OOZIE
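
The data services layer lists reusable operations such as filters, aggregators, and sorters. One way such components could be exposed to a declarative configuration is a name-to-function registry, sketched below. The registry, the operation names, the step format, and the sample orders are assumptions for illustration, not DREDGE's actual interface.

```python
from collections import defaultdict

OPERATIONS = {}  # hypothetical registry: operation name -> function


def operation(name):
    """Decorator that registers a function under a name usable in configs."""
    def register(fn):
        OPERATIONS[name] = fn
        return fn
    return register


@operation("filter")
def filter_rows(rows, *, key, equals):
    return [r for r in rows if r.get(key) == equals]


@operation("aggregate")
def aggregate(rows, *, group_by, total):
    sums = defaultdict(int)
    for r in rows:
        sums[r[group_by]] += r[total]
    return [{group_by: k, total: v} for k, v in sums.items()]


@operation("sort")
def sort_rows(rows, *, key, descending=False):
    return sorted(rows, key=lambda r: r[key], reverse=descending)


def apply_steps(rows, steps):
    """Run each configured step in order, looking the operation up by name."""
    for step in steps:
        params = {k: v for k, v in step.items() if k != "op"}
        rows = OPERATIONS[step["op"]](rows, **params)
    return rows


# Illustrative data and steps only.
orders = [
    {"category": "rugs", "status": "paid", "amount": 300},
    {"category": "lamps", "status": "paid", "amount": 120},
    {"category": "rugs", "status": "paid", "amount": 200},
    {"category": "lamps", "status": "refunded", "amount": 80},
]
steps = [
    {"op": "filter", "key": "status", "equals": "paid"},
    {"op": "aggregate", "group_by": "category", "total": "amount"},
    {"op": "sort", "key": "amount", "descending": True},
]
result = apply_steps(orders, steps)
print(result)
# prints [{'category': 'rugs', 'amount': 500}, {'category': 'lamps', 'amount': 120}]
```

New operations are added by registering another function, so existing step lists keep working unchanged.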

Closing the Loop

Abstraction layer

Reusable data components

Event Driven dependencies

Plug n Play integration, loosely coupled (Cluster Resources, Data)
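
As an illustration of event-driven dependencies, the sketch below starts a job once every event it depends on has been published, instead of at a fixed time. The `EventBus` class and the event names are hypothetical; the slides do not describe how DREDGE's event management is implemented.

```python
class EventBus:
    """Tiny in-memory bus: callbacks fire once all of their events have been seen."""

    def __init__(self):
        self.seen = set()
        self._waiting = []  # (set of events still missing, callback) pairs

    def when_all(self, events, callback):
        """Run callback as soon as every event in `events` has been published."""
        remaining = set(events) - self.seen
        if remaining:
            self._waiting.append((remaining, callback))
        else:
            callback()

    def publish(self, event):
        """Record the event and fire any callback whose dependencies are now met."""
        self.seen.add(event)
        # Swap the list out first so callbacks may safely register new waits.
        pending, self._waiting = self._waiting, []
        for remaining, callback in pending:
            remaining.discard(event)
            if remaining:
                self._waiting.append((remaining, callback))
            else:
                callback()


bus = EventBus()
ran = []

# The report depends on two upstream loads, in whatever order they finish.
bus.when_all({"orders.loaded", "customers.loaded"}, lambda: ran.append("daily_report"))

bus.publish("orders.loaded")
print(ran)  # prints []
bus.publish("customers.loaded")
print(ran)  # prints ['daily_report']
```

The job never refers to the upstream jobs themselves, only to the events they emit, which keeps producers and consumers loosely coupled.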

Summarizing

Big data requires a different mindset: Innovate, iterate often and

keep it simple.

engineering.onekingslane.com

Thank you.

CONTRIBUTORS:

Maria Latushkin (CTO, One Kings Lane)

Joana Koiller (Senior Product Designer, One Kings Lane)
