Dredge Overview

Hemal Gandhi, Director of Data Engineering


Posted on 15-Aug-2015


TRANSCRIPT

Page 1: Dredge Overview

Hemal Gandhi, Director of Data Engineering

Page 2

DATA ENGINEERING AT ONE KINGS LANE

Powering business decisions through understanding customer behavior.

Page 3

Observations on Data Platforms

Page 4

DREAM

You start with a simple design...

Page 5

REALITY

...but you end up with a complex design.

Page 6

DREAM

You start at full speed...

Page 7

REALITY

...but you end up being slow.

Page 8

DREAM

You start with the latest technology...

Page 9

REALITY

...but you end up with an outdated stack before going live.

Page 10

DREAM

You dream of a low-cost platform...

Page 11

REALITY

...but you end up shelling out a lot of $$.

Page 12

WHAT IS OUR GOAL?

To build a scalable, loosely coupled big data platform.

Page 13

DESIGN

Some design questions we need to answer:

Page 14

Which technologies to choose? How to keep the stack current?

Page 15

How to keep up with evolving business needs?

Page 16

How to make your investment count?

Page 17

It’s like building a city.

Page 18

Technology

Process

People

Page 20

High Level Architecture

Page 25

APPLICATIONS & VISUALIZATION TOOLS

DATA ACCESS ABSTRACTION API

DATA QUALITY SERVICE

DREDGE

DATA PLATFORM

COLLECTION
- Apache Flume
- Sqoop

FLOW
- Kafka
- Spark

STORAGE
- HBase
- Hive

PROCESSING
- Pig
- Spark

DELIVERY
- Visualization
- Email / FTP

SCHEDULING & CLUSTER MONITORING

SECURITY

Page 26

WHAT IS DREDGE?

A declarative abstraction layer for integrating big data tools, enabling a loosely coupled big data platform.
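To make "declarative" concrete, here is a minimal Python sketch, assuming a flow is described as plain data (a hypothetical `flow` dict naming a reader, a list of tasks, and a writer) and that components are looked up by name in a registry. `REGISTRY`, `register`, `run_flow`, and the component names are illustrative assumptions only; the slides do not show Dredge's actual configuration format.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical registry: declarative component names -> implementations.
REGISTRY: Dict[str, Callable] = {}


def register(name: str) -> Callable[[Callable], Callable]:
    """Decorator that makes a component addressable by name from config."""
    def wrap(fn: Callable) -> Callable:
        REGISTRY[name] = fn
        return fn
    return wrap


@register("csv_lines_reader")
def csv_lines_reader(lines: List[str]) -> Iterable[dict]:
    # Source reader: parse "sku,qty" lines into records.
    for line in lines:
        sku, qty = line.split(",")
        yield {"sku": sku, "qty": int(qty)}


@register("drop_zero_qty")
def drop_zero_qty(records: Iterable[dict]) -> Iterable[dict]:
    # Task: filter out records with no quantity.
    return (r for r in records if r["qty"] > 0)


@register("list_writer")
def list_writer(records: Iterable[dict], target: list) -> int:
    # Target writer: append records to an in-memory list, return the count.
    count = 0
    for r in records:
        target.append(r)
        count += 1
    return count


def run_flow(flow: dict, source_data, target) -> int:
    """Resolve each named component from the registry and chain them."""
    records = REGISTRY[flow["reader"]](source_data)
    for task_name in flow["tasks"]:
        records = REGISTRY[task_name](records)
    return REGISTRY[flow["writer"]](records, target)


# The flow is pure data; swapping a tool means changing a name here,
# not rewriting the code that runs the pipeline.
flow = {"reader": "csv_lines_reader", "tasks": ["drop_zero_qty"], "writer": "list_writer"}

out: list = []
print(run_flow(flow, ["A1,3", "B2,0", "C3,5"], out))  # prints 2
```

The design choice being illustrated is the loose coupling: the runner knows only names and a calling convention, so a reader or writer backed by a different tool can be registered without touching existing flows.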

Page 35

DREDGE LOGICAL VIEW

SOURCE ENDPOINTS

SOURCE READERS

TASKS (HADOOP CLUSTER)

TARGET WRITERS (STREAM/DIRECT)

TARGET ENDPOINTS

LOG STREAMING

EVENTS MANAGEMENT

CONFIGURATION ABSTRACTION

DREDGE REPOSITORY – HBASE
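The logical view suggests records move from source readers through tasks on the cluster to target writers, with each stage reporting to log streaming and event management. Below is a minimal Python sketch of that shape under loud assumptions: "direct" is taken to mean the writer receives the fully materialized batch once, and "stream" to mean it receives records one at a time as they are produced; the slides do not define these modes. `EventLog`, `run_logical_flow`, and the event names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class EventLog:
    """Hypothetical stand-in for log streaming / event management."""
    events: List[str] = field(default_factory=list)

    def emit(self, event: str) -> None:
        self.events.append(event)


def run_logical_flow(
    read: Callable[[], Iterable[dict]],
    tasks: List[Callable[[Iterable[dict]], Iterable[dict]]],
    write: Callable[[List[dict]], None],
    mode: str,
    log: EventLog,
) -> None:
    log.emit("read.started")
    records: Iterable[dict] = read()
    for task in tasks:
        records = task(records)
    if mode == "direct":
        # Assumed semantics: materialize everything, then write once.
        batch = list(records)
        write(batch)
        log.emit(f"write.direct:{len(batch)}")
    elif mode == "stream":
        # Assumed semantics: hand each record to the writer as it arrives.
        count = 0
        for record in records:
            write([record])
            count += 1
        log.emit(f"write.stream:{count}")
    else:
        raise ValueError(f"unknown mode: {mode}")
    log.emit("flow.completed")


calls: List[List[dict]] = []
log = EventLog()
run_logical_flow(
    read=lambda: [{"v": 1}, {"v": 2}],
    tasks=[lambda rs: ({"v": r["v"] * 10} for r in rs)],
    write=calls.append,
    mode="stream",
    log=log,
)
print(calls)       # [[{'v': 10}], [{'v': 20}]]
print(log.events)  # ['read.started', 'write.stream:2', 'flow.completed']
```

Because tasks are lazy generators, stream mode never holds the whole dataset in memory, while direct mode trades that for a single write call.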

Page 40

DREDGE ARCHITECTURE

LAMBDA ARCHITECTURE: HDFS, HIVE, HBASE, PIG, FLUME, KAFKA, OOZIE

DREDGE DATA SERVICES
- SOURCE READERS (LOGS, RDBMS, UNSTRUCTURED DATA, CUSTOM): DIRECT/STREAM
- TARGET WRITERS (HIVE, HBASE, RDBMS, CUSTOM): DIRECT/STREAM
- ABSTRACTION BUILDER (KAFKA, FLUME, PIG, CUSTOM)
- PLUGIN (JAVA/SHELL, PIG, SQL)
- RANK, SORTER
- AGGREGATOR
- UDFS
- SET OPERATIONS
- COMBINERS, ROUTERS...
- FILTERS/PATTERNS ANALYSIS

DREDGE RUNTIME
- TEMP STORE - HDFS
- EVENT MANAGEMENT
- LOGGER STREAM

DREDGE UI
- Declarative configuration
- Logical Flows
- Data Lineage
- Runtime Logs
- Admin

DREDGE REPOSITORY – HBASE
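The data services layer lists generic building blocks such as filters, an aggregator, rank/sorter, and set operations. A minimal Python sketch of why such components are reusable: each is a small function over records, parameterized by field names, so the same pieces can be recombined across flows. The function names and record fields (`sku`, `qty`) are hypothetical; the slides do not show Dredge's component interfaces.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def filter_records(records: Iterable[dict], predicate: Callable[[dict], bool]) -> List[dict]:
    """FILTERS: keep records matching a predicate."""
    return [r for r in records if predicate(r)]


def aggregate_sum(records: Iterable[dict], key: str, value: str) -> List[dict]:
    """AGGREGATOR: sum `value` per distinct `key`."""
    totals: Dict[object, int] = defaultdict(int)
    for r in records:
        totals[r[key]] += r[value]
    return [{key: k, value: v} for k, v in totals.items()]


def rank_by(records: Iterable[dict], field: str, top_n: int) -> List[dict]:
    """RANK, SORTER: sort descending by `field`, attach a 1-based rank, keep top N."""
    ordered = sorted(records, key=lambda r: r[field], reverse=True)[:top_n]
    return [dict(r, rank=i) for i, r in enumerate(ordered, start=1)]


def key_difference(left: Iterable[dict], right: Iterable[dict], key: str) -> List[dict]:
    """SET OPERATIONS: records in `left` whose key does not appear in `right`."""
    right_keys = {r[key] for r in right}
    return [r for r in left if r[key] not in right_keys]


orders = [
    {"sku": "A", "qty": 2},
    {"sku": "B", "qty": 5},
    {"sku": "A", "qty": 4},
    {"sku": "C", "qty": 0},
]
# Reuse the same components in a new combination: filter -> aggregate -> rank.
top = rank_by(aggregate_sum(filter_records(orders, lambda r: r["qty"] > 0), "sku", "qty"), "qty", 2)
print(top)  # [{'sku': 'A', 'qty': 6, 'rank': 1}, {'sku': 'B', 'qty': 5, 'rank': 2}]
```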

Page 41

Closing the Loop

Page 45

Abstraction layer

Reusable data components

Event-driven dependencies

Plug-and-play integration, loosely coupled (cluster resources, data)
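Event-driven dependencies mean a flow starts when the data it needs is ready, rather than at a fixed clock time. A minimal Python sketch, assuming each flow declares the set of completion events it waits for and that a flow's own completion is published as an event so downstream flows can chain. `EventDrivenScheduler`, `add_flow`, `publish`, and the event names are hypothetical illustrations, not Dredge or Oozie APIs.

```python
from typing import Callable, Dict, List, Set


class EventDrivenScheduler:
    """Triggers a flow once all of the events it depends on have been seen."""

    def __init__(self) -> None:
        self.seen: Set[str] = set()
        self.flows: Dict[str, Set[str]] = {}
        self.actions: Dict[str, Callable[[], None]] = {}
        self.started: List[str] = []

    def add_flow(self, name: str, depends_on: Set[str], action: Callable[[], None]) -> None:
        self.flows[name] = set(depends_on)
        self.actions[name] = action

    def publish(self, event: str) -> None:
        self.seen.add(event)
        for name, deps in self.flows.items():
            # Run each flow at most once, and only when every dependency is met.
            if name not in self.started and deps <= self.seen:
                self.started.append(name)
                self.actions[name]()
                # A finished flow announces itself so dependents can run.
                self.publish(f"{name}.done")


scheduler = EventDrivenScheduler()
ran: List[str] = []
scheduler.add_flow("daily_sales", {"orders.loaded", "products.loaded"}, lambda: ran.append("daily_sales"))
scheduler.add_flow("sales_email", {"daily_sales.done"}, lambda: ran.append("sales_email"))

scheduler.publish("orders.loaded")
print(ran)  # []
scheduler.publish("products.loaded")
print(ran)  # ['daily_sales', 'sales_email']
```

Tracking `started` limits each flow to a single run per scheduler instance; persistence and re-run semantics are outside the scope of this sketch.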

Page 46

Summarizing

Page 47

Big data requires a different mindset: innovate, iterate often and keep it simple.

Page 48

engineering.onekingslane.com

Thank you.

Page 49

CONTRIBUTORS:

Maria Latushkin (CTO, One Kings Lane)

Joana Koiller (Senior Product Designer, One Kings Lane)