apache flink training: system overview

Apache Flink® TrainingSystem Overview

2

A stream processor with many applications

Streaming dataflow runtime

Apache Flink

1 year of Flink - codeApril 2014 April 2015

Flink Community

Aug-10 Feb-11 Sep-11 Apr-12 Oct-12 May-13Nov-13 Jun-14 Dec-14 Jul-150

20

40

60

80

100

120 #unique contributor ids by git commits

In top 5 of Apache's big data projects after one year in the Apache Software Foundation

The Apache Way Flink is an Apache top-level

project

5

• Independent, non-profit organization• Community-driven open source software

development approach• Public communication and open to new

contributors

What is Apache Flink?

6

Gelly

Tabl

e

ML

SAM

OA

DataSet (Java/Scala/Python) DataStream (Java/Scala)

Hado

op M

/R

Local Remote Yarn Tez Embedded

Data

flow

Data

flow

(WiP

)

MRQ

L

Tabl

e

Casc

adin

g (W

iP)


Native workload support

7

Flink

Streaming topologies

Heavy batch jobs Machine Learning at

scale

How can an engine natively support all these workloads?And what does native mean?

E.g.: Non-native iterations

8

Step Step Step Step Step

Client

for (int i = 0; i < maxIterations; i++) {// Execute MapReduce job

}

E.g.: Non-native streaming

9

streamdiscretizer

Job Job Job Jobwhile (true) { // get next few records // issue batch job}

Native workload support

10

Flink

Stream processing

Batchprocessing Machine Learning at

scale

How can an engine natively support all these workloads?And what does "native" mean?

Graph Analysis

Flink Engine

1. Execute everything as streams

2. Iterative (cyclic) dataflows

3. Mutable state

4. Operate on managed memory

5. Special code paths for batch11

State + Computation

What is a Flink Program?

12

Gelly

Tabl

e

ML

SAM

OA


Hado

op M

/R


Data

flow

Data

flow

(WiP

)

MRQ

L

Tabl

e

Casc

adin

g (W

iP)


Flink stack

Basic API Concept

How do I write a Flink program?1. Bootstrap sources2. Apply operations3. Output to source

14

Data Stream Operation Data

StreamSource Sink

Data Set Operation Data

SetSource Sink

Batch & Stream Processing

DataStream API

15

Stock FeedName Price

Microsoft 124

Google 516

Apple 235

… …

Alert if Microsoft

> 120

Write event to database

Sum every 10

seconds

Alert if sum > 10000

Microsoft 124

Google 516Apple 235

Microsoft 124

Google 516

Apple 235

b h

2 1

3 5

7 4

… …

Map Reduce

a

12

…

DataSet APIExample: Map/Reduce paradigm

Example: Live Stock Feed

Streaming & Batch

16

Batch

finite

blocking or pipelined

high

Streaming

infinite

pipelined

low

Input

Data transfer

Latency

Scaling out

17


SetSource Sink


SetSource SinkData Set Operation Data






SetSource Sink

18

Scaling up

Sources (selection)Collection-based fromCollection fromElementsFile-based TextInputFormat CsvInputFormatOther SocketInputFormat KafkaInputFormat Databases

19

Sinks (selection)File-based TextOutputFormat CsvOutputFormat PrintOutputOthers SocketOutputFormat KafkaOutputFormat Databases

20

Hadoop IntegrationOut of the box Access HDFS Yarn Execution (covered later) Reuse data types (Writables)

With a thin wrapper Reuse Hadoop input and output formats Reuse functions like Map and Reduce

21

What’s the Lifecycle of a Program?

22

Architecture Overview Client Master (Job Manager) Worker (Task Manager)

24

Client

Job Manager

Task Manager

Task Manager

Task Manager

Client Optimize Construct job graph Pass job graph to job manager Retrieve job results

25

Job Manager

Client

case class Path (from: Long, to: Long)val tc = edges.iterate(10) { paths: DataSet[Path] => val next = paths .join(edges) .where("to") .equalTo("from") { (path, edge) => Path(path.from, edge.to) } .union(paths) .distinct() next }

Optimizer

Type extraction

Data Sourceorders.tbl

Filter

Map DataSource

lineitem.tbl

JoinHybrid Hash

buildHT

probe

hash-part [0] hash-part [0]

GroupRedsort

forward

Job Manager Parallelization: Create Execution Graph Scheduling: Assign tasks to task managers State: Supervise the execution

26

Job Manager


Filter

Map DataSource

lineitem.tbl

JoinHybrid Hash

buildHT

probe

hash-part [0]

hash-part [0]

GroupRed

sort

forward

Task Manager

Task Manager

Task Manager

Task Manager


Filter

Map DataSource

lineitem.tbl

JoinHybrid Hash

buildHT

probe

hash-part [0]

hash-part [0]

GroupRed

sort

forward


Filter

Map DataSource

lineitem.tbl

JoinHybrid Hash

buildHT

probe

hash-part [0]

hash-part [0]

GroupRed

sort

forward


FilterMap DataSou

rcelineitem.tbl

JoinHybrid Hash

buildHT

probe

hash-part [0]

hash-part [0]

GroupRed

sort

forward


Filter

Map DataSource

lineitem.tbl

JoinHybrid Hash

buildHT

probe

hash-part [0]

hash-part [0]

GroupRed

sort

forward

Task Manager Operations are split up into tasks

depending on the specified parallelism Each parallel instance of an operation runs

in a separate task slot The scheduler may run several tasks from

different operators in one task slot

27

Task Manager

Slot

Task ManagerTask Manager

Slot

Slot

Execution Setups

28

Ways to Run a Flink Program

29

Gelly

Tabl

e

ML

SAM

OA


Hado

op M

/R


Data

flow

Data

flow

(WiP

)

MRQ

L

Tabl

e

Casc

adin

g (W

iP)


Local Execution Starts local Flink

cluster All processes run in

the same JVM Behaves just like a

regular Cluster Very useful for

developing and debugging

30

Job Manager

Task Manager

Task Manager

Task Manager

Task Manager

JVM

Embedded Execution Runs operators on simple Java

collections Lower overhead Does not use memory management Useful for testing and debugging

31

Remote Execution

Submit a Job remotely

Monitor the status of a job

32

Client Job Manager

Cluster

Task Manager

Task Manager

Task Manager

Task Manager

Submit job

YARN Execution Multi-user

scenario Resource sharing Uses YARN

containers to run a Flink cluster

Easy to setup

33

Client

Node Manager

Job Manager

YARN Cluster

Resource Manager

Node Manager

Task Manager

Node Manager

Task Manager

Node Manager

Other Application

Execution Leverages Apache Tez’s runtime Built on top of YARN Good YARN citizen Fast path to elastic deployments Slower than native Flink

34

Flink compared to other projects

35

Batch & Streaming projectsBatch only

Streaming only

Hybrid

36

Batch comparison

37

API low-level high-level high-level

Data Transfer batch batch pipelined & batch

Memory Management disk-based JVM-managed Active managed

Iterations file systemcached

in-memory cached streamed

Fault tolerance task level task level job level

Good at massive scale out data exploration heavy backend & iterative jobs

Libraries many external built-in & external evolving built-in & external

Streaming comparison

38

Streaming “true” mini batches “true”

API low-level high-level high-level

Fault tolerance tuple-level ACKs RDD-based (lineage) coarse checkpointing

State not built-in external internal

Exactly once at least once exactly once exactly once

Windowing not built-in restricted flexible

Latency low medium low

Throughput medium high high

Thank you for listening!

39

apache flink training: system overview

Technology