Apache Flink

Stephan Ewen

Flink committer, co-founder / CTO @ data Artisans

@StephanEwen

Looking back one year

April 16, 2014

Stratosphere 0.4

APIs: Pact API (Java), DataSet API (Scala)

Stratosphere Optimizer

Stratosphere Runtime

Execution: Local, Remote

Batch processing on a pipelining engine, with iterations …

Looking at now…

Flink

Historic data

Kafka, RabbitMQ, ...

HDFS, JDBC, ...

ETL, Graphs, Machine Learning, Relational, …

Low latency, windowing, aggregations, ...

Event logs

Real-time data streams

What is Apache Flink?

What is Apache Flink? (master)

Libraries and compatibility layers: Python, Gelly, Table, ML, SAMOA, Dataflow, Hadoop M/R

APIs: DataSet (Java/Scala), DataStream (Java/Scala)

Flink Optimizer, Stream Builder

Flink Dataflow Runtime

Deployment: Local, Remote, Yarn, Tez, Embedded

Storage and streams: HDFS, HBase, Kafka, RabbitMQ, Flume, HCatalog, JDBC

Batch / Streaming APIs

case class Word (word: String, frequency: Int)

DataSet API (batch):

val lines: DataSet[String] = env.readTextFile(...)

lines.flatMap { line => line.split(" ").map(word => Word(word, 1)) }
  .groupBy("word").sum("frequency")
  .print()

DataStream API (streaming):

val lines: DataStream[String] = env.fromSocketStream(...)

lines.flatMap { line => line.split(" ").map(word => Word(word, 1)) }
  .window(Count.of(1000)).every(Count.of(100))
  .groupBy("word").sum("frequency")
  .print()

Technology inside Flink

case class Path (from: Long, to: Long)

val tc = edges.iterate(10) { paths: DataSet[Path] =>
  val next = paths
    .join(edges).where("to").equalTo("from") {
      (path, edge) => Path(path.from, edge.to)
    }
    .union(paths)
    .distinct()
  next
}

Pre-flight (Client): Program, Type extraction stack, Cost-based optimizer, Dataflow Graph

Master: Task scheduling, Recovery metadata, deploy operators, track intermediate results

Workers: Memory manager, Out-of-core algos, Batch & Streaming, State & Checkpoints

Example dataflow graph: DataSource orders.tbl, Filter, Map, DataSource lineitem.tbl, Join (Hybrid Hash, buildHT / probe, hash-part [0] on both inputs), GroupRed (sort, forward)

Flink by Feature / Use Case

Data Streaming Analysis

Life of data streams

Create: create streams from event sources (machines, databases, logs, sensors, …)

Collect: collect and make streams available for consumption (e.g., Apache Kafka)

Process: process streams, possibly generating derived streams (e.g., Apache Flink)

Stream Analysis in Flink

More at: http://flink.apache.org/news/2015/02/09/streaming-example.html

Defining windows in Flink

Trigger policy
• When to trigger the computation on current window

Eviction policy
• When data points should leave the window
• Defines window width/size

E.g., count-based policy
• evict when #elements > n
• start a new window every n-th element

Built-in: Count, Time, Delta policies
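
The count-based policy above can be sketched in plain Java. This is a toy model, not Flink's window implementation: the class name CountWindowSketch, its fields, and the choice of a sum as the window computation are assumptions made for illustration. The eviction policy keeps at most `size` elements, and the trigger policy evaluates the window every `slide` elements.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy sliding count window (hypothetical class, not part of Flink). */
class CountWindowSketch {
    private final int size;   // eviction policy: evict when #elements > size
    private final int slide;  // trigger policy: evaluate every `slide` elements
    private final Deque<Integer> buffer = new ArrayDeque<>();
    private int sinceLastTrigger = 0;

    CountWindowSketch(int size, int slide) {
        this.size = size;
        this.slide = slide;
    }

    /** Adds one element; returns the window sum if the trigger fired, else null. */
    Integer add(int value) {
        buffer.addLast(value);
        if (buffer.size() > size) {
            buffer.removeFirst(); // oldest data point leaves the window
        }
        sinceLastTrigger++;
        if (sinceLastTrigger < slide) {
            return null;
        }
        sinceLastTrigger = 0;
        int sum = 0;
        for (int v : buffer) {
            sum += v;
        }
        return sum;
    }

    /** Feeds all inputs through one window and collects the triggered results. */
    static List<Integer> run(int size, int slide, int[] input) {
        CountWindowSketch window = new CountWindowSketch(size, slide);
        List<Integer> results = new ArrayList<>();
        for (int v : input) {
            Integer r = window.add(v);
            if (r != null) {
                results.add(r);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // Window over the last 4 elements, evaluated every 2 elements.
        System.out.println(run(4, 2, new int[] {1, 2, 3, 4, 5, 6})); // [3, 10, 18]
    }
}
```

Keeping the two policies as separate parameters is what makes combinations such as "window of 1000, evaluated every 100" expressible.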

Checkpointing / Recovery

Flink acknowledges batches of records
• Less overhead in failure-free case
• Currently tied to fault tolerant data sources (e.g., Kafka)

Flink operators can keep state
• State is checkpointed
• Checkpointing and record acks go together

Exactly-once semantics for state

Checkpointing / Recovery

Chandy-Lamport Algorithm for consistent asynchronous distributed snapshots

Pushes checkpoint barriers through the data flow

Data stream with barrier
• Before barrier = part of the snapshot
• After barrier = not in snapshot (backup till next snapshot)

Operator states: checkpoint starting, checkpoint in progress, checkpoint done
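
The barrier idea can be illustrated with a deliberately simplified Java sketch: one operator, one input channel, and a running sum as its state. The class BarrierSnapshotSketch and the sentinel value used as a barrier are assumptions for this example. Real barrier handling across several input channels and asynchronous state writes are not modeled.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy single-channel barrier snapshot (hypothetical, not Flink code). */
class BarrierSnapshotSketch {
    /** Sentinel standing in for a checkpoint barrier in the stream (assumption). */
    static final int BARRIER = Integer.MIN_VALUE;

    /** Sums records; each barrier snapshots the state built from the records before it. */
    static List<Long> snapshots(int[] stream) {
        long state = 0;
        List<Long> snaps = new ArrayList<>();
        for (int record : stream) {
            if (record == BARRIER) {
                snaps.add(state); // records after the barrier are not in this snapshot
            } else {
                state += record;
            }
        }
        return snaps;
    }

    /**
     * Simulates a failure at the end of the stream: in-flight state is lost,
     * the last snapshot is restored, and records after its barrier are replayed
     * from a replayable source.
     */
    static long stateAfterRecovery(int[] stream) {
        long state = 0;
        long lastSnapshot = 0;
        int replayFrom = 0;
        for (int i = 0; i < stream.length; i++) {
            if (stream[i] == BARRIER) {
                lastSnapshot = state;
                replayFrom = i + 1;
            } else {
                state += stream[i];
            }
        }
        long restored = lastSnapshot; // state as of the last completed checkpoint
        for (int i = replayFrom; i < stream.length; i++) {
            if (stream[i] != BARRIER) {
                restored += stream[i]; // each replayed record is applied exactly once
            }
        }
        return restored;
    }

    public static void main(String[] args) {
        int[] stream = {1, 2, BARRIER, 3, 4, BARRIER, 5};
        System.out.println(snapshots(stream));          // [3, 10]
        System.out.println(stateAfterRecovery(stream)); // 15
    }
}
```

The recovered state equals the failure-free state, which is the property the slides call exactly-once semantics for state. It relies on the source being able to replay the records that followed the last barrier.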

Heavy ETL Pipelines

Heavy Data Pipelines

Complex ETL programs

Apology: Graph had to be blurred for online slides, due to confidentiality

Memory Management

public class WC { public String word; public int count;}

Pool of Memory Pages (empty pages)

Managed: Sorting, hashing, caching; Shuffling, broadcasts

Unmanaged: User code objects

Flink contains its own memory management stack. Memory is allocated, de-allocated, and used strictly using an internal buffer pool implementation. To do that, Flink contains its own type extraction and serialization components.

More at: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=53741525
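
To make the idea of a buffer pool concrete, here is a minimal Java sketch that serializes (word, count) records like the WC class above into fixed-size pages. The class PagePoolSketch, its page layout, and its method names are assumptions for illustration and do not mirror Flink's internal memory manager or serializers.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy pool of fixed-size memory pages holding serialized records (hypothetical). */
class PagePoolSketch {
    private final int pageSize;
    private final Deque<ByteBuffer> free = new ArrayDeque<>();
    private final List<ByteBuffer> used = new ArrayList<>();

    PagePoolSketch(int pages, int pageSize) {
        this.pageSize = pageSize;
        for (int i = 0; i < pages; i++) {
            free.add(ByteBuffer.allocate(pageSize)); // all memory is allocated up front
        }
    }

    /** Serializes one record; returns false when the pool is exhausted. */
    boolean write(String word, int count) {
        byte[] bytes = word.getBytes(StandardCharsets.UTF_8);
        int needed = 4 + bytes.length + 4; // length prefix + word + count
        if (needed > pageSize) {
            throw new IllegalArgumentException("record larger than a page");
        }
        ByteBuffer page = used.isEmpty() ? null : used.get(used.size() - 1);
        if (page == null || page.remaining() < needed) {
            if (free.isEmpty()) {
                return false; // a real system would spill to disk here instead of failing
            }
            page = free.poll();
            used.add(page);
        }
        page.putInt(bytes.length).put(bytes).putInt(count);
        return true;
    }

    int usedPages() {
        return used.size();
    }

    /** Deserializes all stored records as "word=count" strings. */
    List<String> readAll() {
        List<String> out = new ArrayList<>();
        for (ByteBuffer page : used) {
            ByteBuffer reader = page.duplicate();
            reader.flip(); // read from 0 up to the written position
            while (reader.hasRemaining()) {
                byte[] bytes = new byte[reader.getInt()];
                reader.get(bytes);
                out.add(new String(bytes, StandardCharsets.UTF_8) + "=" + reader.getInt());
            }
        }
        return out;
    }
}
```

Because the pool has a fixed number of pages, running out of memory becomes an ordinary return value that the caller can react to, rather than an OutOfMemoryError.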

Smooth out-of-core performance

More at: http://flink.apache.org/news/2015/03/13/peeking-into-Apache-Flinks-Engine-Room.html

Single-core join of 1KB Java objects beyond memory (4 GB)

Blue bars are in-memory, orange bars (partially) out-of-core

Benefits of managed memory

More reliable and stable performance (less GC effects, easy to go to disk)

Table API

val customers = env.readCsvFile(…)
  .as('id, 'mktSegment)
  .filter( 'mktSegment === "AUTOMOBILE" )

val orders = env.readCsvFile(…)
  .filter( o => dateFormat.parse(o.orderDate).before(date) )
  .as('orderId, 'custId, 'orderDate, 'shipPrio)

val items = orders
  .join(customers).where('custId === 'id)
  .join(lineitems).where('orderId === 'id)
  .select('orderId, 'orderDate, 'shipPrio,
    'extdPrice * (Literal(1.0f) - 'discount) as 'revenue)

val result = items
  .groupBy('orderId, 'orderDate, 'shipPrio)
  .select('orderId, 'revenue.sum, 'orderDate, 'shipPrio)

Iterations in Data Flows

Machine Learning Algorithms

Iterate by looping

for/while loop in client submits one job per iteration step

Data reuse by caching in memory and/or disk

Diagram: Client → Step → Step → Step → Step → Step

Iterate in the Dataflow

Large-Scale Machine Learning

Factorizing a matrix with 28 billion ratings for recommendations

(Scale of Netflix or Spotify)

More at: http://data-artisans.com/computing-recommendations-with-flink.html

State in Iterations

Graphs and Machine Learning

Iterate natively with deltas

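Diagram: delta iteration
• Inputs: initial solution, initial workset, other datasets
• Per step: operators A, B, X, Y read the workset and the partial solution and produce a delta set and the next workset
• Merge deltas into the partial solution; Replace the workset
• Output: iteration result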
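
A workset-driven iteration can be sketched in plain Java with connected components as the example. This is a single-machine illustration under assumptions: the class DeltaIterationSketch, the choice of algorithm, and the in-memory data structures are not taken from the slides. Only vertices in the workset are processed in each step, deltas are merged into the partial solution, and the workset is replaced by the vertices that changed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy delta iteration: connected components by minimum-id propagation (hypothetical). */
class DeltaIterationSketch {
    /** Returns the component id per vertex; updatesPerStep receives the delta size of each step. */
    static int[] connectedComponents(int n, int[][] edges, List<Integer> updatesPerStep) {
        List<List<Integer>> neighbors = new ArrayList<>();
        for (int v = 0; v < n; v++) {
            neighbors.add(new ArrayList<>());
        }
        for (int[] e : edges) {
            neighbors.get(e[0]).add(e[1]);
            neighbors.get(e[1]).add(e[0]);
        }

        int[] solution = new int[n];            // partial solution: vertex -> component id
        Set<Integer> workset = new TreeSet<>(); // initial workset: every vertex
        for (int v = 0; v < n; v++) {
            solution[v] = v;
            workset.add(v);
        }

        while (!workset.isEmpty()) {
            // Step function: only vertices in the workset send their id to neighbors.
            Map<Integer, Integer> delta = new TreeMap<>();
            for (int v : workset) {
                for (int nb : neighbors.get(v)) {
                    int candidate = solution[v];
                    if (candidate < delta.getOrDefault(nb, solution[nb])) {
                        delta.put(nb, candidate);
                    }
                }
            }
            // Merge deltas into the partial solution.
            for (Map.Entry<Integer, Integer> d : delta.entrySet()) {
                solution[d.getKey()] = d.getValue();
            }
            // Replace the workset with the vertices that changed.
            workset = new TreeSet<>(delta.keySet());
            updatesPerStep.add(delta.size());
        }
        return solution;
    }

    public static void main(String[] args) {
        List<Integer> updates = new ArrayList<>();
        int[] cc = connectedComponents(5, new int[][] {{0, 1}, {1, 2}, {3, 4}}, updates);
        System.out.println(java.util.Arrays.toString(cc)); // [0, 0, 0, 3, 3]
        System.out.println(updates);                       // [3, 1, 0]
    }
}
```

The shrinking list of updates per step shows why this pays off: later steps touch only the few elements that still change.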

Effect of delta iterations…

Chart: # of elements updated (y-axis) per iteration (x-axis)

… very fast graph analysis

… and mix and match ETL-style and graph analysis in one program

Performance competitive with dedicated graph analysis systems

More at: http://data-artisans.com/data-analysis-with-flink.html

Closing

Flink Roadmap for 2015

Out-of-core state in Streaming

Monitoring and scaling for streaming

Streaming Machine Learning with SAMOA

More additions to the libraries
• Batch Machine Learning
• Graph library additions (more algorithms)

SQL on top of expression language

Master failover

Flink community

Chart: #unique contributor ids by git commits, Aug-10 to Jul-15 (y-axis from 0 to 120)

flink.apache.org

@ApacheFlink

Backup

Cornerpoints of Flink Design

Flexible Data Streaming Engine
• Low Latency Stream Proc.
• Highly flexible windows

Robust Algorithms on Managed Memory
• No OutOfMemory Errors
• Scales to very large JVMs
• Efficient and robust processing

High-level APIs, beyond key/value pairs
• Java/Scala/Python (upcoming)
• Relational-style optimizer

Pipelined Execution of Batch Programs
• Better shuffle performance
• Scales to very large groups

Native Iterations
• Very fast Graph Processing
• Stateful Iterations for ML

Active Library Development
• Graphs / Machine Learning
• Streaming ML (coming)

Program optimization

A simple program

val orders = …
val lineitems = …

val filteredOrders = orders
  .filter(o => dateFormat.parse(o.shipDate).after(date))
  .filter(o => o.shipPrio > 2)

val lineitemsOfOrders = filteredOrders
  .join(lineitems)
  .where("orderId").equalTo("orderId")
  .apply((o, l) => new SelectedItem(o.orderDate, l.extdPrice))

val priceSums = lineitemsOfOrders
  .groupBy("orderDate").sum("l.extdPrice")

Two execution plans

Plan 1: DataSource orders.tbl, Filter, Map, DataSource lineitem.tbl, Join (Hybrid Hash, buildHT / probe) with broadcast and forward inputs, Combine, GroupRed (sort)

Plan 2: DataSource orders.tbl, Filter, Map, DataSource lineitem.tbl, Join (Hybrid Hash, buildHT / probe) with hash-part [0] on both inputs, hash-part [0,1], GroupRed (sort), forward

Best plan depends on relative sizes of input files
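
Why the best plan depends on input sizes can be shown with a toy cost comparison in Java. The cost model here is an assumption made for illustration, not the formula used by Flink's optimizer: broadcasting an input ships one full copy to every parallel worker, while repartitioning ships each input once. The class JoinStrategySketch and its names are hypothetical.

```java
/** Toy cost-based choice between broadcast and repartition joins (hypothetical). */
class JoinStrategySketch {
    enum Strategy { BROADCAST_ORDERS, BROADCAST_LINEITEMS, REPARTITION_BOTH }

    /** Picks the strategy with the lowest estimated number of bytes shipped. */
    static Strategy choose(long ordersBytes, long lineitemBytes, int parallelism) {
        long broadcastOrders = ordersBytes * parallelism;      // full copy per worker
        long broadcastLineitems = lineitemBytes * parallelism; // full copy per worker
        long repartition = ordersBytes + lineitemBytes;        // each input shipped once

        Strategy best = Strategy.REPARTITION_BOTH;
        long bestCost = repartition;
        if (broadcastOrders < bestCost) {
            best = Strategy.BROADCAST_ORDERS;
            bestCost = broadcastOrders;
        }
        if (broadcastLineitems < bestCost) {
            best = Strategy.BROADCAST_LINEITEMS;
        }
        return best;
    }

    public static void main(String[] args) {
        // Tiny filtered orders input: broadcasting it is cheaper than shuffling lineitems.
        System.out.println(choose(1_000L, 1_000_000L, 8));   // BROADCAST_ORDERS
        // Inputs of similar size: partition both on the join key.
        System.out.println(choose(500_000L, 1_000_000L, 8)); // REPARTITION_BOTH
    }
}
```

The same program therefore maps to different plans as the data changes, which is the reason for choosing the plan by estimated cost rather than fixing it in the program.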

Examples of optimization

Task chaining
• Coalesce map/filter/etc tasks

Join optimizations
• Broadcast/partition, build/probe side, hash or sort-merge

Interesting properties
• Re-use partitioning and sorting for later operations

Automatic caching
• E.g., for iterations
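
Task chaining, the first item above, can be pictured as fusing consecutive per-record functions into one function so that records are not handed over between separate tasks. The Java sketch below is only an analogy under assumptions: TaskChainingSketch and its chain method are hypothetical and say nothing about how Flink wires chained operators internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

/** Toy fusion of filter -> map -> filter into a single per-record function (hypothetical). */
class TaskChainingSketch {
    /** Returns one fused function; a null result means the record was filtered out. */
    static <A, B> Function<A, B> chain(Predicate<A> pre, Function<A, B> map, Predicate<B> post) {
        return a -> {
            if (!pre.test(a)) {
                return null;
            }
            B b = map.apply(a);
            return post.test(b) ? b : null;
        };
    }

    /** Keeps even numbers, multiplies by 10, keeps results above 20, all in one pass. */
    static List<Integer> run(List<Integer> input) {
        Function<Integer, Integer> fused =
            TaskChainingSketch.<Integer, Integer>chain(x -> x % 2 == 0, x -> x * 10, y -> y > 20);
        List<Integer> out = new ArrayList<>();
        for (Integer x : input) {
            Integer r = fused.apply(x);
            if (r != null) {
                out.add(r);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of(1, 2, 3, 4, 6))); // [40, 60]
    }
}
```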

Visualization

Visualization tools
