Aljoscha Krettek - The Future of Apache Flink

Aljoscha Krettek [email protected] @aljoscha The Future of Apache Flink®

Upload: flink-forward

Post on 08-Jan-2017


TRANSCRIPT

Page 1: Aljoscha Krettek - The Future of Apache Flink

Aljoscha Krettek [email protected] @aljoscha

The Future of Apache Flink®

Page 2: Aljoscha Krettek - The Future of Apache Flink

Before We Start

• Approach me or anyone wearing a committer’s badge if you are interested in learning more about a feature/topic
• Whoami: Apache Flink® PMC, Apache Beam (incubating) PMC, (self-proclaimed) streaming expert

Page 3: Aljoscha Krettek - The Future of Apache Flink

Disclaimer

What I’m going to tell you are my views and opinions. I don’t control the roadmap of Apache Flink®, the community does. You can learn all of this by following the community and talking to people.

Page 4: Aljoscha Krettek - The Future of Apache Flink

Things We Will Cover

Stream API
• Window Trigger DSL
• Enhanced Window Meta Data
• Queryable State
• Side Inputs
• Side Outputs
• Stream SQL

State/Checkpointing
• Incremental Checkpointing
• Hot Standby

Operations
• Job Elasticity
• Running Flink Everywhere
• Cluster Elasticity
• Security Enhancements
• Failure Policies
• Operator Inspection

Page 5: Aljoscha Krettek - The Future of Apache Flink

Varying Degrees of Readiness

• DONE: Stuff that is in the master branch*
• IN PROGRESS: Things where the community already has thorough plans for implementation
• DESIGN: Ideas and sketches, not concrete implementations

* or really close to that 🤗

Page 6: Aljoscha Krettek - The Future of Apache Flink

Stream API

Page 7: Aljoscha Krettek - The Future of Apache Flink

A Typical Streaming Use Case

DataStream<MyType> input = <my source>;
input.keyBy(new MyKeyselector())                          // key
    .window(TumblingEventTimeWindows.of(Time.hours(5)))   // window assigner
    .trigger(EventTimeTrigger.create())                   // trigger
    .allowedLateness(Time.hours(1))                       // allowed lateness
    .apply(new MyWindowFunction())                        // window function
    .addSink(new MySink());

Diagram: src → key → win → sink
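As a rough illustration of what the window assigner and the allowed lateness in this pipeline decide, here is a plain-Java sketch with no Flink dependency. It maps an event timestamp to its 5-hour tumbling window and checks whether the event would still be accepted given a watermark and 1 hour of allowed lateness. The class and method names are hypothetical, windows are assumed to be aligned to the epoch, timestamps are assumed non-negative, and the lateness rule is simplified rather than taken from Flink's code.

```java
import java.time.Duration;

// Toy model of tumbling event-time windows; not Flink's implementation.
final class TumblingWindowSketch {
    static final long SIZE_MS = Duration.ofHours(5).toMillis();
    static final long ALLOWED_LATENESS_MS = Duration.ofHours(1).toMillis();

    // Start (inclusive) of the epoch-aligned window containing the timestamp.
    static long windowStart(long timestampMs) {
        return timestampMs - (timestampMs % SIZE_MS);
    }

    // End (exclusive) of that window.
    static long windowEnd(long timestampMs) {
        return windowStart(timestampMs) + SIZE_MS;
    }

    // Simplified rule: an event is too late once the watermark has reached
    // the end of its window plus the allowed lateness.
    static boolean isTooLate(long timestampMs, long watermarkMs) {
        return watermarkMs >= windowEnd(timestampMs) + ALLOWED_LATENESS_MS;
    }

    public static void main(String[] args) {
        long hour = Duration.ofHours(1).toMillis();
        long event = 6 * hour; // falls into the window [5h, 10h)
        System.out.println("window start (h): " + windowStart(event) / hour);
        System.out.println("window end (h): " + windowEnd(event) / hour);
        System.out.println("too late at watermark 11h: " + isTooLate(event, 11 * hour));
    }
}
```

The trigger and the window function are left out on purpose; the sketch only covers the two time-based decisions.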

Page 8: Aljoscha Krettek - The Future of Apache Flink

Window Trigger

• Decides when to process a window
• Flink has built-in triggers:
  • EventTime
  • ProcessingTime
  • Count
• For more complex behaviour you need to roll your own, e.g. “fire at window end but also every 5 minutes from start”
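To make the quoted behaviour concrete, the following plain-Java sketch lists the points in time at which such a trigger would fire for one window. It is a toy calculation under stated assumptions, not a Flink `Trigger` implementation: the class name is hypothetical, and it treats the early firings and the window end as if they were on the same clock, whereas a real trigger would have to distinguish processing time from event time.

```java
import java.util.ArrayList;
import java.util.List;

// Toy calculation of firing times; not a Flink Trigger.
final class EarlyFiringSketch {
    // Fires every intervalMs after the window start, and once more at the window end.
    static List<Long> firingTimes(long startMs, long endMs, long intervalMs) {
        List<Long> times = new ArrayList<>();
        for (long t = startMs + intervalMs; t < endMs; t += intervalMs) {
            times.add(t); // early firing
        }
        times.add(endMs); // final firing at window end
        return times;
    }

    public static void main(String[] args) {
        long minute = 60_000L;
        // A 20-minute window with early firings every 5 minutes.
        System.out.println(firingTimes(0, 20 * minute, 5 * minute));
    }
}
```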

Page 9: Aljoscha Krettek - The Future of Apache Flink

Window Trigger DSL

• Library of combinable trigger building blocks:
  • EventTime
  • ProcessingTime
  • Count
  • AfterAll(subtriggers)
  • AfterAny(subtriggers)
  • Repeat(subtrigger)
• A hand-written custom trigger vs. a single DSL expression:

EventTime.afterEndOfWindow().withEarlyTrigger(ProcessingTime.after(5))

Status: DONE
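The idea of combinable building blocks can be sketched in plain Java as follows. This is a toy model and not the Flink trigger DSL: `ToyTrigger` and all of its methods are hypothetical names, each trigger is reduced to a stateless predicate over the element count, the watermark and the window end, and `Repeat` is left out because it would need per-window state.

```java
import java.util.Arrays;

// Toy trigger: a stateless predicate over the current window situation.
interface ToyTrigger {
    boolean shouldFire(long elementCount, long watermarkMs, long windowEndMs);

    // Fires once at least n elements have arrived.
    static ToyTrigger count(long n) {
        return (elements, watermark, end) -> elements >= n;
    }

    // Fires once the watermark has reached the end of the window.
    static ToyTrigger eventTimeAfterEndOfWindow() {
        return (elements, watermark, end) -> watermark >= end;
    }

    // Fires only when every sub-trigger would fire.
    static ToyTrigger afterAll(ToyTrigger... triggers) {
        return (elements, watermark, end) ->
            Arrays.stream(triggers).allMatch(t -> t.shouldFire(elements, watermark, end));
    }

    // Fires as soon as any sub-trigger would fire.
    static ToyTrigger afterAny(ToyTrigger... triggers) {
        return (elements, watermark, end) ->
            Arrays.stream(triggers).anyMatch(t -> t.shouldFire(elements, watermark, end));
    }
}

final class TriggerDslDemo {
    public static void main(String[] args) {
        ToyTrigger trigger = ToyTrigger.afterAny(
            ToyTrigger.count(3), ToyTrigger.eventTimeAfterEndOfWindow());
        System.out.println(trigger.shouldFire(3, 0, 100)); // count reached
        System.out.println(trigger.shouldFire(1, 50, 100)); // neither condition met
    }
}
```

Modelling triggers as small composable values is what lets a one-line expression replace a hand-written trigger class.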

Page 10: Aljoscha Krettek - The Future of Apache Flink

Enhanced Window Meta Data

• Current WindowFunction: (key, window, input) → output
  • No information about firing
• New WindowFunction: (key, window, context, input) → output
  • context = (Firing Reason, Id, …)

Status: IN PROGRESS
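The difference between the two function shapes can be sketched in plain Java. Everything here is hypothetical and only mirrors the signatures on the slide: the interface names, the `Context` class and the `FiringReason` values are assumptions, since the slide only names "Firing Reason" and "Id" as context fields.

```java
import java.util.Arrays;
import java.util.List;

// Toy signatures only; not Flink's WindowFunction interfaces.
final class WindowContextSketch {
    // Hypothetical firing reasons; the slide only says "Firing Reason".
    enum FiringReason { EARLY, ON_TIME, LATE }

    // Hypothetical context object: (firing reason, firing id).
    static final class Context {
        final FiringReason reason;
        final int firingId;

        Context(FiringReason reason, int firingId) {
            this.reason = reason;
            this.firingId = firingId;
        }
    }

    // Shape of the current function: (key, window, input) -> output.
    interface CurrentWindowFunction<K, W, IN, OUT> {
        OUT apply(K key, W window, List<IN> input);
    }

    // Shape of the new function: (key, window, context, input) -> output.
    interface NewWindowFunction<K, W, IN, OUT> {
        OUT apply(K key, W window, Context context, List<IN> input);
    }

    // With the context, the function can tell why and how often it was called.
    static final NewWindowFunction<String, String, Integer, String> COUNT_WITH_REASON =
        (key, window, context, input) ->
            key + "@" + window + ": " + input.size() + " elements, "
                + context.reason + " firing #" + context.firingId;

    public static void main(String[] args) {
        System.out.println(COUNT_WITH_REASON.apply(
            "user-1", "[0h, 5h)", new Context(FiringReason.EARLY, 0), Arrays.asList(1, 2, 3)));
    }
}
```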

Page 11: Aljoscha Krettek - The Future of Apache Flink

Detour: Window Operator

• Window operator keeps track of timers and state for window contents and triggers
• Window results are made available when the trigger fires

Diagram: window operator (window assigner, trigger, allowed lateness, window function) with its window state and timers

Page 12: Aljoscha Krettek - The Future of Apache Flink

Queryable State

• Flink-internal job state is made queryable
• Aggregations, windows, machine learning models

Status: DONE

Page 13: Aljoscha Krettek - The Future of Apache Flink

Enriching Computations

• Operations typically only have one input
• What if we need to make calculations not just based on the input events?

Diagram: src → key → win → sink (with a “?” annotation)

Page 14: Aljoscha Krettek - The Future of Apache Flink

Side Inputs

• Additional input for operators besides the main input
• From a stream, from a database or from a computation result

Diagram: main pipeline src → key → win → sink, with a second source (src2)

Status: IN PROGRESS
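A minimal way to picture a side input is an operator that, besides its main input, can read a second data set to enrich each element. The plain-Java sketch below is a toy under stated assumptions: the class name, the user ids and the country lookup are made up for illustration, the side input is a fixed map rather than a stream, and nothing here reflects the API that was in progress at the time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy enrichment with a static side input; not a Flink API.
final class SideInputSketch {
    // Enriches each main-input element with a value looked up in the side input.
    static List<String> enrich(List<String> mainInput, Map<String, String> sideInput) {
        List<String> out = new ArrayList<>();
        for (String userId : mainInput) {
            out.add(userId + " -> " + sideInput.getOrDefault(userId, "unknown"));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> countries = new HashMap<>(); // hypothetical side input
        countries.put("alice", "DE");
        System.out.println(enrich(Arrays.asList("alice", "bob"), countries));
    }
}
```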

Page 15: Aljoscha Krettek - The Future of Apache Flink

What Happens to Late Data?

• By default events arriving after the allowed lateness are dropped

Diagram: src → key → win → sink, with late data dropped at win

Page 16: Aljoscha Krettek - The Future of Apache Flink

Side Outputs

• Selectively send output to different downstream operators
• Not just useful for window operations

Diagram: src → key → win → sink, with late data routed from win to a separate op → sink

Status: IN PROGRESS
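For the late-data case, a side output can be pictured as a second list that receives the events the window operator would otherwise drop. The following plain-Java sketch is a toy, not the Flink feature: the class and field names are hypothetical, windows are epoch-aligned tumbling windows, and the lateness rule is the simplified "watermark has reached window end plus allowed lateness".

```java
import java.util.ArrayList;
import java.util.List;

// Toy operator with a main output and a side output for late events.
final class SideOutputSketch {
    private final long windowSizeMs;
    private final long allowedLatenessMs;
    final List<Long> mainOutput = new ArrayList<>();
    final List<Long> lateOutput = new ArrayList<>();

    SideOutputSketch(long windowSizeMs, long allowedLatenessMs) {
        this.windowSizeMs = windowSizeMs;
        this.allowedLatenessMs = allowedLatenessMs;
    }

    // Routes the event to the side output instead of dropping it when it is too late.
    void process(long timestampMs, long watermarkMs) {
        long windowEnd = timestampMs - (timestampMs % windowSizeMs) + windowSizeMs;
        if (watermarkMs >= windowEnd + allowedLatenessMs) {
            lateOutput.add(timestampMs);
        } else {
            mainOutput.add(timestampMs);
        }
    }

    public static void main(String[] args) {
        SideOutputSketch op = new SideOutputSketch(100, 10);
        op.process(50, 0);    // on time
        op.process(50, 110);  // window [0, 100) is already closed
        op.process(150, 110); // window [100, 200) is still open
        System.out.println("main: " + op.mainOutput + ", late: " + op.lateOutput);
    }
}
```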

Page 17: Aljoscha Krettek - The Future of Apache Flink

Stream SQL

SELECT STREAM
  TUMBLE_START(tStamp, INTERVAL '5' HOUR) AS hour,
  COUNT(*) AS cnt
FROM events
WHERE status = 'received'
GROUP BY TUMBLE(tStamp, INTERVAL '5' HOUR)

Status: IN PROGRESS
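Read as a batch computation, the query counts the events with status 'received' per 5-hour tumbling window. The plain-Java sketch below computes the same grouping over a finite list. It is only meant to spell out the semantics: the class and its `Event` type are hypothetical, windows are assumed epoch-aligned, and a streaming engine would emit each window's count as time advances rather than at the end.

```java
import java.util.Arrays;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Toy batch equivalent of the tumbling-window count; not how Flink executes SQL.
final class TumblingCountSketch {
    static final long HOUR_MS = 3_600_000L;
    static final long WINDOW_MS = 5 * HOUR_MS;

    static final class Event {
        final long tStamp;
        final String status;

        Event(long tStamp, String status) {
            this.tStamp = tStamp;
            this.status = status;
        }
    }

    // Returns window start -> number of 'received' events in that window.
    static SortedMap<Long, Long> countReceived(List<Event> events) {
        SortedMap<Long, Long> counts = new TreeMap<>();
        for (Event e : events) {
            if (!"received".equals(e.status)) {
                continue; // WHERE status = 'received'
            }
            long windowStart = e.tStamp - (e.tStamp % WINDOW_MS); // TUMBLE_START
            counts.merge(windowStart, 1L, Long::sum);             // COUNT(*)
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
            new Event(1 * HOUR_MS, "received"),
            new Event(2 * HOUR_MS, "received"),
            new Event(3 * HOUR_MS, "failed"),
            new Event(6 * HOUR_MS, "received"));
        System.out.println(countReceived(events));
    }
}
```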

Page 18: Aljoscha Krettek - The Future of Apache Flink

State/Checkpointing

Page 19: Aljoscha Krettek - The Future of Apache Flink

Checkpointing: Status Quo

• Saving the state of operators in case of failures

Diagram: Source → Flink Pipeline → HDFS for Checkpoints (chk 1, chk 2, chk 3)

Page 20: Aljoscha Krettek - The Future of Apache Flink

Incremental Checkpointing

• Only checkpoint changes to save on network traffic/time

Diagram: Source → Flink Pipeline → HDFS for Checkpoints (chk 1, chk 2, chk 3)

Status: DESIGN
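One way to read "only checkpoint changes" is that each checkpoint stores just the state entries that differ from the previous checkpoint. The plain-Java sketch below shows that idea on a key/value map. It is an assumption for illustration, not the design the community was working on: the class name is hypothetical, and deleted keys are ignored to keep it short.

```java
import java.util.HashMap;
import java.util.Map;

// Toy delta between two state snapshots; not Flink's checkpointing mechanism.
final class IncrementalCheckpointSketch {
    // Returns the entries that are new or changed since the previous checkpoint.
    static Map<String, Long> delta(Map<String, Long> previous, Map<String, Long> current) {
        Map<String, Long> changed = new HashMap<>();
        for (Map.Entry<String, Long> e : current.entrySet()) {
            if (!e.getValue().equals(previous.get(e.getKey()))) {
                changed.put(e.getKey(), e.getValue());
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        Map<String, Long> chk1 = new HashMap<>();
        chk1.put("a", 1L);
        chk1.put("b", 2L);
        Map<String, Long> chk2 = new HashMap<>(chk1);
        chk2.put("b", 3L); // changed
        chk2.put("c", 4L); // new
        System.out.println("to upload: " + delta(chk1, chk2));
    }
}
```

Only the entries for `b` and `c` would need to be written in this example, while `a` is already covered by the earlier checkpoint.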

Page 21: Aljoscha Krettek - The Future of Apache Flink

Hot Standby

• Don’t require complete cluster restart upon failure
• Replicate state to other TaskManagers so that they can pick up work of failed TaskManagers
• Keep data available for querying even when job fails

Status: DESIGN

Page 22: Aljoscha Krettek - The Future of Apache Flink

Scaling to Super Large State

• Flink is already able to handle hundreds of GBs of state smoothly
• Incremental checkpointing and hot standby enable scaling to TBs of state without performance problems

Page 23: Aljoscha Krettek - The Future of Apache Flink

Operations

Page 24: Aljoscha Krettek - The Future of Apache Flink

Job Elasticity – Status Quo

• A Flink job is started with a fixed number of parallel operators
• Data comes in, the operators work on it in parallel

Diagram: two parallel win operators

Page 25: Aljoscha Krettek - The Future of Apache Flink

Job Elasticity – Problem

• What happens when you get too much input data?
• Affects performance:
  • Backpressure
  • Latency
  • Throughput

Diagram: two parallel win operators

Page 26: Aljoscha Krettek - The Future of Apache Flink

Job Elasticity – Solution

• Dynamically scale up/down the number of worker nodes

Diagram: three parallel win operators

Status: DONE
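Scaling a stateful job up or down means keyed state has to move with its keys. A common way to keep that manageable is to hash keys into a fixed number of groups and assign contiguous ranges of groups to the parallel operator instances, so that rescaling moves whole groups. The plain-Java sketch below shows such a scheme as an assumption for illustration: the slides do not describe the mechanism, and the class and method names are hypothetical.

```java
// Toy key-to-operator assignment that stays stable across rescaling.
final class RescalingSketch {
    // Maps a key to one of maxParallelism groups; this never changes for a job.
    static int keyGroup(Object key, int maxParallelism) {
        return Math.floorMod(key.hashCode(), maxParallelism);
    }

    // Maps a group to a parallel operator instance for the current parallelism.
    static int operatorIndex(int keyGroup, int parallelism, int maxParallelism) {
        return keyGroup * parallelism / maxParallelism;
    }

    public static void main(String[] args) {
        int maxParallelism = 8;
        for (int parallelism : new int[] {2, 4}) {
            StringBuilder sb = new StringBuilder("parallelism " + parallelism + ":");
            for (int group = 0; group < maxParallelism; group++) {
                sb.append(' ').append(operatorIndex(group, parallelism, maxParallelism));
            }
            System.out.println(sb); // operator index per group 0..7
        }
    }
}
```

Because a key's group never changes, going from 2 to 4 operators only splits each operator's range of groups in two.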

Page 27: Aljoscha Krettek - The Future of Apache Flink

Running Flink Everywhere

• Native integration with cluster management frameworks

Status: IN PROGRESS

Page 28: Aljoscha Krettek - The Future of Apache Flink

Cluster Elasticity

• Equivalent to Job Elasticity on cluster side
• Dynamic resource allocation from cluster manager

Status: IN PROGRESS

Page 29: Aljoscha Krettek - The Future of Apache Flink

Security Enhancements

• Authentication to external systems
• Over-the-wire encryption for Flink and authorization at Flink Cluster

Diagram label: Kerberos

Status: IN PROGRESS

Page 30: Aljoscha Krettek - The Future of Apache Flink

Failure Policies/Inspection

• Policies for handling pipeline errors
• Policies for handling checkpointing errors
• Live inspection of the output of running operators in the pipeline

Status: DESIGN

Page 31: Aljoscha Krettek - The Future of Apache Flink

Closing

Page 32: Aljoscha Krettek - The Future of Apache Flink

How to Learn More

• FLIP – Flink Improvement Proposals

https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals

Page 33: Aljoscha Krettek - The Future of Apache Flink

Recap

• The Flink API is already mature, some refinements are coming up
• A lot of work is going on in making day-to-day operations easy and making sure Flink scales to very large installations
• Most of the changes are driven by user demand

Page 34: Aljoscha Krettek - The Future of Apache Flink

Enjoy the conference!