stephan ewen - stream processing as a foundational paradigm and apache flink's approach to it

30
Stream Processing and Apache Flink®'s approach to it @StephanEwen Apache Flink PMC CTO @ data Artisans

Upload: dataartisans

Post on 13-Apr-2017

194 views

Category:

Data & Analytics


0 download

TRANSCRIPT

Page 1: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It

Stream Processingand Apache Flink®'s approach to it@StephanEwen

Apache Flink PMCCTO @ data Artisans

Page 2: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It

About meDatabase systems, TU Berlin, IBM, MicrosoftCo-bootstrapped Stratosphere project's runtimeApache Flink created from a (partial) Stratosphere forkApache Flink community founded data ArtisansNow Flink PMC and CTO at data Artisans

Page 3: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It

Streaming technology is enabling the obvious: continuous processing on data that is continuously produced

Hint: you already have streaming data

3

Page 4: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It

Streaming Subsumes Batch

4

2016-3-112:00 am

2016-3-11:00 am

2016-3-12:00 am

2016-3-1111:00pm

2016-3-1212:00am

2016-3-121:00am

2016-3-1110:00pm

2016-3-122:00am

2016-3-123:00am…

partition

partition

Page 5: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It

Streaming Subsumes Batch

5

2016-3-112:00 am

2016-3-11:00 am

2016-3-12:00 am

2016-3-1111:00pm

2016-3-1212:00am

2016-3-121:00am

2016-3-1110:00pm

2016-3-122:00am

2016-3-123:00am…

partition

partition

Stream (low latency)

Stream (high latency)

Page 6: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It

Streaming Subsumes Batch

6

2016-3-112:00 am

2016-3-11:00 am

2016-3-12:00 am

2016-3-1111:00pm

2016-3-1212:00am

2016-3-121:00am

2016-3-1110:00pm

2016-3-122:00am

2016-3-123:00am…

partition

partition

Stream (low latency)

Batch(bounded stream)Stream (high latency)

Page 7: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It

Stream Processing Decouples

7

Database(State)

App a

App b

App c

App a

App b

App c

Applications build their own stateState managed centralized

Page 8: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It

Time Travel

8

Process a period ofhistoric data

partition

partition

Process latest datawith low latency(tail of the log)

Reprocess stream(historic data first, catches up with realtime data)

Page 9: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It

9

Latency

Volume/Throughput

State &Accuracy

Page 10: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It

10

Latency

Volume/Throughput

State &Accuracy

Exactly-once semanticsEvent time processing

10s of millions evts/secfor stateful applications

Latency down tothe milliseconds

Apache Flink was the first open-source system to eliminate these

tradeoffs

Page 11: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It

Streaming Architecture Blueprint

11

collect log analyze serve & store

Page 12: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It

Flink's Approach

12

Stateful Steam Processing

Fluent API, Windows, Event Time

Table API

Stream SQL

Core API

Declarative DSL

High-level Language

Building Block

Page 13: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It

Stateful Steam Processing

13

Source Filter /Transform

Stateread/write Sink

Page 14: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It

Stateful Steam Processing

14

Scalable embedded state Access at memory speed &scales with parallel operators

Page 15: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It

Stateful Steam Processing

15

Re-load state

Reset positionsin input streams

Rolling back computationRe-processing

Page 16: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It

Stateful Steam Processing

16

Restore to differentprograms

Bugfixes, Upgrades, A/B testing, etc

Page 17: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It

Versioning the state of applications

17

Savepoint

Savepoint

Savepoint

App. A

App. B

App. C

Time

Savepoint

Page 18: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It

Flink's Approach

18

Stateful Steam Processing

Fluent API, Windows, Event Time

Table API

Stream SQL

Core API

Declarative DSL

High-level Language

Building Block

Page 19: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It

Event Time / Out-of-Order

19

1977 1980 1983 1999 2002 2005 2015

Processing Time

EpisodeIV

EpisodeV

EpisodeVI

EpisodeI

EpisodeII

EpisodeIII

EpisodeVII

Event Time

Page 20: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It

(Stream) SQL & Table API

20

Table API

// convert stream into Tableval sensorTable: Table = sensorData .toTable(tableEnv, 'location, 'time, 'tempF)

// define query on Tableval avgTempCTable: Table = sensorTable .groupBy('location) .window(Tumble over 1.days on 'rowtime as 'w) .select('w.start as 'day, 'location, (('tempF.avg - 32) * 0.556) as 'avgTempC) .where('location like "room%")

SQL

sensorTable.sql(""" SELECT day, location, avg((tempF - 32) * 0.556) AS avgTempC

FROM sensorData WHERE location LIKE 'room%'GROUP BY day, location

""")

Page 21: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It

What can you do with that?

21

10 billion events (2TB) processed daily across multiple Flink jobs for the telco network control center.

Ad-hoc realtime queries, > 30 operators, processing 30 billion events daily, maintaining state of 100s of GB inside Flink with exactly-once guarantees

Jobs with > 20 operators, runs on > 5000 vCores in 1000-node cluster, processes millions of events per second

Page 22: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It

Flink's Streams playing at Batch

22

TeraSort

Relational Join

Classic Batch Jobs

GraphProcessing

LinearAlgebra

Page 23: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It

23

What can we expect next ?

Page 24: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It

Queryable State

24

Page 25: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It

Streaming Architecture Blueprint

25

collect log analyze &serve & store

Other Services

Page 26: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It

Full SQL on Streams

26

Continuous queriesincremental results

Windows, event time,processing time

Consistent with SQL on bounded data https://docs.google.com/document/d/1qVVt_16kdaZQ8RTfA_f4konQPW4tnl8THw6rzGUdaqU

Page 27: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It

Elastic Parallelism

27

Maintaining exactly-oncestate consistency

No extra effort for the userNo need to carefully planpartitions

Page 28: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It

Very large state

28

Terabytes of state inside the stream processor

Maintaining fast checkpoints and recoveryE.g., long histories of windows, large join tablesState at local memory speed

Page 29: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It

29

Page 30: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It

We are hiring!

data-artisans.com/careers