Kostas Kloudas, Vasia Kalavri, Jonas Traub

Introduction to Stream Processing with Apache Flink®

Who are we?

• Kostas: software engineer @ data Artisans
• Vasia: PhD student @ KTH Stockholm
• Jonas: research associate @ TU Berlin

Overview

• What is Stream Processing?
• What is Apache Flink?
• Windowed computations over streams
• Handling time
• Handling node failures
• Handling planned downtime
• Handling code upgrades

Demo instructions…

Robust Stream Processing with Apache Flink®: A Simple Walkthrough http://data-artisans.com/robust-stream-processing-flink-walkthrough/#more-1181

Make sure you download: Apache Flink 1.0.3

Stateless stream processing

Stateful stream processing

Why should you care?

Data production is and has always been a continuous process.

Stream processing enables the obvious: continuous processing on data that is continuously produced.

What is Apache Flink?

A data processing engine

Apache Flink is an open source platform for distributed stream and batch processing

The Apache Flink Ecosystem

[Figure: Apache Flink ecosystem diagram; only the label "SQL" survives from the image]

What does Flink provide?

High Throughput and Low Latency
• Yahoo! Benchmark: https://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at
• Extended by Data Artisans: http://data-artisans.com/extending-the-yahoo-streaming-benchmark/

What does Flink provide?

• High Throughput and Low Latency
• Event-time (out-of-order) processing
• Exactly-once semantics
• Flexible windowing
• Fault-Tolerance

Time for demo…

Robust Stream Processing with Apache Flink®: A Simple Walkthrough http://data-artisans.com/robust-stream-processing-flink-walkthrough/#more-1181

Setup:

Sensor Data

Windowed computations

Handling time

Handling time

The system has to respect the same clock as the data.

Event Time vs Processing Time

Processing time (release year): 1977, 1980, 1983, 1999, 2002, 2005, 2015
Event time (episode): IV, V, VI, I, II, III, VII
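
The timeline can be made concrete with a small plain-Python sketch (this is not Flink API code). The release years and episode numbers come from the slide. Following the slide's analogy, the episode number plays the role of the event timestamp and the release year the role of the processing timestamp. The helper name `order_by` is hypothetical.

```python
# Each record: (episode number = event time, release year = processing time).
EPISODES = [
    (4, 1977), (5, 1980), (6, 1983),
    (1, 1999), (2, 2002), (3, 2005),
    (7, 2015),
]


def order_by(records, clock):
    """Return the episode numbers ordered by the chosen clock."""
    index = {"event": 0, "processing": 1}[clock]
    return [episode for episode, _ in sorted(records, key=lambda r: r[index])]


processing_order = order_by(EPISODES, "processing")  # order of arrival
event_order = order_by(EPISODES, "event")            # order inside the story
```

Ordering by processing time reproduces the order in which the films reached the audience, while ordering by event time reproduces the order of the story itself. A system that only looks at its own clock sees the first ordering.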

Handling time: Watermarks

• Special events generated by the sources.
• A watermark for time T states that event time has progressed to T in that particular stream (or partition).
• No events with a timestamp smaller than T can arrive any more.
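
As a rough illustration of the watermark definition, the plain-Python sketch below (not Flink API code) lets the watermark trail the highest timestamp seen so far by a fixed delay and flags any event whose timestamp is smaller than the current watermark as late. The fixed-delay rule is an assumed heuristic chosen for this example, not something the slides prescribe, and the class name `BoundedLatenessWatermarks` is hypothetical.

```python
class BoundedLatenessWatermarks:
    """Hypothetical helper: the watermark trails the highest timestamp
    seen so far by a fixed delay (an assumed heuristic)."""

    def __init__(self, max_delay):
        self.max_delay = max_delay
        self.max_timestamp = None

    def current_watermark(self):
        if self.max_timestamp is None:
            return float("-inf")  # no event seen yet
        return self.max_timestamp - self.max_delay

    def observe(self, timestamp):
        """Return True if the event is on time, False if it is late,
        i.e. its timestamp is smaller than the current watermark."""
        on_time = timestamp >= self.current_watermark()
        if self.max_timestamp is None or timestamp > self.max_timestamp:
            self.max_timestamp = timestamp
        return on_time


wm = BoundedLatenessWatermarks(max_delay=2)
results = [(t, wm.observe(t)) for t in [1, 4, 3, 7, 4, 6]]
```

With a delay of 2, the out-of-order event 3 is still accepted, because the watermark is 2 when it arrives. The second event with timestamp 4 arrives after the watermark has advanced to 5 and is therefore flagged as late.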

Handling time: Watermarks

Sources emit elements and watermarks…

…operators always emit the lowest watermark among their inputs.
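
The "lowest watermark" rule can be sketched in plain Python as follows (not Flink API code). The class `MinWatermarkOperator` and its method names are hypothetical. The sketch assumes an operator with a fixed number of input channels that forwards a watermark downstream only when the minimum over all inputs advances.

```python
class MinWatermarkOperator:
    """Hypothetical operator with several input channels."""

    def __init__(self, num_inputs):
        self.input_watermarks = [float("-inf")] * num_inputs
        self.emitted = float("-inf")

    def on_watermark(self, channel, watermark):
        """Record a watermark from one input channel. Return the new
        output watermark if it advanced, otherwise None."""
        self.input_watermarks[channel] = max(
            self.input_watermarks[channel], watermark
        )
        lowest = min(self.input_watermarks)
        if lowest > self.emitted:
            self.emitted = lowest
            return lowest
        return None


op = MinWatermarkOperator(num_inputs=2)
outputs = [
    op.on_watermark(0, 10),  # input 1 has not reported yet: nothing emitted
    op.on_watermark(1, 4),   # lowest is now 4
    op.on_watermark(1, 12),  # lowest is now 10 (input 0)
    op.on_watermark(0, 15),  # lowest is now 12 (input 1)
]
```

Taking the minimum is the conservative choice: the operator can only promise downstream that event time has reached T once every one of its inputs has promised the same.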

Handling time: Watermarks

Handling node failures

Checkpoints

Sources emit elements and checkpoint barriers…
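
A heavily simplified sketch of the idea, in plain Python rather than Flink API code: the source interleaves barrier markers with the elements, a single stateful operator copies its state whenever a barrier passes, and after a failure the job restarts from the latest copy and replays the stream from that position. The names `BARRIER` and `process` are hypothetical, and the sketch assumes a replayable source and one operator with one input; it leaves out barrier alignment across inputs and asynchronous snapshots.

```python
BARRIER = object()  # hypothetical marker the source injects into the stream


def process(stream, checkpoints, fail_at=None):
    """Count elements, resuming from the latest checkpoint if one exists.

    checkpoints is a list of (resume_position, state_copy) pairs that
    stands in for durable storage and survives the simulated failure.
    """
    if checkpoints:
        position, saved = checkpoints[-1]
        counts = dict(saved)
    else:
        position, counts = 0, {}
    while position < len(stream):
        if position == fail_at:
            raise RuntimeError("simulated node failure")
        item = stream[position]
        position += 1
        if item is BARRIER:
            # Snapshot: state plus the position to replay from.
            checkpoints.append((position, dict(counts)))
        else:
            counts[item] = counts.get(item, 0) + 1
    return counts


stream = ["a", "b", BARRIER, "a", "c", BARRIER, "a"]
checkpoints = []
try:
    process(stream, checkpoints, fail_at=4)  # crash just before "c"
except RuntimeError:
    pass  # in-memory counts are lost, the checkpoint is not
recovered = process(stream, checkpoints)     # restart from the checkpoint
expected = process(stream, [])               # failure-free reference run
```

The second "a" is processed before the crash and again after the restart, yet it is counted only once in the recovered state, because the restart discards everything that happened after the last barrier.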

Checkpoints

Handling planned downtime

Handling code upgrades

Is Apache Flink only that?

Apache Flink is an open source platform for distributed stream and batch processing

Its lively community

You can join:
• Follow: @ApacheFlink, @dataArtisans
• Read: flink.apache.org/blog, data-artisans.com/blog
• Subscribe: (news | user | dev) @ flink.apache.org

[Figure: Apache Flink Community Growth, Feb.15 to Aug.16. Panels: Contributors, Stars on Github, Forks on Github]

Its Users

…https://flink.apache.org/poweredby.html

All of them will meet at... http://flink-forward.org/

Further Reading

Event-time processing:
• The Dataflow Model: http://www.vldb.org/pvldb/vol8/p1792-Akidau.pdf
• http://data-artisans.com/how-apache-flink-enables-new-streaming-applications-part-1/

Checkpointing and State:
• Distributed Snapshots: Determining Global States of Distributed Systems: http://research.microsoft.com/en-us/um/people/lamport/pubs/chandy.pdf
• Lightweight Asynchronous Snapshots for Distributed Dataflows: https://arxiv.org/abs/1506.08603
• Working with State in Flink: https://ci.apache.org/projects/flink/flink-docs-master/dev/state.html

Savepoints:
• https://ci.apache.org/projects/flink/flink-docs-master/setup/savepoints.html