Aljoscha Krettek - Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics

Aljoscha Krettek (@aljoscha), Big Data Spain, November 17, 2016



Page 2

What I’d Like to Talk About

Streaming architecture and Flink

IoT and event-time stream processing

Use-case examples

Page 3

Original creators of Apache Flink®

Providers of the dA Platform, a supported Flink distribution

Page 4

Intro: The Streaming Architecture

Page 5

Big Data Architecture

Collect events in HDFS (or similar)

Periodically run (batch) jobs to process them

Problems:
• Huge latency
• Natural boundaries in data don’t match batch boundaries

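The second problem can be made concrete with a minimal Python sketch. Python is an arbitrary choice here, since the talk shows no code. The sketch makes two assumptions: the click timestamps are hypothetical, measured in minutes since midnight of day 0, and a session is defined as a run of events with gaps of at most 30 minutes. When the whole stream is sessionized at once, the sketch finds one session. When the data is first cut into daily batches, that same session is split at midnight.

```python
from collections import defaultdict

GAP = 30          # assumed max gap (minutes) between events of one session
BATCH = 24 * 60   # one batch job per day, in minutes

def sessions(timestamps, gap=GAP):
    """Group timestamps into sessions separated by gaps larger than `gap`."""
    result = []
    for ts in sorted(timestamps):
        if result and ts - result[-1][-1] <= gap:
            result[-1].append(ts)
        else:
            result.append([ts])
    return result

def sessions_per_batch(timestamps, batch=BATCH, gap=GAP):
    """Cut the data at batch boundaries first, then sessionize each batch on its own."""
    batches = defaultdict(list)
    for ts in timestamps:
        batches[ts // batch].append(ts)
    return [s for b in sorted(batches) for s in sessions(batches[b], gap)]

# One user active from 23:50 on day 0 until 00:10 on day 1 (minutes since day 0).
events = [1430, 1435, 1439, 1445, 1450]
print(len(sessions(events)))            # 1: one continuous session
print(len(sessions_per_batch(events)))  # 2: the daily batch boundary splits it
```

A streaming job that sessionizes continuously has no such artificial cut. The gap and batch length are only illustrative.
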
Page 6

Rethinking Data Architecture

Real-time reaction to events

Continuous applications

Process both real-time and historical data

Page 7

What is (Distributed) Streaming

Streaming: Computations on never-ending “streams” of data records (“events”)

Distributed: Computation spread across many machines

(Diagram: several instances of “Your code” running in parallel)

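Here is a minimal sketch of one way to spread a stream across machines. Each event goes to one of several parallel instances, chosen by hashing its key, so every event with the same key reaches the same instance. The parallelism of 4, the sensor keys and the use of CRC32 are illustrative assumptions. This is not how Flink assigns keys internally. CRC32 stands in for Python's built-in `hash`, because `hash` is randomized per process for strings.

```python
import zlib
from collections import defaultdict

PARALLELISM = 4  # assumed number of parallel instances of "Your code"

def partition(key: str, parallelism: int = PARALLELISM) -> int:
    """Pick the instance for a key with a stable hash (CRC32), so a key always maps to the same instance."""
    return zlib.crc32(key.encode("utf-8")) % parallelism

def route(events, parallelism=PARALLELISM):
    """Split a stream of (key, value) events into one sub-stream per instance."""
    instances = defaultdict(list)
    for key, value in events:
        instances[partition(key, parallelism)].append((key, value))
    return dict(instances)

readings = [("sensor-1", 20.5), ("sensor-2", 19.0), ("sensor-1", 21.0), ("sensor-3", 18.2)]
for instance, sub_stream in sorted(route(readings).items()):
    print(instance, sub_stream)
```

Routing by key is what lets each instance keep per-key state locally.
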
Page 8

What is Stateful Streaming

Result depends on history of stream

A stateful stream processor should give you the tools to manage state:
• Recover, roll back, version, upgrade, etc.

(Diagram: “Your code” with attached state)

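Here is a minimal sketch of keyed state with snapshot and restore, assuming a simple per-key counter. It only illustrates what "recover" and "roll back" mean. Flink's checkpointing works differently: it takes consistent distributed snapshots and also rewinds the input to the checkpointed position. This sketch models neither of those.

```python
import copy

class CountingOperator:
    """Per-key running count: each output depends on the history of the stream."""

    def __init__(self):
        self.state = {}  # key -> number of events seen so far

    def process(self, key):
        self.state[key] = self.state.get(key, 0) + 1
        return key, self.state[key]

    def snapshot(self):
        # Copy the state so later updates do not change the saved checkpoint.
        return copy.deepcopy(self.state)

    def restore(self, checkpoint):
        # Roll back to a previously saved checkpoint, e.g. after a failure.
        self.state = copy.deepcopy(checkpoint)

op = CountingOperator()
for key in ["a", "b", "a"]:
    op.process(key)
checkpoint = op.snapshot()   # {"a": 2, "b": 1}
op.process("a")              # state moves on to {"a": 3, "b": 1}
op.restore(checkpoint)       # back to {"a": 2, "b": 1}
```
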
Page 9

What is Event-Time Streaming

Events have timestamps

Processing depends on timestamps

An event-time stream processor should give you the tools to reason about time:
• Handle streams that are out of order

(Diagram: “Your code” with state; events t3, t1, t2, t4 arrive out of order and are grouped into windows t1-t2 and t3-t4)

Page 10

(Diagram: an event log, several applications each holding its own app state, and a query service)

Page 11

Recap: What is Streaming?

Continuous processing of data that is continuously generated, i.e., pretty much all “big” data

It’s all about state and time

Flink does all of that

Page 12

IoT and Event-time Stream Processing

Page 13

The 'Internet Of Everything' Will Generate $14.4 Trillion Of Value Over The Next Decade.[1]

[1] read.bi/1yDOQQ3

Page 14

Example Event Sources

Page 15

A Simple Definition

IoT use cases from the system’s perspective:

A large number of (distributed) things continuously generating a large amount of data.

Page 16

IoT: Some Insights

Data is continuously produced → Stream processing

Events have a timestamp → Event-time based processing

Data/events can arrive with huge delays or out of order

Most analyses happen on time windows

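Delays, out-of-order arrival and time windows together raise one question: when is a window complete? A common answer is a watermark. The sketch below defines the watermark as the largest timestamp seen so far minus a fixed bound `max_delay`, loosely modeled on the bounded-out-of-orderness idea in Flink. The window size, the bound and the timestamps are hypothetical. Events that arrive later than the bound are not handled here.

```python
from collections import defaultdict

def event_time_windows(timestamps, size, max_delay):
    """Count events per tumbling event-time window [start, start + size).

    `timestamps` are event times in arrival order, possibly out of order.
    The watermark (largest timestamp seen so far minus `max_delay`) is the
    assumption that no event arrives more than `max_delay` behind; a window is
    emitted once the watermark reaches its last timestamp, start + size - 1.
    """
    open_windows = defaultdict(int)  # window start -> count
    emitted = []
    watermark = float("-inf")
    for ts in timestamps:
        open_windows[ts - ts % size] += 1
        watermark = max(watermark, ts - max_delay)
        for start in sorted(open_windows):
            if start + size - 1 <= watermark:
                emitted.append((start, open_windows.pop(start)))
    # End of input acts like a final watermark at +infinity.
    for start in sorted(open_windows):
        emitted.append((start, open_windows.pop(start)))
    return emitted

arrivals = [1, 12, 4, 9, 15, 21, 17, 28]
print(event_time_windows(arrivals, size=10, max_delay=5))  # [(0, 3), (10, 3), (20, 2)]
```

Compare this with `max_delay=0`. There, window [0, 10) is emitted as soon as 12 arrives, so the out-of-order events 4 and 9 each produce a separate partial result for that window instead of a single count of 3. The bound trades result latency for completeness.
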
Page 17

What Is Event-Time Processing

Star Wars: release year (processing time) vs. episode (event time)

1977 → Episode IV
1980 → Episode V
1983 → Episode VI
1999 → Episode I
2002 → Episode II
2005 → Episode III
2015 → Episode VII

Page 18

What Is Event-Time Processing

(Diagram: events carrying event timestamps pass through a message queue, plotted against processing time 1 to 14)

Page 19

What’s The Problem?

(Diagram: the same events grouped once into processing-time windows and once into event-time windows, producing different groups)

Mismatch between event time and processing time.

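The mismatch can be reproduced with a few made-up events. Each event carries two times: when it was measured (event time) and when it reached the processor (processing time). The sketch assumes 10-unit tumbling windows and two delayed readings. It is not a reconstruction of the numbers on the slide.

```python
from collections import Counter

SIZE = 10  # assumed tumbling-window length, e.g. in seconds

def window_counts(timestamps, size=SIZE):
    """Count how many timestamps fall into each tumbling window [start, start + size)."""
    return dict(sorted(Counter(ts - ts % size for ts in timestamps).items()))

# Hypothetical (event_time, processing_time) pairs: when a reading was taken on
# the device vs. when it reached the stream processor. The readings taken at
# 5 and 8 are held up, e.g. by a slow network, and arrive at 13 and 15.
events = [(1, 2), (3, 4), (5, 13), (8, 15), (11, 12), (14, 16), (16, 17)]

print(window_counts([p for _, p in events]))  # processing-time windows: {0: 2, 10: 5}
print(window_counts([e for e, _ in events]))  # event-time windows:      {0: 4, 10: 3}
```

Grouping by arrival time puts only 2 of the 4 readings taken in [0, 10) into that window. Two delayed events are enough to change both results.
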
Page 20

Sources of Time Mismatch

Big mismatch:
• Network disconnects
• Slow network

Small mismatch:
• The nature of distributed systems
• Differing system clock times

Page 21

Small Event-Time Mismatch

Robust Stream Processing with Apache Flink®: A Simple Walkthrough
http://data-artisans.com/robust-stream-processing-flink-walkthrough/

Page 25

Recap: Event-Time

IoT use cases need event-time processing

Even a small mismatch between event time and processing time will lead to wrong results

Page 26

Use-Case Examples

Page 27

30 Flink applications in production for more than one year; 10 billion events (2 TB) processed daily

Complex jobs of > 30 operators running 24/7, processing 30 billion events daily, maintaining state of 100s of GB with exactly-once guarantees

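Here is a rough reading of these numbers. It assumes decimal units (1 TB = 10^12 bytes) and a uniform rate over the 86,400 seconds of a day. Real traffic is bursty, so peak rates are higher than these averages.

```latex
\begin{aligned}
\frac{10 \times 10^{9}\ \text{events}}{86\,400\ \text{s}} &\approx 1.16 \times 10^{5}\ \text{events/s} \\
\frac{2 \times 10^{12}\ \text{bytes}}{10 \times 10^{9}\ \text{events}} &= 200\ \text{bytes/event} \\
\frac{30 \times 10^{9}\ \text{events}}{86\,400\ \text{s}} &\approx 3.47 \times 10^{5}\ \text{events/s}
\end{aligned}
```
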
Page 28

King

Challenges:
• Many games (Candy Crush, Farm Heroes, Pet Rescue, and Bubble Witch…)
• 300 million monthly unique users
• 30 billion events received every day

Need event-time based statistics

https://techblog.king.com/rbea-scalable-real-time-analytics-king/

Page 29

Solution: RBEA

https://techblog.king.com/rbea-scalable-real-time-analytics-king/

Page 30

Solution: RBEA
• Multiplexing of multiple data scientists’ requests into a single Flink job
• Groovy as the language for analysis scripts
• Event-time windowing

https://techblog.king.com/rbea-scalable-real-time-analytics-king/

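The multiplexing idea can be sketched as a single pass over the stream that hands each event to every registered analysis. This is a hypothetical Python illustration and not RBEA's API. Python lambdas stand in for the Groovy scripts, and the event fields (`type`, `user`, `amount`, `level`) are invented.

```python
from typing import Callable, Dict, List

# A hypothetical analysis "script": maps one event to zero or more output records.
Script = Callable[[dict], List[tuple]]

class MultiplexedJob:
    """Runs many independently registered analyses inside one streaming job."""

    def __init__(self) -> None:
        self.scripts: Dict[str, Script] = {}

    def register(self, name: str, script: Script) -> None:
        self.scripts[name] = script      # new analysis, no new job

    def unregister(self, name: str) -> None:
        self.scripts.pop(name, None)

    def process(self, event: dict) -> Dict[str, List[tuple]]:
        # Each event is read once and handed to every registered script.
        return {name: script(event) for name, script in self.scripts.items()}

job = MultiplexedJob()
job.register("purchases", lambda e: [(e["user"], e["amount"])] if e["type"] == "purchase" else [])
job.register("level_ups", lambda e: [(e["user"], e["level"])] if e["type"] == "level_up" else [])
print(job.process({"type": "purchase", "user": "u1", "amount": 0.99}))
```

With this design, adding an analysis means registering a function rather than deploying another job. The stream is consumed once, however many analyses are running.
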
Page 31

Bouygues Telecom

2015: ~120 users*, 5 Flink production apps, 750 TB storage, 4 billion events/day
2016: ~300 users*, 30 Flink production apps, 2 PB storage, 10 billion events/day

* Users of the information system

http://flink-forward.org/kb_sessions/a-brief-history-of-time-with-apache-flink-real-time-monitoring-and-analysis-with-flink-kafka-hb/

Page 32

Bouygues: Challenges
• Low latency & streaming-fashion counters
• Massive amounts of data + bursty loads
• Reliability
• Multiple flow correlation
• Time management:
  • Out-of-order & late events → our worst enemies

http://flink-forward.org/kb_sessions/a-brief-history-of-time-with-apache-flink-real-time-monitoring-and-analysis-with-flink-kafka-hb/

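One way to handle late events is to keep a window's state for a while after the window first fires. The sketch below uses a watermark of "largest timestamp so far minus `max_delay`". Events that arrive within `allowed_lateness` trigger an updated count for their window. Anything later is set aside. This is similar in spirit to the allowed-lateness setting of Flink's window API, but the function and its parameters are hypothetical.

```python
from collections import defaultdict

def windows_with_lateness(timestamps, size, max_delay, allowed_lateness):
    """Tumbling event-time window counts that tolerate some late events."""
    counts = defaultdict(int)        # window start -> event count (kept while late data is allowed)
    fired = set()                    # windows that have emitted at least once
    results, too_late = [], []
    watermark = float("-inf")
    for ts in timestamps:
        start = ts - ts % size
        end = start + size - 1       # last timestamp belonging to the window
        if end + allowed_lateness <= watermark:
            too_late.append(ts)      # past the lateness horizon: set the event aside
            continue
        counts[start] += 1
        if start in fired:
            results.append((start, counts[start]))  # late but allowed: emit an update
        watermark = max(watermark, ts - max_delay)
        for s in sorted(counts):
            if s not in fired and s + size - 1 <= watermark:
                fired.add(s)
                results.append((s, counts[s]))      # first, on-time result
        for s in [s for s in counts if s + size - 1 + allowed_lateness <= watermark]:
            del counts[s]                           # horizon passed: free the window state
    for s in sorted(counts):                        # end of input: fire what is left
        if s not in fired:
            results.append((s, counts[s]))
    return results, too_late

print(windows_with_lateness([2, 7, 12, 5, 16, 3, 25], size=10, max_delay=0, allowed_lateness=5))
```

In this run, event 5 arrives after window [0, 10) has fired. It is within the allowed lateness, so the window emits an updated count of 3. Event 3 arrives after the watermark has passed 9 + 5, so it lands in the too-late list. The result is `([(0, 2), (0, 3), (10, 2), (20, 1)], [3])`.
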
Page 33

http://flink-forward.org/kb_sessions/a-brief-history-of-time-with-apache-flink-real-time-monitoring-and-analysis-with-flink-kafka-hb/

Page 34

In Summary

If you need to ask: you already have a streaming use case!

IoT requires Proper Time Management

Apache Flink has done that for a long time now*

* Since version 0.10

Page 35

Thank you!

@aljoscha @ApacheFlink @dataArtisans

Page 36

One day of hands-on Flink training

One day of conference

Tickets are on sale

Call for Papers is already open

Please visit our website: http://sf.flink-forward.org

Follow us on Twitter: @FlinkForward

Page 37

We are hiring!

data-artisans.com/careers