keynote: stephan ewen - stream processing as a foundational paradigm and apache flink's...
![Page 1: Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It](https://reader035.vdocuments.site/reader035/viewer/2022070603/586f901a1a28ab54768b7845/html5/thumbnails/1.jpg)
Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It
Stephan Ewen, Apache Flink PMC, CTO @ data Artisans
![Page 2: Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It](https://reader035.vdocuments.site/reader035/viewer/2022070603/586f901a1a28ab54768b7845/html5/thumbnails/2.jpg)
![Page 3: Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It](https://reader035.vdocuments.site/reader035/viewer/2022070603/586f901a1a28ab54768b7845/html5/thumbnails/3.jpg)
Streaming technology is enabling the obvious: continuous processing on data that is continuously produced
Hint: you already have streaming data
![Page 4: Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It](https://reader035.vdocuments.site/reader035/viewer/2022070603/586f901a1a28ab54768b7845/html5/thumbnails/4.jpg)
Streaming Subsumes Batch
[Figure: a log split into two partitions of hourly time-stamped records, running from 2016-3-1 12:00 am through 2016-3-12 3:00 am and onward]
![Page 5: Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It](https://reader035.vdocuments.site/reader035/viewer/2022070603/586f901a1a28ab54768b7845/html5/thumbnails/5.jpg)
Streaming Subsumes Batch
[Figure: the same partitioned log, annotated with two ways of reading it]
Stream (low latency)
Stream (high latency)
![Page 6: Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It](https://reader035.vdocuments.site/reader035/viewer/2022070603/586f901a1a28ab54768b7845/html5/thumbnails/6.jpg)
Streaming Subsumes Batch
[Figure: the same partitioned log, annotated with three ways of reading it]
Stream (low latency)
Batch (bounded stream)
Stream (high latency)
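One way to see "batch is a bounded stream" in code: the minimal Scala sketch below (plain collections, not the Flink API) defines a single incremental operator that consumes an `Iterator`. Fed a finite iterator it yields the batch answer; fed an unbounded one it keeps emitting updated results. `Event`, `runningTotals`, and `batchTotal` are hypothetical names chosen for illustration.

```scala
// Hypothetical sketch (not the Flink API): one incremental operator that
// works unchanged on bounded ("batch") and unbounded ("stream") input.
final case class Event(hour: Int, value: Long)

object StreamsSubsumeBatch {
  /** Emits an updated total after every event: a continuous result. */
  def runningTotals(events: Iterator[Event]): Iterator[Long] =
    events.scanLeft(0L)((acc, e) => acc + e.value).drop(1) // drop the seed 0

  /** Batch = bounded stream: the last running total is the batch answer. */
  def batchTotal(events: Seq[Event]): Long =
    runningTotals(events.iterator).foldLeft(0L)((_, total) => total)
}
```

For example, `runningTotals(Iterator.from(0).map(h => Event(h, 1L))).take(3)` yields 1, 2, 3 without the operator ever needing an end of input.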
![Page 7: Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It](https://reader035.vdocuments.site/reader035/viewer/2022070603/586f901a1a28ab54768b7845/html5/thumbnails/7.jpg)
Stream Processing Decouples
[Figure: two architectures for App a, App b, and App c. In one, all three share a central Database (State), so state is managed centrally. In the other, the applications build their own state.]
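A minimal sketch of the decoupled model, assuming an in-memory `Seq` stands in for the shared event log: each application folds the same log into its own private state instead of reading from a central database. `Order`, `revenuePerCustomer`, and `orderCount` are hypothetical names.

```scala
// Hypothetical sketch: every application consumes the same event log and
// derives its own state from it; no application queries another's database.
final case class Order(customer: String, amount: Long)

object DecoupledApps {
  // App a: total revenue per customer, kept as its own local state.
  def revenuePerCustomer(log: Seq[Order]): Map[String, Long] =
    log.foldLeft(Map.empty[String, Long]) { (state, o) =>
      state.updated(o.customer, state.getOrElse(o.customer, 0L) + o.amount)
    }

  // App b: an independent view of the same log (number of orders seen).
  def orderCount(log: Seq[Order]): Long =
    log.foldLeft(0L)((count, _) => count + 1)
}
```

Either application can change its state layout or be redeployed without coordinating with the other, since both only depend on the log.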
![Page 8: Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It](https://reader035.vdocuments.site/reader035/viewer/2022070603/586f901a1a28ab54768b7845/html5/thumbnails/8.jpg)
Time Travel
[Figure: a log with two partitions, read in three ways]
Process a period of historic data
Process the latest data with low latency (tail of the log)
Reprocess the stream (historic data first, then catch up with real-time data)
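The three access patterns all reduce to reading an append-only log from some offset. The sketch below is a hypothetical in-memory log, not Kafka's or Flink's API; `AppendOnlyLog`, `read`, and `endOffset` are illustrative names.

```scala
// Hypothetical sketch of "time travel" over an append-only log: the same
// reader can scan a historic range, follow the tail, or replay from offset 0.
final class AppendOnlyLog[A] {
  private val records = scala.collection.mutable.ArrayBuffer.empty[A]

  /** Appends a record and returns its offset. */
  def append(a: A): Long = { records += a; records.size - 1L }

  /** Offset that the next appended record will get. */
  def endOffset: Long = records.size.toLong

  /** Records in [from, until), clamped to what has been written so far. */
  def read(from: Long, until: Long): Vector[A] =
    records.slice(from.toInt, math.min(until, endOffset).toInt).toVector
}
```

Reading `[0, Long.MaxValue)` replays everything written so far, which is how a reprocessing job starts on historic data before it catches up with the tail.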
![Page 9: Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It](https://reader035.vdocuments.site/reader035/viewer/2022070603/586f901a1a28ab54768b7845/html5/thumbnails/9.jpg)
Stream processing is taking off (just look at this year's talks).
But why has it started only so recently?
![Page 10: Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It](https://reader035.vdocuments.site/reader035/viewer/2022070603/586f901a1a28ab54768b7845/html5/thumbnails/10.jpg)
Latency
Volume/Throughput
State & Accuracy
The combination is what makes streaming powerful, and the three have only recently become available together.
![Page 11: Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It](https://reader035.vdocuments.site/reader035/viewer/2022070603/586f901a1a28ab54768b7845/html5/thumbnails/11.jpg)
Latency: down to the milliseconds
Volume/Throughput: tens of millions of events/sec for stateful applications
State & Accuracy: exactly-once semantics, event-time processing
Apache Flink was the first open-source system to eliminate these tradeoffs.
![Page 12: Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It](https://reader035.vdocuments.site/reader035/viewer/2022070603/586f901a1a28ab54768b7845/html5/thumbnails/12.jpg)
Flink's Approach
[Figure: Flink's layered APIs, from top to bottom]
Stream SQL: High-level Language
Table API: Declarative DSL
Fluent API, Windows, Event Time: Core API
Stateful Stream Processing: Building Block
![Page 13: Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It](https://reader035.vdocuments.site/reader035/viewer/2022070603/586f901a1a28ab54768b7845/html5/thumbnails/13.jpg)
Stateful Stream Processing
[Figure: Source → Filter / Transform → Sink, where the transform step reads and writes State]
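A minimal sketch of the pipeline in the figure, assuming temperature readings per sensor: the source is an `Iterator`, the filter drops invalid readings, and a stateful transform keeps the per-sensor maximum in operator-local state and emits only new maxima. `Reading`, `MaxTempOperator`, and `Pipeline` are hypothetical; a real engine would also checkpoint that state.

```scala
// Hypothetical sketch: source -> filter -> stateful transform -> sink.
final case class Reading(sensor: String, tempF: Double)

final class MaxTempOperator {
  private var maxBySensor = Map.empty[String, Double] // operator-local state

  /** Emits a reading only when it sets a new maximum for its sensor. */
  def process(r: Reading): Option[Reading] = {
    val previous = maxBySensor.get(r.sensor)                 // state read
    if (previous.forall(r.tempF > _)) {
      maxBySensor = maxBySensor.updated(r.sensor, r.tempF)   // state write
      Some(r)
    } else None
  }
}

object Pipeline {
  def run(source: Iterator[Reading]): List[Reading] = {
    val op = new MaxTempOperator
    source
      .filter(r => !r.tempF.isNaN)   // filter: drop invalid readings
      .flatMap(r => op.process(r))   // stateful transform
      .toList                        // sink: collect the emitted results
  }
}
```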
![Page 14: Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It](https://reader035.vdocuments.site/reader035/viewer/2022070603/586f901a1a28ab54768b7845/html5/thumbnails/14.jpg)
Stateful Stream Processing
Scalable embedded state: access at memory speed, scaling with parallel operators
![Page 15: Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It](https://reader035.vdocuments.site/reader035/viewer/2022070603/586f901a1a28ab54768b7845/html5/thumbnails/15.jpg)
Stateful Stream Processing
Rolling back computation (re-processing):
Re-load state
Reset positions in input streams
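The rollback idea can be sketched with a checkpoint that pairs operator state with the input position it reflects. In this hypothetical example (all names are illustrative) the operator sums a log; recovery reloads the checkpointed sum and rewinds the input to the checkpointed offset, so records after the checkpoint are re-processed and each still contributes exactly once to the state.

```scala
// Hypothetical sketch: a checkpoint = (input offset, state at that offset).
final case class Checkpoint(offset: Int, sum: Long)

object RollbackDemo {
  /** Processes the first `n` records and snapshots state plus position. */
  def checkpointAfter(input: IndexedSeq[Long], n: Int): Checkpoint =
    Checkpoint(n, input.take(n).sum)

  /** Recovery: re-load the state, reset the input position, re-process. */
  def resume(input: IndexedSeq[Long], from: Checkpoint): Long =
    input.drop(from.offset).foldLeft(from.sum)(_ + _)
}
```

If the job crashes after reading records past offset 2, the partial sum from that lost work is discarded, and `resume(input, cp)` still returns the failure-free total.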
![Page 16: Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It](https://reader035.vdocuments.site/reader035/viewer/2022070603/586f901a1a28ab54768b7845/html5/thumbnails/16.jpg)
Stateful Stream Processing
Restore to different programs
Bug fixes, upgrades, A/B testing, etc.
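Restoring into a different program works as long as the new version can read the old state. This hypothetical Scala sketch snapshots a word-count state from a buggy v1 (case-sensitive counting) and continues from it with a fixed v2; `CounterState`, `ProgramV1`, `ProgramV2`, and `Upgrade` are illustrative names, not Flink's savepoint API.

```scala
// Hypothetical sketch: state produced by one program version is loaded
// into another version; only the state format has to stay compatible.
final case class CounterState(counts: Map[String, Long])

trait CountingProgram {
  def process(state: CounterState, word: String): CounterState
}

// v1 has a bug: "Flink" and "flink" are counted as different words.
object ProgramV1 extends CountingProgram {
  def process(s: CounterState, w: String): CounterState =
    CounterState(s.counts.updated(w, s.counts.getOrElse(w, 0L) + 1))
}

// v2 fixes the bug by normalizing case, resuming from v1's snapshot.
object ProgramV2 extends CountingProgram {
  def process(s: CounterState, w: String): CounterState = {
    val k = w.toLowerCase
    CounterState(s.counts.updated(k, s.counts.getOrElse(k, 0L) + 1))
  }
}

object Upgrade {
  /** Runs a program over some input, starting from a restored state. */
  def run(p: CountingProgram, from: CounterState, words: Seq[String]): CounterState =
    words.foldLeft(from)(p.process)
}
```

Note that v2 does not rewrite entries v1 already produced; migrating existing state is a separate, deliberate step.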
![Page 17: Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It](https://reader035.vdocuments.site/reader035/viewer/2022070603/586f901a1a28ab54768b7845/html5/thumbnails/17.jpg)
Versioning the state of applications
[Figure: App. A, App. B, and App. C plotted against Time, linked by four savepoints]
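Versioning can be pictured as an append-only history of immutable snapshots, each tagged with the application that took it. The sketch below is a hypothetical in-memory store (`Savepoint`, `SavepointStore`, `take`, and `latest` are illustrative, not Flink's savepoint API) in which a snapshot of App. A seeds App. B.

```scala
// Hypothetical sketch: savepoints as immutable, versioned snapshots.
final case class Savepoint[S](app: String, time: Int, state: S)

final class SavepointStore[S] {
  private var history = Vector.empty[Savepoint[S]] // never mutated in place

  /** Records a snapshot of `app`'s state at logical time `time`. */
  def take(app: String, time: Int, state: S): Savepoint[S] = {
    val sp = Savepoint(app, time, state)
    history :+= sp
    sp
  }

  /** Most recent savepoint of an application, if any. */
  def latest(app: String): Option[Savepoint[S]] =
    history.filter(_.app == app).lastOption

  def all: Vector[Savepoint[S]] = history
}
```

Because older snapshots are kept, any earlier version can still be used to start a new application, for example to rerun a bug-fixed program from a known point in time.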
![Page 18: Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It](https://reader035.vdocuments.site/reader035/viewer/2022070603/586f901a1a28ab54768b7845/html5/thumbnails/18.jpg)
![Page 19: Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It](https://reader035.vdocuments.site/reader035/viewer/2022070603/586f901a1a28ab54768b7845/html5/thumbnails/19.jpg)
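Event Time / Out-of-Order
[Figure: the Star Wars films as an out-of-order stream. In processing time (release years 1977, 1980, 1983, 1999, 2002, 2005, 2015) the episodes arrive as IV, V, VI, I, II, III, VII; in event time (story order) they run I through VII.]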
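Event time needs a rule for deciding when a window is complete even though records arrive out of order. This minimal sketch assumes one common heuristic, a watermark equal to the largest timestamp seen minus a fixed allowed delay, and assigns records to tumbling windows by their own timestamps; events for windows that have already fired are dropped. All names and the late-event policy are assumptions for illustration, not Flink's implementation.

```scala
// Hypothetical sketch: tumbling event-time windows driven by a watermark.
final case class Ev(ts: Long, value: Int)

object EventTimeWindows {
  /** Returns (window start, sum of values) in the order windows fire. */
  def run(events: Seq[Ev], windowSize: Long, maxDelay: Long): List[(Long, Int)] = {
    val open = scala.collection.mutable.TreeMap.empty[Long, Int] // start -> sum
    val out = List.newBuilder[(Long, Int)]
    var watermark = Long.MinValue
    for (e <- events) {
      val start = e.ts - Math.floorMod(e.ts, windowSize) // window by event time
      if (start + windowSize > watermark) {               // skip already-fired windows
        open(start) = open.getOrElse(start, 0) + e.value
      }
      watermark = math.max(watermark, e.ts - maxDelay)
      // Fire every window whose end is at or before the watermark.
      while (open.nonEmpty && open.head._1 + windowSize <= watermark) {
        out += open.head
        open.remove(open.head._1)
      }
    }
    out ++= open // end of input: flush the remaining windows
    out.result()
  }
}
```

With `windowSize = 10` and `maxDelay = 5`, input timestamps 1, 12, 3, 16, 25, 2 (each with value 1) produce windows (0, 2), (10, 2), (20, 1): the record at timestamp 3 is late but within the delay and still lands in window 0, while the record at timestamp 2 arrives after window 0 has fired and is dropped.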
![Page 20: Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It](https://reader035.vdocuments.site/reader035/viewer/2022070603/586f901a1a28ab54768b7845/html5/thumbnails/20.jpg)
(Stream) SQL & Table API
Table API
```scala
// convert stream into Table
val sensorTable: Table = sensorData
  .toTable(tableEnv, 'location, 'time, 'tempF)

// define query on Table
val avgTempCTable: Table = sensorTable
  .groupBy('location)
  .window(Tumble over 1.days on 'rowtime as 'w)
  .select('w.start as 'day, 'location, (('tempF.avg - 32) * 0.556) as 'avgTempC)
  .where('location like "room%")
```
SQL
```scala
sensorTable.sql("""
  SELECT day, location, avg((tempF - 32) * 0.556) AS avgTempC
  FROM sensorData
  WHERE location LIKE 'room%'
  GROUP BY day, location
""")
```
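To make the query's meaning concrete, here is an engine-free sketch of what it computes over a bounded set of readings. It assumes each reading already carries a day number (`epochDay`), standing in for the one-day tumbling window; `SensorReading` and `avgTempC` are hypothetical names. A streaming engine would produce the same result incrementally.

```scala
// Hypothetical sketch of the query's semantics on bounded data.
final case class SensorReading(location: String, epochDay: Long, tempF: Double)

object AvgTempQuery {
  def avgTempC(data: Seq[SensorReading]): Map[(Long, String), Double] =
    data
      .filter(_.location.startsWith("room"))           // WHERE location LIKE 'room%'
      .groupBy(r => (r.epochDay, r.location))           // GROUP BY day, location
      .map { case (key, rs) =>
        key -> rs.map(r => (r.tempF - 32) * 0.556).sum / rs.size // avg(...)
      }
}
```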
![Page 21: Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It](https://reader035.vdocuments.site/reader035/viewer/2022070603/586f901a1a28ab54768b7845/html5/thumbnails/21.jpg)
What can you do with that?
10 billion events (2 TB) processed daily across multiple Flink jobs for the telco network control center.
Ad-hoc real-time queries, > 30 operators, processing 30 billion events daily, maintaining 100s of GB of state inside Flink with exactly-once guarantees.
Jobs with > 20 operators, running on > 5,000 vCores in a 1,000-node cluster, processing millions of events per second.
![Page 22: Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It](https://reader035.vdocuments.site/reader035/viewer/2022070603/586f901a1a28ab54768b7845/html5/thumbnails/22.jpg)
Flink's Streams Playing at Batch
Classic batch jobs: TeraSort, relational join
Graph processing
Linear algebra
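Relational joins are among the classic batch jobs listed. A hypothetical sketch of how a streaming-style runtime can execute an equi-join over bounded inputs: drain the build side into a hash table, then stream the probe side through it and emit matches record by record. `StreamedHashJoin` is an illustrative name; this is a textbook hash join, not Flink's actual join implementation.

```scala
// Hypothetical sketch: a hash join run as a pipelined, streaming operator.
object StreamedHashJoin {
  def join[K, A, B](build: Iterator[(K, A)], probe: Iterator[(K, B)]): Iterator[(K, A, B)] = {
    // Build phase: drain the bounded build input into an in-memory hash table.
    val table: Map[K, Vector[A]] =
      build.toVector.groupBy(_._1).map { case (k, rows) => k -> rows.map(_._2) }
    // Probe phase: stream the other input through the table, one record at a time.
    probe.flatMap { case (k, b) =>
      table.getOrElse(k, Vector.empty[A]).iterator.map(a => (k, a, b))
    }
  }
}
```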
![Page 23: Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It](https://reader035.vdocuments.site/reader035/viewer/2022070603/586f901a1a28ab54768b7845/html5/thumbnails/23.jpg)
Streaming technology is already awesome, but what are the next steps?
A.k.a., what can we expect in the "next gen"?
A lot of things are "next gen" when looking at the program, so here is my take on it…
![Page 24: Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It](https://reader035.vdocuments.site/reader035/viewer/2022070603/586f901a1a28ab54768b7845/html5/thumbnails/24.jpg)
"Next Gen"
Queryable State
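Queryable state means external clients read live operator state directly instead of waiting for results to land in a database. The sketch assumes a single writer (the operator) and concurrent readers, and uses Scala's `TrieMap` for thread-safe lookups; `QueryableCounter`, `onEvent`, and `query` are hypothetical names, not Flink's queryable-state client API.

```scala
// Hypothetical sketch: an operator's keyed state doubles as a lookup service.
import scala.collection.concurrent.TrieMap

final class QueryableCounter {
  private val state = TrieMap.empty[String, Long]

  /** Called by the stream processor (the single writer) for every event. */
  def onEvent(key: String): Unit =
    state.update(key, state.getOrElse(key, 0L) + 1L)

  /** Called by external clients: a point lookup on the live state. */
  def query(key: String): Option[Long] = state.get(key)
}
```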
![Page 25: Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It](https://reader035.vdocuments.site/reader035/viewer/2022070603/586f901a1a28ab54768b7845/html5/thumbnails/25.jpg)
"Next Gen"
Elastic Parallelism
Maintaining exactly-once state consistency
No extra effort for the user
No need to carefully plan partitions
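Rescaling without careful partition planning becomes possible when keys map to a fixed number of key groups and each parallel instance owns a contiguous range of groups. The sketch below follows that idea with a plain `hashCode` modulo (Flink's actual key-group assignment uses a different hash); `Rescaling`, `keyGroup`, `ownerOf`, and `redistribute` are hypothetical names.

```scala
// Hypothetical sketch: keyed state redistributed by key group on rescaling.
object Rescaling {
  /** Fixed mapping from key to one of `maxParallelism` key groups. */
  def keyGroup(key: String, maxParallelism: Int): Int =
    Math.floorMod(key.hashCode, maxParallelism)

  /** Each instance owns a contiguous range of key groups. */
  def ownerOf(group: Int, maxParallelism: Int, parallelism: Int): Int =
    group * parallelism / maxParallelism

  /** Splits per-key state across `parallelism` instances. */
  def redistribute(state: Map[String, Long], maxParallelism: Int, parallelism: Int): Vector[Map[String, Long]] = {
    val byOwner = state.groupBy { case (k, _) =>
      ownerOf(keyGroup(k, maxParallelism), maxParallelism, parallelism)
    }
    Vector.tabulate(parallelism)(i => byOwner.getOrElse(i, Map.empty[String, Long]))
  }
}
```

Because ownership changes only at key-group granularity, going from 2 to 3 instances reassigns whole groups: every key's state ends up on exactly one new instance, with nothing lost or duplicated.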
![Page 26: Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It](https://reader035.vdocuments.site/reader035/viewer/2022070603/586f901a1a28ab54768b7845/html5/thumbnails/26.jpg)
"Next Gen"
Terabytes of state inside the stream processor
Maintaining fast checkpoints and recovery
E.g., long histories of windows, large join tables
State at local memory speed
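The slide does not say how checkpoints stay fast as state grows; one common technique is incremental checkpointing, which persists only the entries changed since the previous checkpoint. A minimal sketch under that assumption (`IncrementalState`, `checkpointDelta`, and `Recovery.restore` are hypothetical names; deletions are not modeled):

```scala
// Hypothetical sketch: incremental checkpoints write only changed keys.
final class IncrementalState {
  private val state = scala.collection.mutable.HashMap.empty[String, Long]
  private val dirty = scala.collection.mutable.HashSet.empty[String]

  def put(key: String, value: Long): Unit = { state(key) = value; dirty += key }
  def get(key: String): Option[Long] = state.get(key)

  /** Returns only the entries modified since the previous checkpoint. */
  def checkpointDelta(): Map[String, Long] = {
    val delta = dirty.iterator.map(k => k -> state(k)).toMap
    dirty.clear()
    delta
  }
}

object Recovery {
  /** Rebuilds the full state by applying deltas in checkpoint order. */
  def restore(deltas: Seq[Map[String, Long]]): Map[String, Long] =
    deltas.foldLeft(Map.empty[String, Long])(_ ++ _)
}
```

The cost of each checkpoint then scales with how much state changed, not with the total state size, while recovery replays the chain of deltas.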
![Page 27: Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It](https://reader035.vdocuments.site/reader035/viewer/2022070603/586f901a1a28ab54768b7845/html5/thumbnails/27.jpg)
"Next Gen"
Full SQL on Streams
Continuous queries, incremental results
Windows, event time, processing time
Consistent with SQL on bounded data
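A continuous query emits a new result row every time its input changes. This hypothetical sketch evaluates `SELECT location, COUNT(*) FROM readings GROUP BY location` incrementally as a stream of upserts; on a bounded input, the latest row per key matches the batch answer, which is the consistency property the slide calls for. `ContinuousCount` and its methods are illustrative names.

```scala
// Hypothetical sketch: incremental GROUP BY/COUNT vs. the batch answer.
object ContinuousCount {
  /** One upsert row (location, new count) per input row. */
  def updates(locations: Iterator[String]): Iterator[(String, Long)] = {
    val counts = scala.collection.mutable.HashMap.empty[String, Long]
    locations.map { loc =>
      val c = counts.getOrElse(loc, 0L) + 1L
      counts(loc) = c
      loc -> c // the updated result row for this key
    }
  }

  /** The same query evaluated once over bounded data. */
  def batch(locations: Seq[String]): Map[String, Long] =
    locations.groupBy(identity).map { case (k, v) => k -> v.size.toLong }
}
```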
![Page 28: Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It](https://reader035.vdocuments.site/reader035/viewer/2022070603/586f901a1a28ab54768b7845/html5/thumbnails/28.jpg)
Thank you!
![Page 29: Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It](https://reader035.vdocuments.site/reader035/viewer/2022070603/586f901a1a28ab54768b7845/html5/thumbnails/29.jpg)
Appendix
![Page 30: Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It](https://reader035.vdocuments.site/reader035/viewer/2022070603/586f901a1a28ab54768b7845/html5/thumbnails/30.jpg)
![Page 31: Keynote: Stephan Ewen - Stream Processing as a Foundational Paradigm and Apache Flink's Approach to It](https://reader035.vdocuments.site/reader035/viewer/2022070603/586f901a1a28ab54768b7845/html5/thumbnails/31.jpg)
We are hiring!
data-artisans.com/careers