Apache Flink® Training: System Overview
A stream processor with many applications
Streaming dataflow runtime
Apache Flink
1 year of Flink - code: April 2014 to April 2015
Flink Community
[Figure: number of unique contributor IDs by git commits, Aug 2010 to Jul 2015, y-axis from 0 to 120]
In top 5 of Apache's big data projects after one year in the Apache Software Foundation
The Apache Way
Flink is an Apache top-level project
• Independent, non-profit organization
• Community-driven open source software development approach
• Public communication and open to new contributors
What is Apache Flink?
[Figure: the Flink stack]
• Libraries and compatibility layers: Gelly, Table, ML, SAMOA, Hadoop M/R, Dataflow, Dataflow (WiP), MRQL, Table, Cascading (WiP)
• APIs: DataSet (Java/Scala/Python), DataStream (Java/Scala)
• Core: Streaming dataflow runtime
• Execution: Local, Remote, Yarn, Tez, Embedded
Native workload support
Flink
• Streaming topologies
• Heavy batch jobs
• Machine Learning at scale
How can an engine natively support all these workloads? And what does "native" mean?
E.g.: Non-native iterations
[Figure: a Client drives a sequence of separate Steps, one job per step]
    for (int i = 0; i < maxIterations; i++) {
        // Execute MapReduce job
    }
E.g.: Non-native streaming
[Figure: a stream discretizer cuts the stream into a sequence of Jobs]
    while (true) {
        // get next few records
        // issue batch job
    }
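The discretizer loop above can be made concrete with a small plain-JDK sketch. It is not Flink code; `discretizeAndRun` and `runBatchJob` are hypothetical names, and summing integers merely stands in for "issue batch job". The sketch shows why this approach is non-native: a record's result only becomes available after its whole batch has been collected.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Plain-JDK illustration of non-native streaming; not Flink code.
class StreamDiscretizerSketch {

    // Hypothetical stand-in for "issue batch job": sums one finite batch.
    static int runBatchJob(List<Integer> batch) {
        int sum = 0;
        for (int value : batch) {
            sum += value;
        }
        return sum;
    }

    // Cuts the stream into batches of at most batchSize records and runs
    // one batch job per batch, returning one result per job.
    static List<Integer> discretizeAndRun(Iterator<Integer> stream, int batchSize) {
        List<Integer> results = new ArrayList<>();
        List<Integer> batch = new ArrayList<>();
        while (stream.hasNext()) {
            batch.add(stream.next());              // get next few records
            if (batch.size() == batchSize) {
                results.add(runBatchJob(batch));   // issue batch job
                batch = new ArrayList<>();
            }
        }
        if (!batch.isEmpty()) {
            results.add(runBatchJob(batch));       // flush the final partial batch
        }
        return results;
    }
}
```

Calling `discretizeAndRun` with a larger `batchSize` issues fewer jobs but delays every result further, which is the latency trade-off that mini-batching brings.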
Native workload support
Flink
• Stream processing
• Batch processing
• Machine Learning at scale
• Graph Analysis
How can an engine natively support all these workloads? And what does "native" mean?
Flink Engine
1. Execute everything as streams
2. Iterative (cyclic) dataflows
3. Mutable state
4. Operate on managed memory
5. Special code paths for batch
State + Computation
What is a Flink Program?
[Figure: Flink stack, the same diagram as under "What is Apache Flink?"]
Basic API Concept
How do I write a Flink program?
1. Bootstrap sources
2. Apply operations
3. Output to sinks
[Figure: Source -> Data Stream -> Operation -> Data Stream -> Sink]
[Figure: Source -> Data Set -> Operation -> Data Set -> Sink]
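As a rough illustration of the source, operation, sink shape, here is a plain-JDK sketch. It deliberately avoids the Flink API; `PipelineSketch`, `run`, and `upperCaseAll` are hypothetical names, and an in-memory list stands in for both the source and the sink.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Plain-JDK illustration of the three steps; not the Flink API.
class PipelineSketch {

    // Runs source -> operation -> sink over an in-memory collection.
    static <I, O> void run(List<I> source, Function<I, O> operation, Consumer<O> sink) {
        for (I record : source) {                  // 1. read from the source
            O result = operation.apply(record);    // 2. apply the operation
            sink.accept(result);                   // 3. write to the sink
        }
    }

    // Example wiring: upper-case each line and collect the output in a list.
    static List<String> upperCaseAll(List<String> lines) {
        List<String> out = new ArrayList<>();
        PipelineSketch.<String, String>run(lines, line -> line.toUpperCase(), line -> out.add(line));
        return out;
    }
}
```

The same three-part shape applies to both rows of the figure; the difference lies in whether the source is finite (Data Set) or unbounded (Data Stream).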
Batch & Stream Processing
DataStream API
Example: Live Stock Feed
Stock Feed
    Name        Price
    Microsoft   124
    Google      516
    Apple       235
    …           …
Operations on the feed:
• Alert if Microsoft > 120
• Write event to database
• Sum every 10 seconds
• Alert if sum > 10000

DataSet API
Example: Map/Reduce paradigm
[Figure: an input data set with columns b and h (rows 2 1, 3 5, 7 4, …) passes through Map and Reduce and yields an output column a (12, …)]
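The operations of the live stock feed example can be sketched in plain Java to show what each one computes. This is not the DataStream API. The `timestampMillis` field is an assumption (the slide shows only name and price), and the sketch assumes that "sum every 10 seconds" means the sum of all prices in non-overlapping 10-second windows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-JDK illustration of the stock feed logic; not the DataStream API.
class StockFeedSketch {

    // One feed event; timestampMillis is a hypothetical field added for windowing.
    static final class Quote {
        final String name;
        final int price;
        final long timestampMillis;

        Quote(String name, int price, long timestampMillis) {
            this.name = name;
            this.price = price;
            this.timestampMillis = timestampMillis;
        }
    }

    // "Alert if Microsoft > 120": a per-record filter.
    static List<Quote> microsoftAlerts(List<Quote> feed) {
        List<Quote> alerts = new ArrayList<>();
        for (Quote quote : feed) {
            if (quote.name.equals("Microsoft") && quote.price > 120) {
                alerts.add(quote);
            }
        }
        return alerts;
    }

    // "Sum every 10 seconds": sums prices per 10-second window,
    // keyed by the window's start time in milliseconds.
    static Map<Long, Integer> sumPerTenSeconds(List<Quote> feed) {
        Map<Long, Integer> sums = new TreeMap<>();
        for (Quote quote : feed) {
            long windowStart = (quote.timestampMillis / 10_000L) * 10_000L;
            sums.merge(windowStart, quote.price, Integer::sum);
        }
        return sums;
    }

    // "Alert if sum > 10000": start times of the windows whose sum exceeds the threshold.
    static List<Long> sumAlerts(Map<Long, Integer> sums) {
        List<Long> alerts = new ArrayList<>();
        for (Map.Entry<Long, Integer> window : sums.entrySet()) {
            if (window.getValue() > 10_000) {
                alerts.add(window.getKey());
            }
        }
        return alerts;
    }
}
```

The filter works record by record, while the windowed sum needs state that lives across records. A stream processor has to manage that state itself, which is why windowing and state appear later as comparison criteria.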
Streaming & Batch
                    Batch                    Streaming
    Input           finite                   infinite
    Data transfer   blocking or pipelined    pipelined
    Latency         high                     low
Scaling out
[Figure: a single pipeline, Source -> Data Set -> Operation -> Data Set -> Sink]
Scaling up
[Figure: the same Source -> Data Set -> Operation -> Data Set -> Sink pipeline, repeated as several parallel instances]
Sources (selection)
• Collection-based: fromCollection, fromElements
• File-based: TextInputFormat, CsvInputFormat
• Other: SocketInputFormat, KafkaInputFormat, Databases

Sinks (selection)
• File-based: TextOutputFormat, CsvOutputFormat, PrintOutput
• Others: SocketOutputFormat, KafkaOutputFormat, Databases

Hadoop Integration
Out of the box
• Access HDFS
• Yarn Execution (covered later)
• Reuse data types (Writables)
With a thin wrapper
• Reuse Hadoop input and output formats
• Reuse functions like Map and Reduce
What’s the Lifecycle of a Program?
Architecture Overview
• Client
• Master (Job Manager)
• Worker (Task Manager)
[Figure: a Client, a Job Manager, and three Task Managers]
Client
• Optimize
• Construct job graph
• Pass job graph to job manager
• Retrieve job results
[Figure: Client and Job Manager]
    case class Path(from: Long, to: Long)

    // Transitive closure: at most 10 iterations, starting from the edges
    val tc = edges.iterate(10) { paths: DataSet[Path] =>
      val next = paths
        .join(edges)
        .where("to")
        .equalTo("from") { (path, edge) =>
          Path(path.from, edge.to) // extend each path by one edge
        }
        .union(paths)
        .distinct()
      next
    }
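To make the semantics of the transitive closure program concrete, here is a plain-Java sketch of the same computation on in-memory sets. It does not use Flink. `TransitiveClosureSketch` is a hypothetical name, and a nested loop stands in for the join, whose actual execution strategy the optimizer chooses in Flink.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Plain-JDK illustration of the iteration's semantics; not Flink code.
class TransitiveClosureSketch {

    // Mirrors the Scala case class Path(from, to), with value equality.
    static final class Path {
        final long from;
        final long to;

        Path(long from, long to) {
            this.from = from;
            this.to = to;
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof Path)) {
                return false;
            }
            Path path = (Path) other;
            return from == path.from && to == path.to;
        }

        @Override
        public int hashCode() {
            return Objects.hash(from, to);
        }
    }

    // Runs maxIterations rounds of: join paths with edges, union with paths, distinct.
    static Set<Path> transitiveClosure(Set<Path> edges, int maxIterations) {
        Set<Path> paths = new HashSet<>(edges);            // the iteration starts from the edges
        for (int i = 0; i < maxIterations; i++) {
            Set<Path> next = new HashSet<>(paths);         // union(paths); the set gives distinct()
            for (Path path : paths) {
                for (Path edge : edges) {
                    if (path.to == edge.from) {            // join where "to" equals "from"
                        next.add(new Path(path.from, edge.to));
                    }
                }
            }
            paths = next;
        }
        return paths;
    }
}
```

For the edges 1 -> 2, 2 -> 3, 3 -> 4, the result contains the three edges plus 1 -> 3, 2 -> 4, and 1 -> 4. Each round extends known paths by one edge, so a fixed iteration count bounds the path length that can be discovered.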
[Figure: type extraction and the Optimizer produce an example plan: Data Source orders.tbl -> Filter -> Map and Data Source lineitem.tbl feed a Hybrid Hash Join (build HT, probe; hash-part [0] on both inputs), followed by GroupRed (sort, forward)]
Job Manager
• Parallelization: Create Execution Graph
• Scheduling: Assign tasks to task managers
• State: Supervise the execution
[Figure: the Job Manager holds the plan (Data Source orders.tbl -> Filter -> Map, Data Source lineitem.tbl, Hybrid Hash Join, GroupRed); a copy of the plan appears on each of four Task Managers]
Task Manager
• Operations are split up into tasks depending on the specified parallelism
• Each parallel instance of an operation runs in a separate task slot
• The scheduler may run several tasks from different operators in one task slot
[Figure: Task Managers, each containing task Slots]
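The relationship between parallelism, tasks, and slots can be worked through with a small plain-Java calculation. It is a sketch under one assumption: with slot sharing, one slot can hold one parallel instance of every operator, so a job needs as many slots as its highest operator parallelism. The class and method names are hypothetical.

```java
import java.util.Map;

// Plain-JDK arithmetic on parallelism and slots; not Flink code.
class TaskSlotSketch {

    // Every operator is split into one task per parallel instance.
    static int totalTasks(Map<String, Integer> parallelismByOperator) {
        int total = 0;
        for (int parallelism : parallelismByOperator.values()) {
            total += parallelism;
        }
        return total;
    }

    // Assumption: tasks of different operators may share a slot, while
    // instances of the same operator need separate slots, so the operator
    // with the highest parallelism determines the slot count.
    static int slotsNeededWithSharing(Map<String, Integer> parallelismByOperator) {
        int max = 0;
        for (int parallelism : parallelismByOperator.values()) {
            max = Math.max(max, parallelism);
        }
        return max;
    }
}
```

For a job with a source and a filter at parallelism 4 and a join at parallelism 2, this gives 10 tasks in total but only 4 slots under the sharing assumption.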
Execution Setups
Ways to Run a Flink Program
[Figure: Flink stack, the same diagram as under "What is Apache Flink?", with the execution options Local, Remote, Yarn, Tez, Embedded]
Local Execution
• Starts local Flink cluster
• All processes run in the same JVM
• Behaves just like a regular Cluster
• Very useful for developing and debugging
[Figure: one JVM containing a Job Manager and four Task Managers]
Embedded Execution
• Runs operators on simple Java collections
• Lower overhead
• Does not use memory management
• Useful for testing and debugging
Remote Execution
• Submit a Job remotely
• Monitor the status of a job
[Figure: the Client submits a job to the Job Manager of a Cluster with four Task Managers]
YARN Execution
• Multi-user scenario
• Resource sharing
• Uses YARN containers to run a Flink cluster
• Easy to set up
[Figure: a Client and a YARN Cluster with a Resource Manager and four Node Managers, which host the Job Manager, two Task Managers, and one Other Application]
Tez Execution
• Leverages Apache Tez’s runtime
• Built on top of YARN
• Good YARN citizen
• Fast path to elastic deployments
• Slower than native Flink
Flink compared to other projects
Batch & Streaming projects
• Batch only
• Streaming only
• Hybrid
Batch comparison
(three systems compared, one per column; the column headings are not preserved in the transcript)
    API                  low-level             high-level            high-level
    Data Transfer        batch                 batch                 pipelined & batch
    Memory Management    disk-based            JVM-managed           Active managed
    Iterations           file system cached    in-memory cached      streamed
    Fault tolerance      task level            task level            job level
    Good at              massive scale out     data exploration      heavy backend & iterative jobs
    Libraries            many external         built-in & external   evolving built-in & external
Streaming comparison
(three systems compared, one per column; the column headings are not preserved in the transcript)
    Streaming          “true”              mini batches           “true”
    API                low-level           high-level             high-level
    Fault tolerance    tuple-level ACKs    RDD-based (lineage)    coarse checkpointing
    State              not built-in        external               internal
    Exactly once       at least once       exactly once           exactly once
    Windowing          not built-in        restricted             flexible
    Latency            low                 medium                 low
    Throughput         medium              high                   high
Thank you for listening!