stephan ewen flink committer co-founder / cto @ data artisans @stephanewen apache flink
TRANSCRIPT
Stephan EwenFlink committer
co-founder / CTO @ data Artisans
@StephanEwen
ApacheFlink
Looking back one year
2
April 16, 2014
3
Stratosphere 0.4
4
Stratosphere Optimizer
Pact API (Java)
Stratosphere Runtime
DataSet API (Scala)
Local Remote
Batch processing on a pipelining engine, with iterations …
Looking at now…
5
Flink
Historic data
Kafka, RabbitMQ, ...
HDFS, JDBC, ...
ETL, Graphs,Machine LearningRelational, …
Low latency,windowing, aggregations, ...
Event logs
Real-time data streams
What is Apache Flink?
(master)
What is Apache Flink?
7
Pyth
on
Gelly
Table
ML
SA
MO
A
Flink Optimizer
DataSet (Java/Scala)DataStream (Java/Scala)
Stream Builder Hadoo
p M
/R
Local Remote Yarn Tez Embedded
Data
flo
w
Data
flo
w
Flink Dataflow Runtime
HDFS
HBase
Kafka
RabbitMQ
Flume
HCatalog
JDBC
8
Batch / Steaming APIs
case class Word (word: String, frequency: Int)
val lines: DataStream[String] = env.fromSocketStream(...)
lines.flatMap {line => line.split(" ") .map(word => Word(word,1))} .window(Count.of(1000)).every(Count.of(100)) .groupBy("word").sum("frequency") .print()
val lines: DataSet[String] = env.readTextFile(...)
lines.flatMap {line => line.split(" ") .map(word => Word(word,1))} .groupBy("word").sum("frequency") .print()
DataSet API (batch):
DataStream API (streaming):
Technology inside Flinkcase class Path (from: Long, to: Long)val tc = edges.iterate(10) { paths: DataSet[Path] => val next = paths .join(edges) .where("to") .equalTo("from") { (path, edge) => Path(path.from, edge.to) } .union(paths) .distinct() next }
Cost-based optimizer
Type extraction
stack
Task schedulin
g
Recovery metadata
Pre-flight (Client)
MasterWorkers
DataSource
orders.tbl
Filter
MapDataSour
celineitem.tbl
JoinHybrid Hash
buildHT
probe
hash-part [0] hash-part [0]
GroupRed
sort
forward
Program
DataflowGraph
Memory manager
Out-of-core algos
Batch & Streaming
State & Checkpoint
s
deployoperators
trackintermediate
results
Flink by Feature / Use Case
10
Data Streaming Analysis
11
Life of data streams
Create: create streams from event sources (machines, databases, logs, sensors, …)
Collect: collect and make streams available for consumption (e.g., Apache Kafka)
Process: process streams, possibly generating derived streams (e.g., Apache Flink)
12
Stream Analysis in Flink
13More at: http://flink.apache.org/news/2015/02/09/streaming-example.html
14
Defining windows in Flink
Trigger policy• When to trigger the computation on current window
Eviction policy• When data points should leave the window• Defines window width/size
E.g., count-based policy• evict when #elements > n• start a new window every n-th element
Built-in: Count, Time, Delta policies
Checkpointing / Recovery
Flink acknowledges batches of records• Less overhead in failure-free case• Currently tied to fault tolerant data
sources (e.g., Kafka)
Flink operators can keep state• State is checkpointed• Checkpointing and record acks go together
Exactly one semantics for state
15
Checkpointing / Recovery
16
Chandy-Lamport Algorithm for consistent asynchronous distributed snapshots
Pushes checkpoint barriersthrough the data flow
Operator checkpointstarting
Checkpoint done
Data Stream
barrier
Before barrier =part of the snapshot
After barrier =Not in snapshot
Checkpoint done
checkpoint in progress
(backup till next snapshot)
Heavy ETL Pipelines
17
18
Heavy Data Pipelines
Complex ETL programs
Apology: Graph had to be blurred foronline slides, due to confidentiality
19
Memory Management
public class WC { public String word; public int count;}
emptypage
Pool of Memory Pages
Sorting, hashing, caching
Shuffling, broadcasts
User code objects
Managed
Unm
anaged
Flink contains its own memory management stack. Memory is allocated, de-allocated, and used strictly using an internal buffer pool implementation. To do that, Flink contains its own type extraction and serialization components.
More at: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=53741525
Smooth out-of-core performance
20More at: http://flink.apache.org/news/2015/03/13/peeking-into-Apache-Flinks-Engine-Room.html
Single-core join of 1KB Java objects beyond memory (4 GB)Blue bars are in-memory, orange bars (partially) out-of-core
21
Benefits of managed memory
More reliable and stable performance (less GC effects, easy to go to disk)
Table API
22
val customers = envreadCsvFile(…).as('id, 'mktSegment) .filter( 'mktSegment === "AUTOMOBILE" )
val orders = env.readCsvFile(…) .filter( o => dateFormat.parse(o.orderDate).before(date) ) .as('orderId, 'custId, 'orderDate, 'shipPrio)
val items = orders .join(customers).where('custId === 'id) .join(lineitems).where('orderId === 'id) .select('orderId,'orderDate,'shipPrio, 'extdPrice * (Literal(1.0f) - 'discount) as 'revenue)
val result = items .groupBy('orderId, 'orderDate, 'shipPrio) .select('orderId, 'revenue.sum, 'orderDate, 'shipPrio)
Iterations in Data Flows Machine Learning Algorithms
23
24
Iterate by looping
for/while loop in client submits one job per iteration step
Data reuse by caching in memory and/or disk
Step Step Step Step Step
Client
25
Iterate in the Dataflow
Large-Scale Machine Learning
26
Factorizing a matrix with28 billion ratings forrecommendations
(Scale of Netflixor Spotify)
More at: http://data-artisans.com/computing-recommendations-with-flink.html
State in Iterations Graphs and Machine Learning
27
28
Iterate natively with deltas
partial solution
delta set X
other datasets
Y initial
solution iteration
result
workset A B workset
Merge deltas
Replace
initial workset
Effect of delta iterations…#
of
ele
men
ts u
pd
ate
d
iteration
30
… very fast graph analysis
… and mix and matchETL-style and graph analysisin one program
Performance competitivewith dedicated graph
analysis systems
More at: http://data-artisans.com/data-analysis-with-flink.html
Closing
31
32
Flink Roadmap for 2015
Out-of-core state in Streaming
Monitoring and scaling for streaming
Streaming Machine Learning with SAMOA
More additions to the libraries• Batch Machine Learning• Graph library additions (more algorithms)
SQL on top of expression language
Master failover
Flink community
Aug-10 Feb-11 Sep-11 Apr-12 Oct-12 May-13Nov-13 Jun-14 Dec-14 Jul-150
20
40
60
80
100
120#unique contributor ids by git commits
flink.apache.org@ApacheFlink
35
Backup
Cornerpoints of Flink Design
36
Robust Algorithms on Managed Memory
Pipelined Executionof Batch Programs
Better shuffle performance
No OutOfMemory Errors Scales to very large JVMs Efficient an robust processing
Flexible DataStreaming Engine
Low Latency Steam Proc.
Highly flexible windows
Native Iterations Very fast Graph
Processing Stateful Iterations for ML
High-level APIs,beyond key/value pairs
Java/Scala/Python (upcoming)
Relational-style optimizer
Graphs / Machine Learning
Streaming ML (coming)
Scales to very large groups
Active Library Development
37
Program optimization
38
A simple program
val orders = … val lineitems = …
val filteredOrders = orders .filter(o => dataFormat.parse(l.shipDate).after(date)) .filter(o => o.shipPrio > 2)
val lineitemsOfOrders = filteredOrders .join(lineitems) .where(“orderId”).equalTo(“orderId”) .apply((o,l) => new SelectedItem(o.orderDate, l.extdPrice))
val priceSums = lineitemsOfOrders .groupBy(“orderDate”).sum(“l.extdPrice”);
39
Two execution plans
DataSourceorders.tbl
Filter
Map DataSourcelineitem.tbl
JoinHybrid Hash
buildHT probe
broadcast forward
Combine
GroupRed
sort
DataSourceorders.tbl
Filter
Map DataSourcelineitem.tbl
JoinHybrid Hash
buildHT probe
hash-part [0] hash-part [0]
hash-part [0,1]
GroupRed
sort
forwardBest plan depends on
relative sizes of
input files
40
Examples of optimization
Task chaining• Coalesce map/filter/etc tasks
Join optimizations• Broadcast/partition, build/probe side, hash or sort-
merge
Interesting properties• Re-use partitioning and sorting for later operations
Automatic caching• E.g., for iterations
41
Visualization
42
Visualization tools
43
Visualization tools
44
Visualization tools