the flink big data analytics platform
TRANSCRIPT
The Flink Big Data
Analytics Platform
Marton Balassi, Gyula Fora"
{mbalassi, gyfora}@apache.org
What is Apache Flink?
• Started in 2009 by the Berlin-based database research groups
• In the Apache Incubator since mid 2014
• 61 contributors as of the 0.7.0 release
Open Source
• Fast, general purpose distributed data processing system
• Combining batch and stream processing
• Up to 100x faster than Hadoop
Fast +
Reliable
• Programming APIs for Java and Scala
• Tested on large clusters
Ready to use
11/18/2014 2 Flink - M. Balassi & Gy. Fora
What is Apache Flink?
Master
Worker
Worker
Flink Cluster
Analytical Program
Flink Client &
Op0mizer
11/18/2014 3 Flink - M. Balassi & Gy. Fora
This Talk
• Introduction to Flink
• API overview
• Distinguishing Flink
• Flink from a user perspective
• Performance
• Flink roadmap and closing
11/18/2014 4 Flink - M. Balassi & Gy. Fora
Open source Big Data Landscape
MapReduce
Hive
Flink
Spark Storm
Yarn Mesos
HDFS
Mahout
Cascading
Tez
Pig
Data processing
engines
App and resource
management
Applica3ons
Storage, streams KaCa HBase
Crunch
…
11/18/2014 5 Flink - M. Balassi & Gy. Fora
Flink stack
Common API
Storage
Streams
Hybrid Batch/Streaming Run0me
HDFS Files S3
Cluster
Manager YARN EC2 Na0ve
Flink Op0mizer
Scala API (batch)
Graph API („Spargel“)
JDBC Redis Rabbit
MQ KaCa Azure …
Java
Collec0ons
Streams Builder
Apache
Tez
Python
API
Java API (streaming)
Apache
MRQL
Batch
Streaming
Java API (batch)
Local
Execu0on
11/18/2014 6 Flink - M. Balassi & Gy. Fora
Flink APIs
Programming model
DataSet/Stream A
A (1)
A (2)
B (1)
B (2)
C (1)
C (2)
X
X
Y
Y
Program
Parallel Execution
X Y
Operator X Operator Y
Data abstractions: Data Set, Data Stream
DataSet/Stream B
DataSet/Stream C
11/18/2014 8 Flink - M. Balassi & Gy. Fora
Flexible pipelines
Reduce
Join
Map
Reduce
Map
Iterate
Source
Sink
Source
Map, FlatMap, MapPartition, Filter, Project, Reduce,
ReduceGroup, Aggregate, Distinct, Join, CoGoup, Cross,
Iterate, Iterate Delta, Iterate-Vertex-Centric, Windowing
11/18/2014 9 Flink - M. Balassi & Gy. Fora
WordCount, Java API
DataSet<String> text = env.readTextFile(input); DataSet<Tuple2<String, Integer>> result = text .flatMap((str, out) -> { for (String token : value.split("\\W")) { out.collect(new Tuple2<>(token, 1)); }) .groupBy(0) .sum(1);
11/18/2014 10 Flink - M. Balassi & Gy. Fora
WordCount, Scala API
val input = env.readTextFile(input); val words = input flatMap { line => line.split("\\W+") } val counts = words groupBy { word => word } count()
11/18/2014 11 Flink - M. Balassi & Gy. Fora
WordCount, Streaming API
DataStream<String> text = env.readTextFile(input); DataStream<Tuple2<String, Integer>> result = text .flatMap((str, out) -> { for (String token : value.split("\\W")) { out.collect(new Tuple2<>(token, 1)); }) .groupBy(0) .sum(1);
11/18/2014 12 Flink - M. Balassi & Gy. Fora
Is there anything beyond WordCount?
11/18/2014 13 Flink - M. Balassi & Gy. Fora
Beyond Key/Value Pairs
DataSet<Page> pages = ...; DataSet<Impression> impressions = ...;
DataSet<Impression> aggregated = impressions .groupBy("url") .sum("count");
pages.join(impressions).where("url").equalTo("url") .print() // outputs pairs of matching pages and impressions
class Impression { public String url; public long count; }
class Page { public String url; public String topic; }
// outputs pairs of pages and impressions
11/18/2014 14 Flink - M. Balassi & Gy. Fora
Preview: Logical Types
DataSet<Row> dates = env.readCsv(...).as("order_id", "date"); DataSet<Row> sessions = env.readCsv(...).as("id", "session"); DataSet<Row> joined = dates .join(session).where("order_id").equals("id"); joined.groupBy("date").reduceGroup(new SessionFilter()) class SessionFilter implements GroupReduceFunction<SessionType> { public void reduce(Iterable<SessionType> value, Collector out){ ... } }
public class SessionType { public String order_id; public Date date; public String session; }
11/18/2014 15 Flink - M. Balassi & Gy. Fora
Distinguishing Flink
Hybrid batch/streaming runtime
• Batch and stream processing in the same system
• No micro-batches, unified runtime
• Competitive performance
• Code reusable from batch processing to streaming,
making development and testing a piece-of-cake
11/18/2014 17 Flink - M. Balassi & Gy. Fora
Flink Streaming
• Most Data Set operators are also available for
Data Streams
• Temporal and streaming specific operators
– Window/mini-batch operators
– Window join, cross etc.
• Support for iterative stream processing
• Connectors for different data sources
– Kafka, Flume, RabbitMQ, Twitter etc.
11/18/2014 18 Flink - M. Balassi & Gy. Fora
Flink Streaming
//Build new model on every second of new data DataStream<Double[]> model= env .addSource(new TrainingDataSource()) .window(1000) .reduceGroup(new ModelBuilder());
//Predict new data using the most up-to-date model DataStream<Integer> prediction = env .addSource(new NewDataSource()) .connect(model) .map(new Predictor());
11/18/2014 19 Flink - M. Balassi & Gy. Fora
Lambda architecture
Source: https://www.mapr.com/developercentral/lambda-architecture
11/18/2014 20 Flink - M. Balassi & Gy. Fora
Lambda architecture in Flink
11/18/2014 21 Flink - M. Balassi & Gy. Fora
Dependability JV
M H
ea
p
Flink Managed
Heap
Network Buffers
Unmanaged
Heap
(next version unifies network buffers
and managed heap)
User Code
Hashing/Sor0ng/Caching
• Flink manages its own memory
• Caching and data processing happens in a dedicated
memory fraction
• System never breaks the
JVM heap, gracefully spills
Shuffles/Broadcasts
11/18/2014 22 Flink - M. Balassi & Gy. Fora
• serializes data every time
Highly robust, never gives up on you
• works on objects, RDDs may be stored serialized
Serialization considered slow, only when needed
• makes serialization really cheap:
partial deserialization, operates on serialized form
Efficient and robust!
Operating on "
Serialized Data
11/18/2014 23 Flink - M. Balassi & Gy. Fora
Operating on "
Serialized Data
Microbenchmark • Sorting 1GB worth of (long, double) tuples
• 67,108,864 elements
• Simple quicksort
11/18/2014 24 Flink - M. Balassi & Gy. Fora
Memory Management
public class WC { public String word; public int count; }
empty
page
Pool of Memory Pages
• Works on pages of bytes, maps objects transparently
• Full control over memory, out-of-core enabled
• Algorithms work on binary representation
• Address individual fields (not deserialize whole object)
• Move memory between operations
11/18/2014 25 Flink - M. Balassi & Gy. Fora
Flink from a user perspective
Flink programs run everywhere
Cluster (Batch)
Cluster (Streaming)
Local
Debugging
Fink Run3me or Apache Tez
As Java Collec0on
Programs
Embedded
(e.g., Web Container)
11/18/2014 27 Flink - M. Balassi & Gy. Fora
Migrate Easily
Flink out-of-the-box supports
• Hadoop data types (writables)
• Hadoop Input/Output Formats
• Hadoop functions and object model
Input Map Reduce Output
DataSet DataSet DataSet Red Join
DataSet Map DataSet
Output S
Input
11/18/2014 28 Flink - M. Balassi & Gy. Fora
• Requires no memory thresholds to configure – Flink manages its own memory
• Requires no complicated network configs – Pipelining engine requires much less memory for data exchange
• Requires no serializers to be configured – Flink handles its own type extraction and data representation
• Programs can be adjusted to data automatically – Flink’s optimizer can choose execution strategies automatically
Little tuning or configuration required
11/18/2014 29 Flink - M. Balassi & Gy. Fora
Understanding Programs
Visualizes the operations and the data
movement of programs
Analyze after execution
Screenshot from Flink’s plan visualizer
11/18/2014 30 Flink - M. Balassi & Gy. Fora
Understanding Programs
Analyze after execution (times, stragglers, …)
11/18/2014 31 Flink - M. Balassi & Gy. Fora
Iterations in other systems
Step Step Step Step Step
Client Loop outside the system
Step Step Step Step Step
Client Loop outside the system
11/18/2014 32 Flink - M. Balassi & Gy. Fora
Iterations in Flink
Streaming dataflow
with feedback
map
join
red.
join
System is iteration-aware, performs automatic optimization
11/18/2014 33 Flink - M. Balassi & Gy. Fora
Automatic Optimization for Iterative
Programs
Caching Loop-invariant Data Pushing work „out of the loop“
Maintain state as index
11/18/2014 34 Flink - M. Balassi & Gy. Fora
Performance
Distributed Grep
Filter
Term 1
HDFS Filter
Term 2
Filter
Term 3
Matches
Matches
Matches
• 1 TB of data (log files) • 24 machines with
• 32 GB Memory
• Regular HDDs
• HDFS 2.4.0
• Flink 0.7-incubating-SNAPSHOT
• Spark 1.2.0-SNAPSHOT
Flink up to 2.5x faster
11/18/2014 36 Flink - M. Balassi & Gy. Fora
Distributed Grep: Flink Execution
11/18/2014 37 Flink - M. Balassi & Gy. Fora
Distributed Grep: Spark Execution
Filter
Term 1 HDFS Matches
Spark executes the job in 3 stages:
Stage 1
Filter
Term 2 HDFS Matches Stage 2
Filter
Term 3 HDFS Matches Stage 3
11/18/2014 38 Flink - M. Balassi & Gy. Fora
Spark in-memory pinning
Filter
Term 1
HDFS Filter
Term 2
Filter
Term 3
Matches
Matches
Matches
In-
Memory
RDD
Cache in-memory
to avoid reading the
data for each filter
JavaSparkContext sc = new JavaSparkContext(conf); JavaRDD<String> file =
sc.textFile(inFile).persist(StorageLevel.MEMORY_AND_DISK()); for(int p = 0; p < patterns.length; p++) {
final String pattern = patterns[p]; JavaRDD<String> res = file.filter(new Function<String, Boolean>() { … }); }
11/18/2014 39 Flink - M. Balassi & Gy. Fora
Spark in-memory pinning
37 minutes
9 minutes
RDD is 100% in-
memory
Spark starts spilling
RDD to disk 11/18/2014 40 Flink - M. Balassi & Gy. Fora
PageRank
• Dataset:
– Twitter Follower Graph
– 41,652,230 vertices (users)
– 1,468,365,182 edges
(followings)
– 12 GB input data
11/18/2014 41 Flink - M. Balassi & Gy. Fora
PageRank results
11/18/2014 42 Flink - M. Balassi & Gy. Fora
Why is there a difference?
• Lets have a look at the iteration times:
Flink average: 48 sec., Spark average: 99 sec.
11/18/2014 43 Flink - M. Balassi & Gy. Fora
PageRank on Flink with "
Delta Iterations
• The algorithm runs 60 iterations until convergence (runtime
includes convergence check)
100 minutes
8.5 minutes
11/18/2014 44 Flink - M. Balassi & Gy. Fora
Again, explaining the difference
On average, a (delta) interation runs for 6.7 seconds Flink (Bulk): 48 sec.
Spark (Bulk): 99 sec.
11/18/2014 45 Flink - M. Balassi & Gy. Fora
Streaming performance
11/18/2014 46 Flink - M. Balassi & Gy. Fora
Flink Roadmap
• Flink has a major release every 3 months, "
with >=1 big-fixing releases in-between
• Finer-grained fault tolerance
• Logical (SQL-like) field addressing
• Python API
• Flink Streaming, Lambda architecture support
• Flink on Tez
• ML on Flink (e.g., Mahout DSL)
• Graph DSL on Flink
• … and more
11/18/2014 47 Flink - M. Balassi & Gy. Fora
11/18/2014 48 Flink - M. Balassi & Gy. Fora
Marton Balassi, Gyula Fora"
{mbalassi, gyfora}@apache.org
http://flink.incubator.apache.org
github.com/apache/incubator-flink
@ApacheFlink