flink 1.0-slides

What’s new in Apache Flink™ 1.0, Kostas Tzoumas (@kostas_tzoumas)

Upload: jamie-grier

Post on 16-Apr-2017


Page 1: Flink 1.0-slides

What’s new in Apache Flink™ 1.0

Kostas Tzoumas (@kostas_tzoumas)

Page 2: Flink 1.0-slides

Flink 1.0

• March 8, 2016

• First release in 1.x.y series

• Initiates backwards compatibility for selected APIs

• More than 64 contributors

• More than 450 JIRAs resolved

Page 3: Flink 1.0-slides

Flink 1.0: major features

• Out of core state

• Savepoints

• CEP library

• Improved monitoring & Kafka 0.9 support

Page 4: Flink 1.0-slides

Out of core state

Page 5: Flink 1.0-slides

Out of core state

• Alternative to in-memory state

• Powered by RocksDB instances embedded in the Flink TaskManagers (TMs)

• Enabled by using the RocksDBStateBackend

• State limited by disk space only

• State checkpoints save the RocksDB databases to a reliable store
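
The idea on this slide can be illustrated with a small stand-alone sketch. It does not use Flink or RocksDB: the standard-library `sqlite3` module stands in for the embedded on-disk store, and a second file stands in for the reliable store. The class name `DiskBackedCounter` and the file names are hypothetical and chosen only for illustration.

```python
import os
import sqlite3
import tempfile


class DiskBackedCounter:
    """Toy keyed state kept in an on-disk database instead of in memory.

    Assumption: sqlite3 is only a stand-in for an embedded store such as RocksDB.
    """

    def __init__(self, db_path):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS state ("
            "key TEXT PRIMARY KEY, value INTEGER NOT NULL)"
        )

    def add(self, key, amount=1):
        # One transaction per update: create the key if needed, then increment.
        with self.conn:
            self.conn.execute(
                "INSERT OR IGNORE INTO state (key, value) VALUES (?, 0)", (key,)
            )
            self.conn.execute(
                "UPDATE state SET value = value + ? WHERE key = ?", (amount, key)
            )

    def get(self, key):
        row = self.conn.execute(
            "SELECT value FROM state WHERE key = ?", (key,)
        ).fetchone()
        return row[0] if row else 0

    def checkpoint(self, target_path):
        # Copy the whole database to another location (the "reliable store").
        dest = sqlite3.connect(target_path)
        try:
            self.conn.backup(dest)
        finally:
            dest.close()


workdir = tempfile.mkdtemp()
local = DiskBackedCounter(os.path.join(workdir, "local.db"))
local.add("flink")
local.add("flink")

snapshot_path = os.path.join(workdir, "checkpoint-1.db")
local.checkpoint(snapshot_path)

local.add("flink")  # processed after the checkpoint

# Restoring means opening the copied database: it reflects the checkpoint only.
restored = DiskBackedCounter(snapshot_path)
```

Here `restored.get("flink")` is 2 while `local.get("flink")` is 3: the working state is bounded by disk rather than memory, and the checkpoint is an independent copy of it.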

Page 6: Flink 1.0-slides

Savepoints

Page 7: Flink 1.0-slides

Production deployments

• Maintaining stateful applications in production settings comes with its own challenges

• Failures, code upgrades, cluster maintenance, …

• Streaming jobs cannot be simply stopped and restarted

Page 8: Flink 1.0-slides

Reminder: fault tolerance

• At least once, at most once, exactly once

• Flink guarantees exactly-once processing

• Flink guarantees end to end exactly-once with selected sources and sinks

• e.g., Kafka → Flink → HDFS

Page 9: Flink 1.0-slides

How? Checkpoints

• Flink guarantees fault tolerance by regularly taking checkpoints of the application state without ever stopping the execution

• On failure, the input stream is rewound to the logical time of the last checkpoint
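
A minimal single-operator sketch of the checkpoint-and-rewind idea follows. It is an assumption-laden toy, not Flink's mechanism: Flink takes distributed snapshots across many operators, while this example has one running sum and one replayable input list. The names `Checkpoint` and `run_with_checkpoints` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Checkpoint:
    offset: int       # input position to resume from
    running_sum: int  # operator state at that position


def run_with_checkpoints(records, interval, fail_at=None):
    """Sum `records`, taking a checkpoint every `interval` records.

    If `fail_at` is given, the job "crashes" once, just before processing the
    record at that index, and recovers from the latest checkpoint.
    """
    checkpoint = Checkpoint(offset=0, running_sum=0)
    offset, running_sum = 0, 0
    failed = False
    while offset < len(records):
        if fail_at is not None and offset == fail_at and not failed:
            failed = True
            # Recovery: discard in-flight state and rewind the input to the
            # position recorded in the last checkpoint.
            offset, running_sum = checkpoint.offset, checkpoint.running_sum
            continue
        running_sum += records[offset]
        offset += 1
        if offset % interval == 0:
            checkpoint = Checkpoint(offset, running_sum)
    return running_sum


records = list(range(1, 11))
clean = run_with_checkpoints(records, interval=3)
recovered = run_with_checkpoints(records, interval=3, fail_at=5)
```

Both `clean` and `recovered` are 55. Records 4 and 5 are replayed after the failure, but because the state is restored together with the input position, each record affects the final state exactly once.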

Page 10: Flink 1.0-slides

Introducing savepoints

• A savepoint is a Flink checkpoint that (1) is taken by the user, (2) is accessible externally, and (3) never expires

• Command line save & resume interface

• Save: flink savepoint <JobID>

• Resume: flink run -s <path/to/savepoint> <jobJar>

Page 11: Flink 1.0-slides

Savepoints and versions

• A savepoint saves a version of a stateful application at a well-defined time

• E.g.: take snapshots of one application at well-defined times

Page 12: Flink 1.0-slides

“Like git for state”

• Branch off from savepoints, creating a tree of running application versions
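
The branching idea can be sketched without Flink. In the toy below, a "savepoint" is just an independent copy of a job's state, and two versions of the application code resume from it. `CountingJob` and its `transform` parameter are hypothetical names; real savepoints are taken and restored with the `flink savepoint` and `flink run -s` commands shown earlier.

```python
import copy


class CountingJob:
    """Toy stateful job that counts normalized words."""

    def __init__(self, state=None, transform=str.lower):
        self.counts = dict(state or {})
        self.transform = transform  # stands in for the application code version

    def process(self, words):
        for word in words:
            key = self.transform(word)
            self.counts[key] = self.counts.get(key, 0) + 1

    def savepoint(self):
        # An independent copy: later processing does not change it.
        return copy.deepcopy(self.counts)


job_v1 = CountingJob()
job_v1.process(["Flink", "flink", "Kafka"])
sp = job_v1.savepoint()

# Branch 1: the original code keeps running.
job_v1.process(["Flink!"])

# Branch 2: upgraded code (strips trailing "!") resumes from the same savepoint.
job_v2 = CountingJob(state=sp, transform=lambda w: w.lower().rstrip("!"))
job_v2.process(["Flink!"])
```

After this runs, `job_v1.counts` has a separate `"flink!"` entry, `job_v2.counts` has `"flink"` at 3, and `sp` still holds the state as of the savepoint. This is the shape of a code upgrade or an A/B test started from a shared, well-defined point.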

Page 13: Flink 1.0-slides

Essential for production deployments

• Application code upgrades

• Flink version upgrades

• Maintenance, migration, debugging

• What-if simulations

• A/B testing

• Time travel

Page 14: Flink 1.0-slides

Complex Event Processing

Page 15: Flink 1.0-slides

FlinkCEP

• What is Complex Event Processing?

• A catch-all term

• In our context: easily detect patterns in streams
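
To make "detect patterns in streams" concrete, here is a plain-Python sketch of one common pattern: an event of one kind followed by an event of another kind within a time bound. It is not the FlinkCEP Pattern API; the function `detect_followed_by` and the login events are hypothetical and serve only as an illustration.

```python
def detect_followed_by(events, first, second, within):
    """Yield (a, b) pairs where `first` matches a, `second` matches b,
    b arrives after a, and b's timestamp is at most `within` later.

    `events` is an iterable of (timestamp, payload) tuples in timestamp order.
    """
    pending = []  # events that matched `first` and still await a partner
    for event in events:
        timestamp, payload = event
        # Drop partial matches that have timed out.
        pending = [a for a in pending if timestamp - a[0] <= within]
        if second(payload):
            for a in pending:
                yield (a, event)
            pending = []
        if first(payload):
            pending.append(event)


events = [
    (1, "login_failed"),
    (2, "login_failed"),
    (3, "login_ok"),
    (20, "login_failed"),
    (40, "login_ok"),
]

matches = list(
    detect_followed_by(
        events,
        first=lambda p: p == "login_failed",
        second=lambda p: p == "login_ok",
        within=10,
    )
)
```

`matches` contains two pairs, both ending at the `login_ok` event at time 3. The failure at time 20 is not matched because the next success arrives 20 time units later, outside the bound of 10.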

Page 16: Flink 1.0-slides
Page 17: Flink 1.0-slides

Pattern API

Page 18: Flink 1.0-slides
Page 19: Flink 1.0-slides
Page 20: Flink 1.0-slides

Other features in 1.0

• Support for Kafka 0.9 API (and hence MapR Streams)

• Monitoring console: job submission, checkpoint statistics, detecting bottlenecks

• See http://flink.apache.org/news/2016/03/08/release-1.0.0.html

Page 21: Flink 1.0-slides

Closing

Page 22: Flink 1.0-slides

Summary

• Flink 1.0: Initiating backwards compatibility and pushing the envelope even further for production streaming deployments

Page 23: Flink 1.0-slides

What’s next

• SQL

• Dynamic scaling (+ savepoints)

• Hybrid in-memory/out-of-core state backend

• Queryable state

• Support for Apache Mesos

• More connectors and sinks (Kinesis, Cassandra, …)

Page 24: Flink 1.0-slides

Join the community

• Follow: @ApacheFlink, @dataArtisans

• Read: flink.apache.org/blog, data-artisans.com/blog

• Subscribe: (news | dev | user)@flink.apache.org