tlv data plumbers: exactly once processing

Exactly Once Processing The Sad Truth Yair Weinberger alooma | CTO @yairwein

Upload: alooma

Post on 16-Aug-2015


Exactly Once Processing: The Sad Truth

Yair Weinberger, alooma | CTO, @yairwein

#TLVDataPlumbers

● Create a community of data plumbers

● "The best minds of my generation are deleting commas from log files, and that makes me sad." @medriscoll, http://adage.com/...

● “For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights” @mrogati, http://www.nytimes.com/2014/08/18/...

● “It’s time to take care of our clogged data plumbing”, Tom Davenport, http://venturebeat.com/2014/09/18/...

[Diagram] Data sources (Mobile, Servers, Sensors, Devices) reach Analytics, Personalization, Monitoring, and more through tons of plumbing to make it work: Unscalable, Rigid, Leaky, Slow.

[Diagram] The same sources and consumers, connected by a real-time platform that abstracts the data layer: Scalable, Flexible, Reliable, Fast.

Exactly Once Semantics

Same goes for exactly-once semantics

Maybe exists | Does not exist

Storm + Trident

● What is Storm?

● What is Trident? (What is a transaction?)

Exactly once processing - Storm + Trident

Common myths

● We use Trident, so we guarantee “exactly once”

● If a tuple in a transaction fails, the whole transaction is repeated, and the computation done on the transaction so far is discarded

● Someone already put up a working transactional state

Transactional state is hard (impossible?)

● Trident States

Even if the backing store is transactional!

Theory:
- begin commit => begin transaction in the backing store
- update state => write to the backing store
- commit => commit in the backing store

Practice:
- This only works for single-threaded states!
- commit is called once per thread

Experience:
- Even in a single-threaded state, the thread can crash between the commit to the backing store and the ack
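
The failure described under Experience can be sketched with a toy model. This is not Storm or Trident code: `CountStore`, `process_batch`, and `run_with_replay` are hypothetical names, and the "crash" is simulated with an exception raised after the store commits but before the ack is returned. Under those assumptions, the replayed batch is counted twice even though every write went through a commit.

```python
class CountStore:
    """Toy backing store with a transactional-looking interface (hypothetical)."""

    def __init__(self):
        self.committed = 0
        self._pending = None

    def begin(self):
        self._pending = self.committed

    def add(self, n):
        self._pending += n

    def commit(self):
        self.committed = self._pending
        self._pending = None


class CrashBeforeAck(Exception):
    """Simulates the worker dying after the commit but before the ack."""


def process_batch(store, batch, crash_before_ack=False):
    store.begin()
    for _ in batch:
        store.add(1)          # non-idempotent update: count every tuple
    store.commit()            # the backing store now holds the new count
    if crash_before_ack:
        raise CrashBeforeAck()
    return "ack"


def run_with_replay(store, batch, crash_first_attempt):
    """Replay the batch until it is acked, as an at-least-once source would."""
    attempts = 0
    while True:
        attempts += 1
        crash = crash_first_attempt and attempts == 1
        try:
            process_batch(store, batch, crash_before_ack=crash)
            return attempts
        except CrashBeforeAck:
            continue          # no ack was seen, so the same batch is sent again


store = CountStore()
attempts = run_with_replay(store, ["a", "b", "c"], crash_first_attempt=True)
print(attempts, store.committed)  # 2 attempts, count is 6 instead of 3
```

The store did nothing wrong here; the duplicate comes from the gap between its commit and the ack, which a transaction inside the store alone cannot close.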

Fake it ‘till you make it

Idempotency to the rescue

Simple (non-distributed) example: TCP sequence numbers
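
A minimal sketch of the sequence-number idea, assuming a heavily simplified receiver: real TCP numbers bytes rather than segments, uses windows, and handles wraparound, none of which is modeled here. `Receiver` and its fields are hypothetical names. The point is only that a retransmitted segment carries a sequence number the receiver has already passed, so delivering it again has no effect.

```python
class Receiver:
    """Toy in-order receiver that deduplicates by sequence number (hypothetical)."""

    def __init__(self):
        self.next_seq = 0      # next sequence number we expect
        self.delivered = []    # payloads handed to the application

    def receive(self, seq, payload):
        """Deliver an in-order segment once; return the cumulative ack."""
        if seq == self.next_seq:
            self.delivered.append(payload)
            self.next_seq += 1
        # seq < next_seq: duplicate of something already delivered, dropped
        # seq > next_seq: gap, dropped in this toy model so the sender retries
        return self.next_seq


rx = Receiver()
acks = [
    rx.receive(0, "a"),
    rx.receive(1, "b"),
    rx.receive(1, "b"),   # retransmission, e.g. because the first ack was lost
    rx.receive(2, "c"),
]
print(acks)          # [1, 2, 2, 3]
print(rx.delivered)  # ['a', 'b', 'c']
```

The sender may deliver at least once; the receiver makes the extra deliveries harmless.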

Fake it ‘till you make it

Idempotent distributed examples

- Database with primary key (INSERT … IGNORE)

- Kafka with log compaction
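
The primary-key case can be sketched with Python's built-in `sqlite3` module. Two assumptions to note: SQLite spells the statement `INSERT OR IGNORE` (MySQL uses `INSERT IGNORE`), and the `events` table, its columns, and the `write_event` helper are hypothetical. Each event is assumed to carry a stable, unique `event_id` assigned upstream.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_id TEXT PRIMARY KEY, payload TEXT)")


def write_event(event_id, payload):
    """Insert the event; a row with the same primary key is silently skipped."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO events (event_id, payload) VALUES (?, ?)",
            (event_id, payload),
        )


# The third write is a replay of the first, as after a failed ack.
for event_id, payload in [("e1", "signup"), ("e2", "login"), ("e1", "signup")]:
    write_event(event_id, payload)

row_count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(row_count)  # 2
```

This only helps when the key identifies the event itself; a key generated at write time (an auto-increment id, for instance) would let the replay through as a new row.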

Kafka Log Compaction

● Retains at least the most recent message per key, so writing the same key and value again is idempotent.

● We can achieve exactly once even without transactional state!
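
A toy model of the compaction semantics, not the Kafka client API: `compact` is a hypothetical function that treats the log as a list of `(key, value)` pairs and keeps the latest value per key. It assumes each message carries the full value for its key (not a delta) and that a replay resends the most recent write. In real Kafka, compaction runs in the background, so a consumer reading the not-yet-compacted part of the log can still see more than one message per key.

```python
def compact(log):
    """Keep only the latest record per key, ordered by surviving offset."""
    latest = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)   # a later offset overwrites an earlier one
    survivors = sorted(latest.items(), key=lambda kv: kv[1][0])
    return [(key, value) for key, (_offset, value) in survivors]


log = [
    ("user-1", "plan=free"),
    ("user-2", "plan=pro"),
    ("user-1", "plan=pro"),
]
# The producer did not see an ack for the last write and sends it again.
log_with_replay = log + [("user-1", "plan=pro")]

print(compact(log))              # [('user-2', 'plan=pro'), ('user-1', 'plan=pro')]
print(compact(log_with_replay))  # same result: the duplicate write changed nothing
```

Replaying an older value after a newer one for the same key would change the result, so the ordering assumption matters.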

Exactly once - with idempotence

Maybe exists | Does not exist

Questions?