TLV Data Plumbers: Exactly Once Processing
Exactly Once Processing: The Sad Truth
Yair Weinberger
alooma | CTO
@yairwein
#TLVDataPlumbers
● Create a community of data plumbers
● "The best minds of my generation are deleting commas from log files, and that makes me sad." @medriscoll, http://adage.com/...
● “For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights” @mrogati, http://www.nytimes.com/2014/08/18/...
● “It’s time to take care of our clogged data plumbing”, Tom Davenport, http://venturebeat.com/2014/09/18/...
A real-time platform that abstracts the data layer
Mobile
Servers
Sensors
Devices
tons of plumbing to make it work
Unscalable Rigid Leaky Slow
Analytics Personalization Monitoring And more…
A real-time platform that abstracts the data layer
Scalable Flexible Reliable Fast
Servers
Sensors
Devices
Mobile
Analytics Personalization Monitoring And more…
Common myths
● We use Trident, so we are guaranteed “exactly once”
● If a tuple in a transaction failed, the whole transaction will be repeated, and the computation done on the transaction so far will be discarded
● Someone already put up a working transactional state
Even if the backing store is transactional!
Theory:
- begin commit => begin transaction in the backing store
- update state => write to the backing store
- commit => commit in the backing store
Practice:
- This only works for single-threaded states!
- commit is called once per thread
Experience:
- Even in a single-threaded state, the thread can crash between the commit to the backing store and the ack
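The crash window described above can be sketched in a few lines of Python. This is a toy model, not Trident code: `CountStore`, `process`, and the `crash_before_ack` flag are hypothetical names made up for illustration, and the "replay" is just a second call with the same batch.

```python
class CountStore:
    """Hypothetical backing store that holds a single running total."""

    def __init__(self):
        self.total = 0

    def commit(self, delta):
        # Stands in for a successful transaction commit in the backing store.
        self.total += delta


def process(batch, store, crash_before_ack=False):
    """Apply a batch to the store; return True if the batch was acked."""
    store.commit(sum(batch))
    if crash_before_ack:
        # The commit already landed, but the worker dies before acking.
        return False
    return True


store = CountStore()
batch = [1, 2, 3]

acked = process(batch, store, crash_before_ack=True)
if not acked:
    # The source never saw an ack, so it replays the same batch.
    process(batch, store)

print(store.total)  # 12, although the batch only sums to 6
```

The store was perfectly transactional in this sketch, yet the batch was counted twice, because the commit and the ack are two separate steps.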
Fake it ’til you make it
Idempotency to the rescue
Simple (non-distributed) example: TCP sequence numbers
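A rough sketch of the sequence-number idea, heavily simplified: real TCP numbers bytes rather than whole segments and also handles windows and reordering, none of which is modeled here. The `Receiver` class and its fields are hypothetical names for illustration only.

```python
class Receiver:
    """Toy receiver that delivers each sequence number at most once."""

    def __init__(self):
        self.next_seq = 0      # next sequence number we expect
        self.delivered = []    # payloads handed to the application

    def receive(self, seq, payload):
        """Deliver the payload only if it is the next expected one.

        Always returns the next expected sequence number, acting as the ack.
        """
        if seq == self.next_seq:
            self.delivered.append(payload)
            self.next_seq += 1
        # A retransmitted (already seen) segment falls through and is dropped.
        return self.next_seq


rx = Receiver()
rx.receive(0, "a")
rx.receive(1, "b")
rx.receive(1, "b")   # retransmission after a lost ack
rx.receive(2, "c")

print(rx.delivered)  # ['a', 'b', 'c']
```

The sender delivered "b" twice (at least once), but the receiver applied it once, so the application sees exactly-once behavior.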
Idempotent distributed examples
- Database with primary key (INSERT … IGNORE)
- Kafka with log compaction
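The primary-key variant might look like the following sketch. It uses Python's built-in `sqlite3` so it runs anywhere; SQLite spells the statement `INSERT OR IGNORE`, whereas MySQL uses `INSERT IGNORE`. The `events` table, its columns, and `store_event` are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_id TEXT PRIMARY KEY, payload TEXT)")


def store_event(event_id, payload):
    """Write an event; a replay with the same event_id is silently skipped."""
    conn.execute(
        "INSERT OR IGNORE INTO events (event_id, payload) VALUES (?, ?)",
        (event_id, payload),
    )
    conn.commit()


store_event("e1", "click")
store_event("e2", "view")
store_event("e1", "click")   # replayed after a failure upstream

row_count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(row_count)  # 2
```

This only works if every event carries a stable, unique ID that survives replays; an ID generated at write time would defeat the purpose.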
Kafka Log Compaction
● Retains only the most recent message per key, so writing the same key and value again is idempotent.
● We can achieve exactly once even without transactional state!
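To make the compaction argument concrete, here is a small in-memory model, not Kafka client code. The `compact` function and the sample keys are hypothetical. In a real cluster, compaction runs asynchronously in the background, so consumers may still see duplicates in the not-yet-compacted part of the log.

```python
def compact(log):
    """Keep only the latest record per key, preserving offset order."""
    last_offset = {}
    for offset, (key, _value) in enumerate(log):
        last_offset[key] = offset
    return [
        record
        for offset, record in enumerate(log)
        if last_offset[record[0]] == offset
    ]


log = [("user-1", 10), ("user-2", 7), ("user-1", 10), ("user-2", 9)]
compacted = compact(log)
print(compacted)  # [('user-1', 10), ('user-2', 9)]

# Replaying the last write changes nothing once the log is compacted.
replayed = log + [("user-2", 9)]
print(compact(replayed) == compacted)  # True
```

The replayed write is absorbed because the key already maps to that value, which is why duplicate writes to a compacted topic end up looking like a single write.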
Resources
● https://storm.apache.org/documentation/Trident-state
● https://cwiki.apache.org/confluence/display/KAFKA/Log+Compaction
● http://bravenewgeek.com/you-cannot-have-exactly-once-delivery/
● Storm Real-Time Processing Cookbook, Quinton Anderson
● https://github.com/quintona/trident-kafka-push/blob/master/src/main/java/com/github/quintona/KafkaState.java