storm - university of california, berkeleycloud.berkeley.edu/data/storm-berkeley.pdfnathan marz...
TRANSCRIPT
![Page 1: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/1.jpg)
Nathan MarzTwitter
Distributed and fault-tolerant realtime computation
Storm
![Page 2: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/2.jpg)
Basic info
• Open sourced September 19th
• Implementation is 15,000 lines of code
• Used by over 25 companies
• >2400 watchers on Github (most watched JVM project)
• Very active mailing list
• >1800 messages
• >560 members
![Page 3: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/3.jpg)
Before Storm
Queues Workers
![Page 4: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/4.jpg)
Example
(simplified)
![Page 5: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/5.jpg)
Example
Workers schemify tweets and append to Hadoop
![Page 6: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/6.jpg)
Example
Workers update statistics on URLs by incrementing counters in Cassandra
![Page 7: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/7.jpg)
ScalingDeploy
Reconfigure/redeploy
![Page 8: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/8.jpg)
Problems
• Scaling is painful
• Poor fault-tolerance
• Coding is tedious
![Page 9: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/9.jpg)
What we want
• Guaranteed data processing
• Horizontal scalability
• Fault-tolerance
• No intermediate message brokers!
• Higher level abstraction than message passing
• “Just works”
![Page 10: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/10.jpg)
Storm
Guaranteed data processing
Horizontal scalability
Fault-tolerance
No intermediate message brokers!
Higher level abstraction than message passing
“Just works”
![Page 11: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/11.jpg)
Streamprocessing
Continuouscomputation
DistributedRPC
Use cases
![Page 12: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/12.jpg)
Storm Cluster
![Page 13: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/13.jpg)
Storm Cluster
Master node (similar to Hadoop JobTracker)
![Page 14: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/14.jpg)
Storm Cluster
Used for cluster coordination
![Page 15: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/15.jpg)
Storm Cluster
Run worker processes
![Page 16: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/16.jpg)
Starting a topology
![Page 17: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/17.jpg)
Killing a topology
![Page 18: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/18.jpg)
Concepts
• Streams
• Spouts
• Bolts
• Topologies
![Page 19: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/19.jpg)
Streams
Unbounded sequence of tuples
Tuple Tuple Tuple Tuple Tuple Tuple Tuple
![Page 20: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/20.jpg)
Spouts
Source of streams
![Page 21: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/21.jpg)
Spout examples
• Read from Kestrel queue
• Read from Twitter streaming API
![Page 22: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/22.jpg)
Bolts
Processes input streams and produces new streams
![Page 23: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/23.jpg)
Bolts
• Functions
• Filters
• Aggregation
• Joins
• Talk to databases
![Page 24: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/24.jpg)
Topology
Network of spouts and bolts
![Page 25: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/25.jpg)
Tasks
Spouts and bolts execute as many tasks across the cluster
![Page 26: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/26.jpg)
Task execution
Tasks are spread across the cluster
![Page 27: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/27.jpg)
Task execution
Tasks are spread across the cluster
![Page 28: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/28.jpg)
Stream grouping
When a tuple is emitted, which task does it go to?
![Page 29: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/29.jpg)
Stream grouping
• Shuffle grouping: pick a random task
• Fields grouping: mod hashing on a subset of tuple fields
• All grouping: send to all tasks
• Global grouping: pick task with lowest id
![Page 30: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/30.jpg)
Topology
shuffle
[“url”]
shuffle
shuffle
[“id1”, “id2”]
all
![Page 31: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/31.jpg)
Streaming word count
TopologyBuilder is used to construct topologies in Java
![Page 32: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/32.jpg)
Streaming word count
Define a spout in the topology with parallelism of 5 tasks
![Page 33: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/33.jpg)
Streaming word count
Split sentences into words with parallelism of 8 tasks
![Page 34: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/34.jpg)
Consumer decides what data it receives and how it gets grouped
Streaming word count
Split sentences into words with parallelism of 8 tasks
![Page 35: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/35.jpg)
Streaming word count
Create a word count stream
![Page 36: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/36.jpg)
Streaming word count
splitsentence.py
![Page 37: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/37.jpg)
Streaming word count
![Page 38: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/38.jpg)
Streaming word count
Submitting topology to a cluster
![Page 39: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/39.jpg)
Streaming word count
Running topology in local mode
![Page 40: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/40.jpg)
Demo
![Page 41: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/41.jpg)
Distributed RPC
Data flow for Distributed RPC
![Page 42: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/42.jpg)
DRPC Example
Computing “reach” of a URL on the fly
![Page 43: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/43.jpg)
Reach
Reach is the number of unique peopleexposed to a URL on Twitter
![Page 44: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/44.jpg)
Computing reach
URL
Tweeter
Tweeter
Tweeter
Follower
Follower
Follower
Follower
Follower
Follower
Distinct follower
Distinct follower
Distinct follower
Count Reach
![Page 45: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/45.jpg)
Reach topology
![Page 46: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/46.jpg)
Reach topology
![Page 47: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/47.jpg)
Reach topology
![Page 48: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/48.jpg)
Reach topology
Keep set of followers foreach request id in memory
![Page 49: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/49.jpg)
Reach topology
Update followers set whenreceive a new follower
![Page 50: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/50.jpg)
Reach topology
Emit partial count afterreceiving all followers for a
request id
![Page 51: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/51.jpg)
Demo
![Page 52: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/52.jpg)
Guaranteeing message processing
“Tuple tree”
![Page 53: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/53.jpg)
Guaranteeing message processing
• A spout tuple is not fully processed until all tuples in the tree have been completed
![Page 54: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/54.jpg)
Guaranteeing message processing
• If the tuple tree is not completed within a specified timeout, the spout tuple is replayed
![Page 55: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/55.jpg)
Guaranteeing message processing
Reliability API
![Page 56: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/56.jpg)
Guaranteeing message processing
“Anchoring” creates a new edge in the tuple tree
![Page 57: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/57.jpg)
Guaranteeing message processing
Marks a single node in the tree as complete
![Page 58: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/58.jpg)
Guaranteeing message processing
• Storm tracks tuple trees for you in an extremely efficient way
![Page 59: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/59.jpg)
Transactional topologies
How do you do idempotent counting with anat least once delivery guarantee?
![Page 60: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/60.jpg)
Won’t you overcount?
Transactional topologies
![Page 61: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/61.jpg)
Transactional topologies solve this problem
Transactional topologies
![Page 62: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/62.jpg)
Built completely on top of Storm’s primitivesof streams, spouts, and bolts
Transactional topologies
![Page 63: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/63.jpg)
Batch 1 Batch 2 Batch 3
Transactional topologies
Process small batches of tuples
![Page 64: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/64.jpg)
Batch 1 Batch 2 Batch 3
Transactional topologies
If a batch fails, replay the whole batch
![Page 65: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/65.jpg)
Batch 1 Batch 2 Batch 3
Transactional topologies
Once a batch is completed, commit the batch
![Page 66: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/66.jpg)
Batch 1 Batch 2 Batch 3
Transactional topologies
Bolts can optionally be “committers”
![Page 67: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/67.jpg)
Commit 1
Transactional topologies
Commits are ordered. If there’s a failure during commit, the whole batch + commit is retried
Commit 1 Commit 2 Commit 3 Commit 4 Commit 4
![Page 68: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/68.jpg)
Example
![Page 69: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/69.jpg)
Example
New instance of this objectfor every transaction attempt
![Page 70: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/70.jpg)
Example
Aggregate the count forthis batch
![Page 71: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/71.jpg)
Example
Only update database if transaction ids differ
![Page 72: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/72.jpg)
Example
This enables idempotency sincecommits are ordered
![Page 73: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/73.jpg)
Example
(Credit goes to Kafka devs for this trick)
![Page 74: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/74.jpg)
Transactional topologies
Multiple batches can be processed in parallel, but commits are guaranteed to be ordered
![Page 75: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/75.jpg)
• Will be available in next version of Storm (0.7.0)
• Requires a source queue that can replay identical batches of messages
• storm-kafka has a transactional spout implementation for Kafka
Transactional topologies
![Page 76: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/76.jpg)
Storm UI
![Page 77: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/77.jpg)
Storm on EC2
https://github.com/nathanmarz/storm-deploy
One-click deploy tool
![Page 78: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/78.jpg)
Starter code
https://github.com/nathanmarz/storm-starter
Example topologies
![Page 79: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/79.jpg)
Documentation
![Page 80: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/80.jpg)
Ecosystem
• Scala, JRuby, and Clojure DSL’s
• Kestrel, AMQP, JMS, and other spout adapters
• Serializers
• Multilang adapters
• Cassandra, MongoDB integration
![Page 81: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/81.jpg)
Questions?
http://github.com/nathanmarz/storm
![Page 82: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/82.jpg)
Future work
• State spout
• Storm on Mesos
• “Swapping”
• Auto-scaling
• Higher level abstractions
![Page 83: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/83.jpg)
Implementation
KafkaTransactionalSpout
![Page 84: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/84.jpg)
Implementation
all
all
all
![Page 85: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/85.jpg)
Implementation
all
all
all
TransactionalSpout is a subtopology consisting of a spout and a bolt
![Page 86: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/86.jpg)
Implementation
all
all
all
The spout consists of one task that coordinates the transactions
![Page 87: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/87.jpg)
Implementation
all
all
all
The bolt emits the batches of tuples
![Page 88: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/88.jpg)
Implementation
all
all
all
The coordinator emits a “batch” streamand a “commit stream”
![Page 89: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/89.jpg)
Implementation
all
all
all
Batch stream
![Page 90: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/90.jpg)
Implementation
all
all
all
Commit stream
![Page 91: Storm - University of California, Berkeleycloud.berkeley.edu/data/storm-berkeley.pdfNathan Marz Twitter Distributed and fault-tolerant realtime computation Storm](https://reader030.vdocuments.site/reader030/viewer/2022040403/5e86aaf258f7f502e2250232/html5/thumbnails/91.jpg)
Implementation
all
all
all
Coordinator reuses tuple tree framework to detectsuccess or failure of batches or commits and replays
appropriately