storm - philly emerging tech · 2019-12-04 · distributed and fault-tolerant realtime computation...

99
Nathan Marz Twitter Distributed and fault-tolerant realtime computation Storm

Upload: others

Post on 24-May-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Nathan MarzTwitter

Distributed and fault-tolerant realtime computation

Storm

Page 2: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Basic info

• Open sourced September 19th

• Implementation is 15,000 lines of code

• Used by over 25 companies

• >2900 watchers on Github (most watched JVM project)

• Very active mailing list

• >3000 messages

• >800 members

Page 3: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Before Storm

Queues Workers

Page 4: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Example

(simplified)

Page 5: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Example

Workers schemify tweets and append to Hadoop

Page 6: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Example

Workers update statistics on URLs by incrementing counters in Cassandra

Page 7: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Example

Use mod/hashing to make sure same URL always goes to same worker

Page 8: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

ScalingDeploy

Reconfigure/redeploy

Page 9: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Problems

• Scaling is painful

• Poor fault-tolerance

• Coding is tedious

Page 10: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

What we want

• Guaranteed data processing

• Horizontal scalability

• Fault-tolerance

• No intermediate message brokers!

• Higher level abstraction than message passing

• “Just works”

Page 11: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Storm

Guaranteed data processing

Horizontal scalability

Fault-tolerance

No intermediate message brokers!

Higher level abstraction than message passing

“Just works”

Page 12: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Streamprocessing

Continuouscomputation

DistributedRPC

Use cases

Page 13: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Storm Cluster

Page 14: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Storm Cluster

Master node (similar to Hadoop JobTracker)

Page 15: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Storm Cluster

Used for cluster coordination

Page 16: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Storm Cluster

Run worker processes

Page 17: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Starting a topology

Page 18: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Killing a topology

Page 19: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Concepts

• Streams

• Spouts

• Bolts

• Topologies

Page 20: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Streams

Unbounded sequence of tuples

Tuple Tuple Tuple Tuple Tuple Tuple Tuple

Page 21: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Spouts

Source of streams

Page 22: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Spout examples

• Read from Kestrel queue

• Read from Twitter streaming API

Page 23: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Bolts

Processes input streams and produces new streams

Page 24: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Bolts

• Functions

• Filters

• Aggregation

• Joins

• Talk to databases

Page 25: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Topology

Network of spouts and bolts

Page 26: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Tasks

Spouts and bolts execute as many tasks across the cluster

Page 27: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Task execution

Tasks are spread across the cluster

Page 28: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Task execution

Tasks are spread across the cluster

Page 29: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Stream grouping

When a tuple is emitted, which task does it go to?

Page 30: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Stream grouping

• Shuffle grouping: pick a random task

• Fields grouping: mod hashing on a subset of tuple fields

• All grouping: send to all tasks

• Global grouping: pick task with lowest id

Page 31: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Topology

shuffle

[“url”]

shuffle

shuffle

[“id1”, “id2”]

all

Page 32: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Streaming word count

TopologyBuilder is used to construct topologies in Java

Page 33: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Streaming word count

Define a spout in the topology with parallelism of 5 tasks

Page 34: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Streaming word count

Split sentences into words with parallelism of 8 tasks

Page 35: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Consumer decides what data it receives and how it gets grouped

Streaming word count

Split sentences into words with parallelism of 8 tasks

Page 36: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Streaming word count

Create a word count stream

Page 37: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Streaming word count

splitsentence.py

Page 38: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Streaming word count

Page 39: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Streaming word count

Submitting topology to a cluster

Page 40: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Streaming word count

Running topology in local mode

Page 41: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Demo

Page 42: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Distributed RPC

Data flow for Distributed RPC

Page 43: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

DRPC Example

Computing “reach” of a URL on the fly

Page 44: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Reach

Reach is the number of unique peopleexposed to a URL on Twitter

Page 45: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Computing reach

URL

Tweeter

Tweeter

Tweeter

Follower

Follower

Follower

Follower

Follower

Follower

Distinct follower

Distinct follower

Distinct follower

Count Reach

Page 46: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Reach topology

Page 47: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Reach topology

Page 48: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Reach topology

Page 49: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Reach topology

Keep set of followers foreach request id in memory

Page 50: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Reach topology

Update followers set whenreceive a new follower

Page 51: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Reach topology

Emit partial count afterreceiving all followers for a

request id

Page 52: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Demo

Page 53: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Guaranteeing message processing

“Tuple tree”

Page 54: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Guaranteeing message processing

• A spout tuple is not fully processed until all tuples in the tree have been completed

Page 55: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Guaranteeing message processing

• If the tuple tree is not completed within a specified timeout, the spout tuple is replayed

Page 56: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Guaranteeing message processing

Reliability API

Page 57: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Guaranteeing message processing

“Anchoring” creates a new edge in the tuple tree

Page 58: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Guaranteeing message processing

Marks a single node in the tree as complete

Page 59: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Guaranteeing message processing

• Storm tracks tuple trees for you in an extremely efficient way

Page 60: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Transactional topologies

How do you do idempotent counting with anat least once delivery guarantee?

Page 61: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Won’t you overcount?

Transactional topologies

Page 62: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Transactional topologies solve this problem

Transactional topologies

Page 63: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Built completely on top of Storm’s primitivesof streams, spouts, and bolts

Transactional topologies

Page 64: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Enables fault-tolerant, exactly-once messaging semantics

Transactional topologies

Page 65: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Batch 1 Batch 2 Batch 3

Transactional topologies

Process small batches of tuples

Page 66: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Batch 1 Batch 2 Batch 3

Transactional topologies

If a batch fails, replay the whole batch

Page 67: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Batch 1 Batch 2 Batch 3

Transactional topologies

Once a batch is completed, commit the batch

Page 68: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Batch 1 Batch 2 Batch 3

Transactional topologies

Bolts can optionally be “committers”

Page 69: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Commit 1

Transactional topologies

Commits are ordered. If there’s a failure during commit, the whole batch + commit is retried

Commit 1 Commit 2 Commit 3 Commit 4 Commit 4

Page 70: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Example

Page 71: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Example

New instance of this objectfor every transaction attempt

Page 72: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Example

Aggregate the count forthis batch

Page 73: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Example

Only update database if transaction ids differ

Page 74: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Example

This enables idempotency sincecommits are ordered

Page 75: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Example

(Credit goes to Kafka devs for this trick)

Page 76: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Transactional topologies

Multiple batches can be processed in parallel, but commits are guaranteed to be ordered

Page 77: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

• Requires a more sophisticated source queue than Kestrel or RabbitMQ

• storm-contrib has a transactional spout implementation for Kafka

Transactional topologies

Page 78: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Example #1

Crawler

Page 79: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Example #1

Page 80: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Example #2

Twitter Web Analytics

Page 81: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Example #2

Page 82: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Example #3

Storm as a “database”

Page 83: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Example #3

Page 84: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Storm UI

Page 85: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Storm on EC2

https://github.com/nathanmarz/storm-deploy

One-click deploy tool

Page 86: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Starter code

https://github.com/nathanmarz/storm-starter

Example topologies

Page 87: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Documentation

Page 88: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Ecosystem

• Scala, JRuby, and Clojure DSL’s

• Kestrel, Redis, AMQP, JMS, and other spout adapters

• Multilang adapters

• Cassandra, MongoDB integration

Page 89: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Questions?

http://github.com/nathanmarz/storm

Page 90: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Future work

• State spout

• Storm on Mesos

• “Swapping”

• Auto-scaling

• Higher level abstractions

Page 91: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Implementation

KafkaTransactionalSpout

Page 92: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Implementation

all

all

all

Page 93: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Implementation

all

all

all

TransactionalSpout is a subtopology consisting of a spout and a bolt

Page 94: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Implementation

all

all

all

The spout consists of one task that coordinates the transactions

Page 95: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Implementation

all

all

all

The bolt emits the batches of tuples

Page 96: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Implementation

all

all

all

The coordinator emits a “batch” streamand a “commit stream”

Page 97: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Implementation

all

all

all

Batch stream

Page 98: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Implementation

all

all

all

Commit stream

Page 99: Storm - Philly Emerging Tech · 2019-12-04 · Distributed and fault-tolerant realtime computation Storm. Basic info ... Storm Cluster Master node (similar to Hadoop JobTracker) Storm

Implementation

all

all

all

Coordinator reuses tuple tree framework to detectsuccess or failure of batches or commits and replays

appropriately