re-envisioning the lambda architecture : web services & real-time analytics w/ storm and...
DESCRIPTION
Re-envisioning the Lambda Architecture:Web Services & Real-time Analytics w/ Storm and CassandraTRANSCRIPT
![Page 1: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/1.jpg)
Re-envisioning the Lambda Architecture:Web Services & Real-time Analytics w/
Storm and Cassandra
Brian O’Neill, [email protected]@boneill42
![Page 2: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/2.jpg)
Talk Breakdown
29%
20%31%
20%
Topics
(1) Motivation(2) Polyglot Persistence(3) Analytics(4) Lambda Architecture
![Page 3: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/3.jpg)
Health Market Science - Then
What we were.
![Page 4: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/4.jpg)
Health Market Science - Now
Intersecting Big Data w/ Healthcare
We’re fixing healthcare!Dashboards
AnalyticsData Management
Web Services
(api.hmsonline.com)
![Page 5: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/5.jpg)
Data Pipelines
I/O
![Page 6: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/6.jpg)
The InputFrom government,state boards, etc.
From the internet,social data,networks / graphs
From third-parties,medical claims
From customers,expenses,sales data,beneficiary
information,quality scores
DataPipeline> 2,000+ so
urces
> 1B events / year
![Page 7: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/7.jpg)
The Output
Script
Claims
Expense
Sanction
Address
Contact(phone, fax, etc.)
Drug
RepresentativeDivision
Expense ManagerTM
Provider Verification™
MarketViewTM
CustomerFeed(s)
CustomerMaster
Provider MasterFileTM
Credentials
“Agile MDM”
1 billion claims per year
Organization
Practitioner
Referrals
> 8M practitioners
> 5B claims
> 5 years of histo
ry
![Page 8: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/8.jpg)
Sounds easy
Except...Incomplete CaptureNo foreign keysDiffering schemasChanging schemasConflicting informationAd-hoc Analysis (is hard)Point-In-Time Retrieval
![Page 9: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/9.jpg)
Why?
?’s
Is this doctor,Licensed?Sanctioned?Influential?
Is this e
xpense,
Legal?
Compliant?
Reported?
Is this claim,
Fraudulent?
Wasteful?
Abusive?
Compliance & Safety
Sales Operations
Cost Control
Transparency
Is this market,
Saturated?
Penetrable?Marketing Optimization
![Page 10: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/10.jpg)
Our MDM Pipeline
- Data Stewardship- Data Scientists- Business Analysts
Ingestion- Semantic Tagging- Standardization- Data Mapping
Incorporation- Consolidation- Enumeration- Association
Insight- Search- Reports- Analytics
Feeds(multiple formats,
changing over time)API / FTP Web Interface
DimensionsLogicRules
![Page 11: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/11.jpg)
Our first “Pipeline”
+
![Page 12: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/12.jpg)
Sweet!
Dirt SimpleLightning Fast
Highly AvailableScalable
Multi-Datacenter (DR)
![Page 13: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/13.jpg)
Not Sweet.
How do we query the data?NoSQL Indexes?
Do such things exist?
![Page 14: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/14.jpg)
Rev. 1 – Wide Rows!
AOPTriggers!Data model to
support your queries.
9 7 32 74 99 12 42
$3.50 $7.00 $8.75 $1.00 $4.20 $3.17 $8.88
ONC : PA : 19460
D’Oh! What about ad hoc?
![Page 15: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/15.jpg)
Transformation
Rev 2 – Elastic Search!
AOPTriggers!
D’Oh!What if ES fails?What about schema / type
information?
![Page 16: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/16.jpg)
Rev 3 - Apache Storm!
![Page 17: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/17.jpg)
Anatomy of a Storm Cluster
• Nimbus– Master Node
• Zookeeper– Cluster Coordination
• Supervisors– Worker Nodes
![Page 18: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/18.jpg)
Storm Primitives
• Streams– Unbounded sequence of tuples
• Spouts– Stream Sources
• Bolts– Unit of Computation
• Topologies– Combination of n Spouts and m Bolts– Defines the overall “Computation”
![Page 19: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/19.jpg)
Storm Spouts
• Represents a source (stream) of data– Queues (JMS, Kafka, Kestrel, etc.)– Twitter Firehose– Sensor Data
• Emits “Tuples” (Events) based on source– Primary Storm data structure– Set of Key-Value pairs
![Page 20: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/20.jpg)
Storm Bolts
• Receive Tuples from Spouts or other Bolts• Operate on, or React to Data
– Functions/Filters/Joins/Aggregations– Database writes/lookups
• Optionally emit additional Tuples
![Page 21: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/21.jpg)
Storm Topologies
• Data flow between spouts and bolts• Routing of Tuples between spouts/bolts
– Stream “Groupings”• Parallelism of Components• Long-Lived
![Page 22: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/22.jpg)
Storm Topologies
![Page 23: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/23.jpg)
Persistent Word Count
http://github.com/hmsonline/storm-cassandra
![Page 24: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/24.jpg)
NEXT LEVEL : TRIDENT
![Page 25: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/25.jpg)
Trident
• Part of Storm• Provides a higher-level abstraction for stream
processing– Constructs for state management and batching
• Adds additional primitives that abstract away common topological patterns
![Page 26: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/26.jpg)
Trident State
Sequences writes by batch• Spouts
– Transactional• Batch contents never change
– Opaque• Batch contents can change
• State– Transactional
• Store batch number with counts to maintain sequencing of writes
– Opaque• Store previous value in order to overwrite the current value when
contents of a batch change
![Page 27: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/27.jpg)
State Management
Last Batch Value
15 1000 (+59)Last Batch Value
16 1059
Transactional
Last Batch Previous Current
15 980 1000 (+59)
Opaquereplay == incorporated already?
(because batch composition is the same)
Last Batch Previous Current
16 1000 1059
Last Batch Previous Current
15 980 1000 (+72)Last Batch Previous Current
16 1000 1072
replay == re-incorporate
Batch composition changes! (not guaranteed)
![Page 28: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/28.jpg)
BACK TO OUR REGULARLY SCHEDULED TALK
![Page 29: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/29.jpg)
Polyglot Persistence“The Right Tool for the Job”
Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners.
![Page 30: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/30.jpg)
Back to the Pipeline
KafkaDW
Storm
C* ES Titan SQL
![Page 31: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/31.jpg)
MDM Topology*
*Notional
![Page 32: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/32.jpg)
Design Principles
• What we got:– At-least-once processing– Simple data flows
• What we needed to account for:– Replays
Idempotent Operations!Immutable Data! FTW!
![Page 33: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/33.jpg)
Cassandra State (v0.4.0)
[email protected]:hmsonline/storm-cassandra.git
{tuple} <mapper> (ks, cf, row, k:v[])Storm Cassandra
![Page 34: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/34.jpg)
Trident Elastic Search (v0.3.1)
[email protected]:hmsonline/trident-elasticsearch.git
{tuple} <mapper> (idx, docid, k:v[])Storm Elastic Search
![Page 35: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/35.jpg)
Storm Graph (v0.1.2)
Coming soon [email protected]:hmsonline/storm-graph.git
for (tuple : batch)<processor> (graph, tuple)
![Page 36: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/36.jpg)
Storm JDBI (v0.1.14)
INTERNAL ONLY (so far)Worth releasing?
{tuple} <mapper> (JDBC Statement)
![Page 37: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/37.jpg)
All good!
![Page 38: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/38.jpg)
But...
What was the average amount paid for a medical claim associated with procedure X by zip code over the last five years?
![Page 39: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/39.jpg)
Hadoop (<2)? Batch?
Yuck. ‘Nuff Said.http://www.slideshare.net/prash1784/introduction-to-hadoop-and-pig-15036186
![Page 40: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/40.jpg)
Alternatives?
DRUID
![Page 41: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/41.jpg)
Let’s Pre-Compute It!
stream.groupBy(new
Field(“procedure”)).groupBy(new Field(“zip”)).aggregate(new Field(“amount”),
new Average())D’Oh! GroupBy’s.They set data in motion!
![Page 42: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/42.jpg)
Lesson Learned
https://github.com/nathanmarz/storm/wiki/Trident-API-Overview
If possible, avoid re-partitioning operations!
(e.g. LOG.error!)
![Page 43: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/43.jpg)
Why so hard?
D’Oh!
19 != 9
What we don’t want:LOCKS!
What’s the alternative?CONSENSUS!
![Page 44: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/44.jpg)
Cassandra 2.0!
http://www.slideshare.net/planetcassandra/nyc-jonathan-ellis-keynote-cassandra-12-20http://www.cs.cornell.edu/courses/CS6452/2012sp/papers/paxos-complex.pdf
![Page 45: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/45.jpg)
Conditional Updates
“The alert reader will notice here that Paxos gives us the ability to agree on exactly one proposal. After one has been accepted, it will be returned to future leaders in the promise, and the new leader will have to re-propose it again.”http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0
UPDATE value=9 WHERE word=“fox” IF value=6
![Page 46: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/46.jpg)
Love CQL
Conditional Updates+
Batch Statements+
Collections=
BADASS DATA MODELS
![Page 47: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/47.jpg)
Announcing : Storm Cassandra CQL!
[email protected]:hmsonline/storm-cassandra-cql.git
{tuple} <mapper> (CQL Statement)
Trident Batching =? CQL Batching
![Page 48: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/48.jpg)
Incremental State!
• Collapse aggregation into the state object.– This allows the state object to aggregate with current state in a loop
until success.
• Uses Trident Batching to perform in-memory aggregation for the batch.
for (tuple : batch)state.aggregate(tuple);
while (failed?) {persisted_state = read(state)aggregate(in_memory_state, persisted_state)failed? = conditionally_update(state)
}
![Page 49: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/49.jpg)
Partition 1
In-Memory Aggregation by Key!
Key Value
fox 6
brown 3
Partition 2
Key Value
fox 3
lazy 72C*
No More GroupBy!
![Page 50: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/50.jpg)
To protect against replays
Use partition + batch identifier(s) in your conditional update!
“BatchId + partitionIndex consistently represents the same data as long as:
1.Any repartitioning you do is deterministic (so partitionBy is, but shuffle is not)
2.You're using a spout that replays the exact same batch each time (which is true of transactional spouts but not of opaque transactional spouts)”
- Nathan Marz
![Page 51: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/51.jpg)
Hyper-Cubes!
![Page 52: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/52.jpg)
Our Terminology
• A cube comprises:– Dimensions (e.g. procedure, zip, time slice)– A function (e.g. count, sum)– Function fields (e.g. amount paid)– Granularity
• A metric comprises:– Coordinates (e.g. vasectomy, 19460, 879123 – hour since epoch)– A value (e.g. 500 procedures)
• A perspective comprises:– A range of coordinates (*, 19460, January)– An interval (day)
![Page 53: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/53.jpg)
Complex Event Processing
• For each event,– Find relevant cubes– Adjust metrics for event– Group and aggregate metrics– Use conditional updates to incorporate
metric into persistent state
![Page 54: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/54.jpg)
The Lambda Architecture
http://architects.dzone.com/articles/nathan-marzs-lamda
![Page 55: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/55.jpg)
Let’s Challenge This a Bit
because “additional tools and techniques” cost money and time.
• Questions:– Can we solve the problem with a single tool and a
single approach?– Can we re-use logic across layers?– Or better yet, can we collapse layers?
![Page 56: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/56.jpg)
A Traditional Interpretation
Speed Layer(Storm)
Batch Layer(Hadoop)
DataStream
Serving Layer
HBase
Impala
D’Oh! Two pipelines!
![Page 57: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/57.jpg)
Integrating Web Services
• We need a web service that receives an event and provides,– an immediate acknowledgement– a high likelihood that the data is integrated very soon– a guarantee that the data will be integrated
eventually
• We need an architecture that provides for,– Code / Logic and approach re-use– Fault-Tolerance
![Page 58: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/58.jpg)
Grand Finale
![Page 59: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/59.jpg)
The Idea : Embedding State!
KafkaDropWizard
C*
IncrementalCqlStateaggregate(tuple)
“Batch” Layer(Storm)
Client
![Page 60: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/60.jpg)
The Sequence of Events
![Page 61: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/61.jpg)
The Wins
• Reuse Aggregations and State Code!• To re-compute (or backfill) a dimension,
simply re-queue!• Storm is the “safety” net
– If a DW host fails during aggregation, Storm will fill in the gaps for all ACK’d events.
• Is there an opportunity to reuse more?– BatchingStrategy & PartitionStrategy?
![Page 62: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/62.jpg)
In the end, all good. =)
![Page 63: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/63.jpg)
Plug
The Book
Shout out:Taylor Goetz
![Page 65: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/65.jpg)
APPENDIX
![Page 66: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/66.jpg)
CassandraCqlStatepublic void commit(Long txid) {
BatchStatement batch = new BatchStatement(Type.LOGGED); batch.addAll(this.statements); clientFactory.getSession().execute(batch); }
public void addStatement(Statement statement) { this.statements.add(statement); } public ResultSet execute(Statement statement){ return clientFactory.getSession().execute(statement); }
![Page 67: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/67.jpg)
CassandraCqlStateUpdater
public void updateState(CassandraCqlState state, List<TridentTuple> tuples, TridentCollector collector) {
for (TridentTuple tuple : tuples) { Statement statement = this.mapper.map(tuple); state.addStatement(statement); } }
![Page 68: Re-envisioning the Lambda Architecture : Web Services & Real-time Analytics w/ Storm and Cassandra](https://reader036.vdocuments.site/reader036/viewer/2022081602/554a1bfcb4c90507558b5691/html5/thumbnails/68.jpg)
ExampleMapperpublic Statement map(List<String> keys, Number value) {
Insert statement = QueryBuilder.insertInto(KEYSPACE_NAME, TABLE_NAME);statement.value(KEY_NAME, keys.get(0));statement.value(VALUE_NAME, value);return statement;
}
public Statement retrieve(List<String> keys) {Select statement = QueryBuilder.select()
.column(KEY_NAME).column(VALUE_NAME)
.from(KEYSPACE_NAME, TABLE_NAME)
.where(QueryBuilder.eq(KEY_NAME, keys.get(0))); return statement;}