Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach
DESCRIPTION
Eric will be presenting on SimpleReach's use of message architectures and why they are an important part of a distributed system stack. They are often overlooked because the prevailing sentiment is that the storage and processing engines are the most important aspects of the system. But without the highways, the data won’t be able to get to its destination.
TRANSCRIPT
![Page 1: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach](https://reader034.vdocuments.site/reader034/viewer/2022051611/54b700dd4a79590a338b4662/html5/thumbnails/1.jpg)
Eric Lubow
@elubow
Message Architectures in Distributed Systems
![Page 2: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach](https://reader034.vdocuments.site/reader034/viewer/2022051611/54b700dd4a79590a338b4662/html5/thumbnails/2.jpg)
Message Architectures in Distributed Systems Eric Lubow @elubow
Overview
• SimpleReach
• Why is messaging important
• Goals
• Explanations
• Questions
![Page 3: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach](https://reader034.vdocuments.site/reader034/viewer/2022051611/54b700dd4a79590a338b4662/html5/thumbnails/3.jpg)
Personal Vanity
• CTO of SimpleReach
• Co-author of Practical Cassandra
• Skydiver, Mixed Martial Artist, Motorcyclist, Dog dad, NY Giants fan
• IronMatt Foundation for Pediatric Brain Tumors (ironmatt.org)
![Page 4: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach](https://reader034.vdocuments.site/reader034/viewer/2022051611/54b700dd4a79590a338b4662/html5/thumbnails/4.jpg)
![Page 5: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach](https://reader034.vdocuments.site/reader034/viewer/2022051611/54b700dd4a79590a338b4662/html5/thumbnails/5.jpg)
![Page 6: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach](https://reader034.vdocuments.site/reader034/viewer/2022051611/54b700dd4a79590a338b4662/html5/thumbnails/6.jpg)
SimpleReach
• Millions of URLs per day
• Over 3.75 billion page views per month
• 7 billion events per day (~80k events/second)
• Auto-scale between 175 and 190 machines depending on traffic
• Built a predictive measurement algorithm for the social web
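As a quick sanity check on the event figures, 7 billion events spread evenly over one day comes to roughly 81,000 per second, in line with the ~80k on the slide. The even spread is an assumption made only for this arithmetic; real traffic will peak above the average.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def events_per_second(events_per_day: float) -> float:
    """Average rate, assuming events are spread evenly over the day."""
    return events_per_day / SECONDS_PER_DAY


rate = events_per_second(7e9)
print(f"{rate:,.0f} events/second")
```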
![Page 7: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach](https://reader034.vdocuments.site/reader034/viewer/2022051611/54b700dd4a79590a338b4662/html5/thumbnails/7.jpg)
Why is Messaging Important?
• Most large scale systems discussions only talk about storage
• Direct high volumes of data around your infrastructure
• Control flow of data through your infrastructure
• Decouple important systems
• Scalability, Elasticity, Deliverability, and Redundancy
• Buffering and Asynchronous communication
![Page 8: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach](https://reader034.vdocuments.site/reader034/viewer/2022051611/54b700dd4a79590a338b4662/html5/thumbnails/8.jpg)
The database is NOT a transport layer
![Page 9: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach](https://reader034.vdocuments.site/reader034/viewer/2022051611/54b700dd4a79590a338b4662/html5/thumbnails/9.jpg)
Data Flow
(Diagram: an App with four numbered steps, ❶ to ❹)
• incoming request
• sync persist data
• send response
• async queue message
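A minimal sketch of this request path, using only in-process stand-ins: a dict plays the datastore, a list plays the message queue, and a background thread does the asynchronous hand-off. All names here (App, handle, outbox, published) are hypothetical, and the order of the steps inside handle is an assumption, since the extracted slide text does not preserve the numbering.

```python
import queue
import threading


class App:
    def __init__(self):
        self.store = {}              # stand-in for the synchronous datastore
        self.outbox = queue.Queue()  # in-process buffer in front of the message queue
        self.published = []          # stand-in for the message queue itself
        worker = threading.Thread(target=self._drain, daemon=True)
        worker.start()

    def _drain(self):
        # Background worker: forwards buffered messages without blocking requests.
        while True:
            message = self.outbox.get()
            self.published.append(message)
            self.outbox.task_done()

    def handle(self, key, value):
        self.store[key] = value                        # sync persist data
        self.outbox.put({"key": key, "value": value})  # async queue message
        return {"status": "ok"}                        # send response


app = App()
response = app.handle("url:1", {"views": 1})
app.outbox.join()  # wait for the background hand-off (only needed in this demo)
```

The request handler never waits on the message queue; it only waits on the write it must not lose.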
Message Architectures in Distributed Systems Eric Lubow @elubow #ddtx14
![Page 10: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach](https://reader034.vdocuments.site/reader034/viewer/2022051611/54b700dd4a79590a338b4662/html5/thumbnails/10.jpg)
Goals
• Consistent interfaces between systems
• Allow access to many toolsets
• Minimize downtime/Minimize cost of downtime
• High availability
• Clients should have minimal architecture knowledge
• Horizontal Scaling
• Controlled Data Flow Patterns
• Enrichment/In-stream Modification Schemes
• Monitoring and Instrumentation
![Page 11: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach](https://reader034.vdocuments.site/reader034/viewer/2022051611/54b700dd4a79590a338b4662/html5/thumbnails/11.jpg)
Messaging Systems
• RabbitMQ
• ZeroMQ
• Kafka
• Amazon SQS
• NSQ
• ActiveMQ
• Resque
• Custom
![Page 12: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach](https://reader034.vdocuments.site/reader034/viewer/2022051611/54b700dd4a79590a338b4662/html5/thumbnails/12.jpg)
What Did SimpleReach Choose?
![Page 13: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach](https://reader034.vdocuments.site/reader034/viewer/2022051611/54b700dd4a79590a338b4662/html5/thumbnails/13.jpg)
NSQ
• Distributed and de-centralized topology
• At least once delivery guaranteed
• Multicast style message routing
• Simple to configure and deploy
• Allow for maintenance windows with no downtime
• Ephemeral channels for testing
• Channel sampling
github.com/bitly/nsq
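At-least-once delivery means a consumer can see the same message more than once, so handlers are usually written to tolerate repeats. A minimal sketch, assuming each message carries a unique ID; the class and its names are hypothetical and not part of any NSQ client library.

```python
class IdempotentHandler:
    """Applies each message at most once, keyed by message ID."""

    def __init__(self, apply):
        self._apply = apply
        self._seen = set()  # unbounded here; a real system would expire old IDs

    def handle(self, message_id, body):
        if message_id in self._seen:
            return False  # duplicate delivery, safely ignored
        self._apply(body)
        self._seen.add(message_id)  # recorded only after apply succeeds
        return True


results = []
handler = IdempotentHandler(results.append)
first = handler.handle("m1", "pageview")
second = handler.handle("m1", "pageview")  # redelivery of the same message
```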
![Page 14: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach](https://reader034.vdocuments.site/reader034/viewer/2022051611/54b700dd4a79590a338b4662/html5/thumbnails/14.jpg)
Topics and Channels
• a topic is a distinct stream of messages (a single nsqd instance can have multiple topics)
• a channel is an independent queue for a topic (a topic can have multiple channels)
• consumers discover producers by querying nsqlookupd (a discovery service for topics)
• topics and channels are created at runtime (just start publishing/subscribing)
(Diagram labels: nsqd; Topics: “event”; Channels: “metrics”, “enrichment”, “writer”; Consumers on separate hosts)
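The topic and channel semantics can be modeled in a few lines: every channel gets its own copy of each message, and the consumers attached to one channel split that channel's messages between them. This is an in-memory illustration only, not NSQ code. Round-robin hand-off and dropping messages published before any channel exists are simplifications of this sketch.

```python
from collections import deque
from itertools import cycle


class Topic:
    def __init__(self, name):
        self.name = name
        self.channels = {}

    def channel(self, name):
        # Channels are created on first use, mirroring "created at runtime".
        return self.channels.setdefault(name, deque())

    def publish(self, message):
        # Each channel is an independent queue and receives its own copy.
        for channel_queue in self.channels.values():
            channel_queue.append(message)


def drain(channel_queue, consumers):
    """Hand each queued message to exactly one consumer (round-robin here)."""
    received = {consumer: [] for consumer in consumers}
    turn = cycle(consumers)
    while channel_queue:
        received[next(turn)].append(channel_queue.popleft())
    return received


topic = Topic("event")
topic.channel("metrics")
topic.channel("writer")
for message in ["A", "B", "C"]:
    topic.publish(message)

metrics_copy = list(topic.channel("metrics"))
writer_split = drain(topic.channel("writer"), ["writer-1", "writer-2"])
```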
![Page 15: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach](https://reader034.vdocuments.site/reader034/viewer/2022051611/54b700dd4a79590a338b4662/html5/thumbnails/15.jpg)
Everyone Speaks The Same Language
http:// + {“content-type”: “application/json”}
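One way to read this slide is that any service able to make an HTTP request with a JSON body can publish. The sketch below builds such a request for nsqd's HTTP publish endpoint (POST /pub?topic=...) without sending it. The address, topic name, and event fields are illustrative assumptions.

```python
import json
import urllib.parse
import urllib.request


def build_publish_request(nsqd_http_addr, topic, event):
    """Build (but do not send) an HTTP publish request carrying a JSON body."""
    query = urllib.parse.urlencode({"topic": topic})
    url = f"http://{nsqd_http_addr}/pub?{query}"
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# 127.0.0.1:4151 is assumed to be a local nsqd HTTP address.
request = build_publish_request(
    "127.0.0.1:4151", "event", {"url": "http://example.com/a", "type": "pageview"}
)
```

Against a running nsqd, the request could then be sent with urllib.request.urlopen(request).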
![Page 16: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach](https://reader034.vdocuments.site/reader034/viewer/2022051611/54b700dd4a79590a338b4662/html5/thumbnails/16.jpg)
Goals
• Consistent interfaces between systems
![Page 17: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach](https://reader034.vdocuments.site/reader034/viewer/2022051611/54b700dd4a79590a338b4662/html5/thumbnails/17.jpg)
NSQ Tools
• nsqadmin provides a web interface to administer and introspect an NSQ cluster at runtime (and empty, pause, or delete topics/channels)
• nsq_to_http - utility that helps transport an aggregate stream over HTTP
• nsq_to_file - utility that safely persists an aggregated stream to disk
• nsq_stat - iostat-like utility for a topic/channel
• nsq_tail - tail-like utility for a topic/channel
![Page 18: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach](https://reader034.vdocuments.site/reader034/viewer/2022051611/54b700dd4a79590a338b4662/html5/thumbnails/18.jpg)
Right Tool For The Job
![Page 19: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach](https://reader034.vdocuments.site/reader034/viewer/2022051611/54b700dd4a79590a338b4662/html5/thumbnails/19.jpg)
Goals
• Consistent interfaces between systems
• Allow access to many toolsets
![Page 20: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach](https://reader034.vdocuments.site/reader034/viewer/2022051611/54b700dd4a79590a338b4662/html5/thumbnails/20.jpg)
How Does It Work?
(Diagram labels: three API hosts, each paired with an nsqd; two nsqlookupd instances; two consumers; arrows marked PUBLISH, REGISTER, DISCOVER, SUBSCRIBE)
![Page 21: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach](https://reader034.vdocuments.site/reader034/viewer/2022051611/54b700dd4a79590a338b4662/html5/thumbnails/21.jpg)
The Schrute of the Problem
![Page 22: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach](https://reader034.vdocuments.site/reader034/viewer/2022051611/54b700dd4a79590a338b4662/html5/thumbnails/22.jpg)
Goals
• Consistent interfaces between systems
• Allow access to many toolsets
• Minimize downtime/Minimize cost of downtime
• High availability
![Page 23: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach](https://reader034.vdocuments.site/reader034/viewer/2022051611/54b700dd4a79590a338b4662/html5/thumbnails/23.jpg)
Simple Deployment & Automation
• Chef cookbook - github.com/simplereach/chef-nsq
• Written in Go
• Easily distributable binaries
• Deploy lookup nodes
• nsqd instances installed locally
![Page 24: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach](https://reader034.vdocuments.site/reader034/viewer/2022051611/54b700dd4a79590a338b4662/html5/thumbnails/24.jpg)
Goals
• Consistent interfaces between systems
• Allow access to many toolsets
• Minimize downtime/Minimize cost of downtime
• High availability
• Clients should have minimal architecture knowledge
![Page 25: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach](https://reader034.vdocuments.site/reader034/viewer/2022051611/54b700dd4a79590a338b4662/html5/thumbnails/25.jpg)
Runtime Discovery
(Diagram: a consumer and two nsqlookupd instances, connected by HTTP requests)
➊ regularly poll for topic producers
➋ connect to all producers
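The two discovery steps can be sketched as a pure function over already-fetched lookup responses: merge the producer lists from every nsqlookupd and return the set of addresses to connect to. The response shape used here (a "producers" list with "broadcast_address" and "tcp_port", optionally wrapped in a "data" object) is an assumption about the lookup JSON and should be checked against the NSQ version in use.

```python
def producer_addresses(lookup_responses):
    """Union of producer TCP addresses across all nsqlookupd responses."""
    addresses = set()
    for response in lookup_responses:
        # Some versions wrap the payload in a "data" envelope (assumption).
        data = response.get("data", response)
        for producer in data.get("producers", []):
            addresses.add(f'{producer["broadcast_address"]}:{producer["tcp_port"]}')
    return sorted(addresses)


# Hypothetical responses from two nsqlookupd instances with overlapping views.
lookupd_a = {"producers": [{"broadcast_address": "10.0.0.1", "tcp_port": 4150}]}
lookupd_b = {
    "data": {
        "producers": [
            {"broadcast_address": "10.0.0.1", "tcp_port": 4150},
            {"broadcast_address": "10.0.0.2", "tcp_port": 4150},
        ]
    }
}
targets = producer_addresses([lookupd_a, lookupd_b])
```

A consumer would repeat this on a timer and open connections to any address it is not already connected to.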
![Page 26: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach](https://reader034.vdocuments.site/reader034/viewer/2022051611/54b700dd4a79590a338b4662/html5/thumbnails/26.jpg)
Goals
• Consistent interfaces between systems
• Allow access to many toolsets
• Minimize downtime/Minimize cost of downtime
• High availability
• Clients should have minimal architecture knowledge
• Horizontal Scaling
![Page 27: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach](https://reader034.vdocuments.site/reader034/viewer/2022051611/54b700dd4a79590a338b4662/html5/thumbnails/27.jpg)
Path of a Packet
(Diagram labels: Internet, EC, Internal API, Solr, C*, Mongo, Redis, Vertica, API, Fire Hose, SC, Consumers, Queue)
![Page 28: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach](https://reader034.vdocuments.site/reader034/viewer/2022051611/54b700dd4a79590a338b4662/html5/thumbnails/28.jpg)
![Page 29: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach](https://reader034.vdocuments.site/reader034/viewer/2022051611/54b700dd4a79590a338b4662/html5/thumbnails/29.jpg)
Controlled Data Flow
(Diagram labels: Social Event Collector, Social Data, Batch & Write, Processed Data, Batch & Write, Raw Data, Calculate Score, Write, NSQ, Broadcast, NSQ)
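The "Batch & Write" boxes suggest consumers that buffer messages and write them to storage in bulk. The slides do not show how this is done, so the following is only a sketch under that reading; BatchWriter and its parameters are hypothetical names.

```python
class BatchWriter:
    """Buffers messages and hands them to write_batch in fixed-size groups."""

    def __init__(self, write_batch, batch_size=100):
        self._write_batch = write_batch
        self._batch_size = batch_size
        self._buffer = []

    def add(self, message):
        self._buffer.append(message)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self):
        # Also call this on shutdown or on a timer so a partial batch is not stranded.
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._write_batch(batch)


batches = []
writer = BatchWriter(batches.append, batch_size=2)
for event in [1, 2, 3]:
    writer.add(event)
after_adds = list(batches)
writer.flush()
```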
![Page 30: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach](https://reader034.vdocuments.site/reader034/viewer/2022051611/54b700dd4a79590a338b4662/html5/thumbnails/30.jpg)
![Page 31: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach](https://reader034.vdocuments.site/reader034/viewer/2022051611/54b700dd4a79590a338b4662/html5/thumbnails/31.jpg)
Broadcast Importance for Polyglottany
(Diagram labels: Aggregator, Calculator, NSQ, Broadcast, Mongo Writer, Redis Writer, Cassandra Writer, Solr Writer, Vertica Writer)
![Page 32: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach](https://reader034.vdocuments.site/reader034/viewer/2022051611/54b700dd4a79590a338b4662/html5/thumbnails/32.jpg)
![Page 33: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach](https://reader034.vdocuments.site/reader034/viewer/2022051611/54b700dd4a79590a338b4662/html5/thumbnails/33.jpg)
![Page 34: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach](https://reader034.vdocuments.site/reader034/viewer/2022051611/54b700dd4a79590a338b4662/html5/thumbnails/34.jpg)
Goals
• Consistent interfaces between systems
• Allow access to many toolsets
• Minimize downtime/Minimize cost of downtime
• High availability
• Clients should have minimal architecture knowledge
• Horizontal Scaling
• Controlled Data Flow
![Page 35: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach](https://reader034.vdocuments.site/reader034/viewer/2022051611/54b700dd4a79590a338b4662/html5/thumbnails/35.jpg)
What Is Enrichment?
A mechanism to add value to a message to enhance processing in your system
![Page 36: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach](https://reader034.vdocuments.site/reader034/viewer/2022051611/54b700dd4a79590a338b4662/html5/thumbnails/36.jpg)
How Do We Enrich
(Diagram labels: Raw Event, Enriched Event, NSQ, Broadcast, Consumer A, Consumer B, Consumer C)
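An enrichment step can be as small as a function that takes a raw event, adds derived fields, and returns the enriched event for republishing. In this sketch the event fields ("url", "domain") are hypothetical; the talk does not describe SimpleReach's actual event schema or enrichment logic.

```python
import json
from urllib.parse import urlparse


def enrich(raw_body: bytes) -> bytes:
    """Return a copy of the raw JSON event with a derived 'domain' field added."""
    event = json.loads(raw_body)
    enriched = dict(event)  # leave the original event untouched
    enriched["domain"] = urlparse(event["url"]).netloc
    return json.dumps(enriched, sort_keys=True).encode("utf-8")


raw_event = b'{"url": "http://example.com/a", "type": "pageview"}'
enriched_event = json.loads(enrich(raw_event))
```

Doing this once in the stream means every downstream consumer receives the added fields without repeating the work.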
![Page 37: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach](https://reader034.vdocuments.site/reader034/viewer/2022051611/54b700dd4a79590a338b4662/html5/thumbnails/37.jpg)
Goals
• Consistent interfaces between systems
• Allow access to many toolsets
• Minimize downtime/Minimize cost of downtime
• High availability
• Clients should have minimal architecture knowledge
• Horizontal Scaling
• Controlled Data Flow
• Enrichment
![Page 38: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach](https://reader034.vdocuments.site/reader034/viewer/2022051611/54b700dd4a79590a338b4662/html5/thumbnails/38.jpg)
Monitoring / Instrumentation
• Comes with statsd support built-in
• Statsd talks to both Graphite and nsqadmin
• Nsqadmin comes with graphs for message processing stats
• Nagios plugins available for monitoring topic/channel depth
• Average end to end latency calculations are done on a per-channel basis
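A channel-depth check of the kind a monitoring plugin might run can be sketched as a function over nsqd's stats output once it has been parsed from JSON. The field names used here ("topics", "topic_name", "channels", "channel_name", "depth") and the optional "data" envelope are assumptions about that output and should be verified against the NSQ version in use.

```python
def deep_channels(stats, max_depth):
    """Return (topic, channel, depth) for every channel deeper than max_depth."""
    data = stats.get("data", stats)  # tolerate an enveloped response (assumption)
    alerts = []
    for topic in data.get("topics", []):
        for channel in topic.get("channels", []):
            if channel["depth"] > max_depth:
                alerts.append(
                    (topic["topic_name"], channel["channel_name"], channel["depth"])
                )
    return alerts


# Hypothetical stats snapshot for one topic with two channels.
sample_stats = {
    "topics": [
        {
            "topic_name": "event",
            "channels": [
                {"channel_name": "metrics", "depth": 12},
                {"channel_name": "writer", "depth": 50000},
            ],
        }
    ]
}
alerts = deep_channels(sample_stats, max_depth=10000)
```

A growing depth on one channel points at a slow or stalled consumer group without implicating the others.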
![Page 39: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach](https://reader034.vdocuments.site/reader034/viewer/2022051611/54b700dd4a79590a338b4662/html5/thumbnails/39.jpg)
Goals
• Consistent interfaces between systems
• Allow access to many toolsets
• Minimize downtime/Minimize cost of downtime
• High availability
• Clients should have minimal architecture knowledge
• Horizontal Scaling
• Controlled Data Flow
• Enrichment
• Monitoring and Instrumentation
![Page 40: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach](https://reader034.vdocuments.site/reader034/viewer/2022051611/54b700dd4a79590a338b4662/html5/thumbnails/40.jpg)
Summary
• Large Systems are more than just storage
• Abstraction
• Highly Available
• Controlled Data Flow Patterns
• Monitoring & Automation
![Page 41: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach](https://reader034.vdocuments.site/reader034/viewer/2022051611/54b700dd4a79590a338b4662/html5/thumbnails/41.jpg)
We’re Hiring
![Page 42: Cassandra Day NY 2014: Message Architectures in Distributed Systems at SimpleReach](https://reader034.vdocuments.site/reader034/viewer/2022051611/54b700dd4a79590a338b4662/html5/thumbnails/42.jpg)
Questions are guaranteed in life.
Answers aren’t.
Eric Lubow
@elubow
Cassandra Day, New York
Thank you.