Michael Laing
Architect, New York Times
A Global Mesh with a Memory
Message-based: WebSocket, AMQP, SockJS
If in doubt:
• Resend
• Reconnect
• Reread
Idempotent:
• Replicating
• Racy
• Resolving
Classes of service:
• Gold: replicate/race/resolve
• Silver: prioritize
• Bronze: queueable
Millions of users
Event-driven: async using libev
Message: an event with data
Envelope: Routing while in motion & Locating when at rest
[Diagram: a Message is made up of an Envelope, Metadata, and a Body. One Body is labeled “opaque to us”, another “may be absent”.]
         | RabbitMQ             | WebSocket          | S3 / CloudFront | Cassandra
Envelope | Routing Key          | Gateway Connection | UUID            | “Path” & UUID
Metadata | Headers: Map / Array | JSON               | HTTP Headers    | JSON
Body     | Blob                 | Blob               | Blob            | Blob
Publish
[Diagram: Message, Core, Cassandra, S3 / CloudFront, Gateway, Device. Labels: Init, sync. Protocols: AMQP, CQL, WebSocket, HTTP.]
Subscribe
[Diagram: Message, Core, Cassandra, S3 / CloudFront, Gateway, Device. Label: Init. Protocols: AMQP, CQL, WebSocket, HTTP.]
Dismiss
[Diagram: Message, Core, Cassandra, Gateway, Device. Label: Init. Protocols: AMQP, CQL, WebSocket.]
[Diagram: deployment scale. Components: Core, Gateway, Device, Message, Cassandra, S3 / CloudFront. Scale labels: several, dozens, millions.]
Connect
Envelope – 2 forms of addressing
“Path”: 1) Routing a message to a user 2) Finding a message for a user
“Postoffice”: Routing a message internally in the nyt⨍aбrik
The Path hierarchy
Path elements are text (utf-8 but “.” is reserved) – the 1st element is the “category”
“category”: “feeds”, “2nd element”: “breaking-news”, “3rd element”: “0012345”
The elements are joined by “.” for routing
“path”: “feeds.breaking-news.0012345”
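The joining rule can be sketched in a few lines of Python. This is only an illustration of the rule stated on the slide; the function names `join_path` and `category` are hypothetical and not taken from the nyt⨍aбrik code.

```python
def join_path(elements):
    """Join path elements with "." after checking none contains the reserved "."."""
    if not elements:
        raise ValueError("a path needs at least a category element")
    for element in elements:
        if not element or "." in element:
            raise ValueError(f"invalid path element: {element!r}")
    return ".".join(elements)


def category(path):
    """Return the first element of a joined path, i.e. its category."""
    return path.split(".", 1)[0]


path = join_path(["feeds", "breaking-news", "0012345"])
print(path)            # feeds.breaking-news.0012345
print(category(path))  # feeds
```

Rejecting "." inside an element is what keeps the joined form unambiguous: splitting on "." always recovers the original elements.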
Deeper into the Path hierarchy
For persistence, the path denotes a sorted “folder” containing messages in reverse datetime order (using the timestamp from the version 1 uuid uniquely identifying each message)
“feeds.breaking-news.56”/bd1961f5-1062-11e4-a630-406c8f1838fa
“feeds.breaking-news.56”/b94e8b45-1062-11e4-900d-406c8f1838fa
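As a rough illustration of the ordering, the timestamp embedded in a version 1 UUID can be read with Python's standard `uuid` module and used as a sort key. The helper names below are hypothetical; in the real system the ordering is done by the data store, not by client code like this.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Version 1 UUID timestamps count 100-nanosecond intervals since 1582-10-15.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def uuid1_datetime(u):
    """Return the UTC datetime embedded in a version 1 UUID."""
    if u.version != 1:
        raise ValueError("not a version 1 UUID")
    return GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)


def newest_first(message_ids):
    """Order message UUIDs in reverse datetime order using their embedded timestamps."""
    return sorted(message_ids, key=lambda u: u.time, reverse=True)


folder = [
    uuid.UUID("b94e8b45-1062-11e4-900d-406c8f1838fa"),
    uuid.UUID("bd1961f5-1062-11e4-a630-406c8f1838fa"),
]
for message_id in newest_first(folder):
    print(uuid1_datetime(message_id).isoformat(), message_id)
```

With the two UUIDs from the slide, the one starting `bd1961f5` carries the later timestamp and therefore sorts first.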
Subscribing to a path is done by “binding”, typically with wildcards: “*” matches any one element, “#” matches any sequence of elements
All breaking-news messages: “feeds.breaking-news.#”
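The wildcard rules can be sketched as a small matcher. This is a minimal illustration, not the RabbitMQ or gateway implementation (those use trie structures); the function names are hypothetical. It assumes, as in RabbitMQ topic exchanges, that “#” may also match an empty sequence of elements.

```python
def matches(binding, path):
    """Return True if a "."-joined path matches a binding with "*" and "#" wildcards."""
    return _match(binding.split("."), path.split("."))


def _match(pattern, elements):
    if not pattern:
        # The pattern is used up: succeed only if the path is used up too.
        return not elements
    head, rest = pattern[0], pattern[1:]
    if head == "#":
        # "#" may absorb zero or more elements: try every possible split point.
        return any(_match(rest, elements[i:]) for i in range(len(elements) + 1))
    if not elements:
        return False
    if head == "*" or head == elements[0]:
        # "*" consumes exactly one element; a literal must be equal.
        return _match(rest, elements[1:])
    return False


print(matches("feeds.breaking-news.#", "feeds.breaking-news.56"))  # True
print(matches("feeds.*.56", "feeds.breaking-news.56"))             # True
print(matches("feeds.*", "feeds.breaking-news.56"))                # False
```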
More on subscribing & retrieving
Retrieving from persistent storage can be done by path, e.g. the “latest” breaking-news messages for item 56:
“feeds.breaking-news.56”
But retrieval can also be done using trailing wild cards:
“feeds.breaking-news.#” will return the “latest” breaking-news messages for all “current” items
The Cassandra data store is designed to return hierarchical queries with a single request and in the desired order
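To make the retrieval behavior concrete, here is an in-memory stand-in for the persistent store. It is a sketch under stated assumptions: `FolderStore`, `persist`, `retrieve`, and the `limit` parameter are hypothetical names, plain integers stand in for message timestamps, and the notion of “current” items is not modeled because the slides do not define it. The real store is Cassandra, queried over CQL.

```python
from collections import defaultdict


class FolderStore:
    """In-memory stand-in for the persistent store: path -> messages, newest first."""

    def __init__(self):
        self._folders = defaultdict(list)

    def persist(self, path, timestamp, body):
        """Add a message to its folder and keep the folder in reverse time order."""
        self._folders[path].append((timestamp, body))
        self._folders[path].sort(key=lambda item: item[0], reverse=True)

    def retrieve(self, query, limit=1):
        """Return {path: latest bodies}; a trailing ".#" selects every folder under the prefix."""
        if query.endswith(".#"):
            base = query[:-2]
            paths = sorted(
                p for p in self._folders if p == base or p.startswith(base + ".")
            )
        else:
            paths = [query] if query in self._folders else []
        return {p: [body for _, body in self._folders[p][:limit]] for p in paths}


store = FolderStore()
store.persist("feeds.breaking-news.56", 1, "older item 56")
store.persist("feeds.breaking-news.56", 2, "latest item 56")
store.persist("feeds.breaking-news.57", 5, "latest item 57")
store.persist("feeds.sports.1", 9, "score")

print(store.retrieve("feeds.breaking-news.56"))
print(store.retrieve("feeds.breaking-news.#"))
```

The exact-path query returns one folder; the trailing-wildcard query returns the latest message from each folder below the prefix in a single call.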
A notable simplification:
Paths for subscribing to messages and paths for retrieving persisted messages, including the use of wild cards, are the same, e.g.:
When a user logs in she is “subscribed” using her ID; messages “published” to her will be received while “persisted” messages and subscription preferences are retrieved (a few tens of milliseconds)
Once subscription preferences arrive, she will be “subscribed” to them and any corresponding “persisted” messages retrieved
The same paths are used for subscription and retrieval
Special Paths for individual routing
Our subscribers (millions of them) have numeric IDs – using those IDs directly for routing, specifically for the “binding” function, would be inefficient
“id.prefs.09067832” (namespace of 3rd element is too large)
Instead we convert the ID to base62 elements and take advantage of the patricia trie search structures built into RabbitMQ and our gateway
“id.prefs.c.2.x.M” (equivalent to the above, used for routing)
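The conversion can be sketched as follows. The digit alphabet (0-9, then A-Z, then a-z) is an assumption, chosen because it reproduces the slide's example; the function names are hypothetical. Note that leading zeros in the ID are dropped by the integer conversion.

```python
# Assumed digit order: 0-9, A-Z, a-z (this ordering reproduces the slide's example).
BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def base62_elements(subscriber_id):
    """Convert a numeric ID into a list of single-character base62 path elements."""
    number = int(subscriber_id)
    if number < 0:
        raise ValueError("subscriber IDs must be non-negative")
    if number == 0:
        return ["0"]
    digits = []
    while number:
        number, remainder = divmod(number, 62)
        digits.append(BASE62[remainder])
    return digits[::-1]  # most significant digit first


def routing_path(prefix_elements, subscriber_id):
    """Build the routing form of an individual path, one base62 digit per element."""
    return ".".join(list(prefix_elements) + base62_elements(subscriber_id))


print(routing_path(["id", "prefs"], "09067832"))  # id.prefs.c.2.x.M
```

Each element now has at most 62 possible values, so every level of the trie stays small even with millions of subscribers.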
Postoffice addressing
The “postoffice” is a logical “bus” that connects all the services in all the nyt⨍aбrik instances globally
It is physically segmented and the segments are connected using RabbitMQ “federation”
[Diagram: postoffice logical view, connecting Core and Gateway instances]
Postoffice address elements
Each nyt⨍aбrik service has 3 basic uniquifying elements:
“region”: “us-west-2”, “instance”: “i-123”, “pid”: “12”
And some additional qualifiers:
“product”: “search”, “service”: “route”
Postoffice routing key
Each routing key has a “from” address embedded in it:
“region”: “us-west-2”, “instance”: “i-123”, “pid”: “12”, “product”: “search”, “service”: “resolve”
And a “to” address:
“region”: “us-west-2”, “instance”: “-”, “pid”: “-”, “product”: “search”, “service”: “route”
(the “-” means “any”)
And an “action”:
“action”: “route”
Postoffice routing key detail
And they are put together as an ordered sequence like this:
<action>.<from address>.<to address>
“route.\us-west-2.search.resolve.i-123.12.\us-west-2.search.route.-.-”
Meaning: This is a request for a “route” action from a specific invocation of the “search” product “resolve” service addressed to any “search” product “route” service in region “us-west-2”
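A minimal sketch of assembling such a key is shown below. The `Address` class and `routing_key` function are hypothetical names, not the common methods of the nyt⨍aбrik. The element order within an address (region, product, service, instance, pid) is read off the example key, and the backslashes in the slide's example are treated as line-continuation marks rather than part of the key, which is an assumption.

```python
from dataclasses import dataclass

ANY = "-"  # the slides use "-" to mean "any"


@dataclass(frozen=True)
class Address:
    region: str
    product: str
    service: str
    instance: str = ANY
    pid: str = ANY

    def elements(self):
        # Order follows the example key: region, product, service, instance, pid.
        return [self.region, self.product, self.service, self.instance, self.pid]


def routing_key(action, sender, recipient):
    """Build <action>.<from address>.<to address> as a "."-joined key."""
    parts = [action] + sender.elements() + recipient.elements()
    for part in parts:
        if not part or "." in part:
            raise ValueError(f"invalid routing key element: {part!r}")
    return ".".join(parts)


sender = Address("us-west-2", "search", "resolve", "i-123", "12")
recipient = Address("us-west-2", "search", "route")  # any instance, any pid
print(routing_key("route", sender, recipient))
# route.us-west-2.search.resolve.i-123.12.us-west-2.search.route.-.-
```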
Postoffice binding
Each service invocation “binds” (subscribes) to the postoffice using its unique address to get messages specifically directed to it, e.g. asynchronous RPC responses
<any action>.<any address>.<my address>
“*.\*.*.*.*.*.\us-west-2.search.route.i-123.12”
Postoffice binding for services
Each service invocation also “binds” to the postoffice using addresses that will select messages appropriate for its service
<my action>.<my domain>.<my service>
“route.\us-west-2.*.*.*.*.\*.*.route.*.*”
All this address manipulation is handled by common methods in the nyt⨍aбrik
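The two kinds of binding can be sketched like this. These are not the common methods mentioned above; every function name is hypothetical, the backslashes in the slide examples are assumed to be line-continuation marks, and the matcher handles only the one-element “*” wildcard, which is all these fixed-length keys need. The “reply” action in the usage example is invented purely for illustration.

```python
ADDRESS_LENGTH = 5  # region, product, service, instance, pid


def direct_binding(my_address):
    """<any action>.<any address>.<my address>: messages sent to this invocation."""
    return ".".join(["*"] + ["*"] * ADDRESS_LENGTH + list(my_address))


def service_binding(action, region, service):
    """<my action>.<my domain>.<my service>, following the layout of the slide example."""
    sender = [region, "*", "*", "*", "*"]
    recipient = ["*", "*", service, "*", "*"]
    return ".".join([action] + sender + recipient)


def binding_matches(binding, key):
    """Position-by-position match; "*" accepts any single element."""
    pattern, elements = binding.split("."), key.split(".")
    return len(pattern) == len(elements) and all(
        p == "*" or p == e for p, e in zip(pattern, elements)
    )


mine = ("us-west-2", "search", "route", "i-123", "12")
print(direct_binding(mine))
# *.*.*.*.*.*.us-west-2.search.route.i-123.12
print(service_binding("route", "us-west-2", "route"))
# route.us-west-2.*.*.*.*.*.*.route.*.*

request = "route.us-west-2.search.resolve.i-123.12.us-west-2.search.route.-.-"
print(binding_matches(service_binding("route", "us-west-2", "route"), request))  # True

# Hypothetical "reply" action addressed to this specific invocation.
reply = "reply.us-west-2.search.resolve.i-9.3.us-west-2.search.route.i-123.12"
print(binding_matches(direct_binding(mine), reply))  # True
```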
Routing in the Core
For load balancing on entry to the nyt⨍aбrik Core: the message goes to one Core or another
For replication of important (gold service) messages: the message goes to one Core and another
For distribution to all consumers: from the Cores through the Gateways to the Devices
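The “or” versus “and” distinction, together with the idempotent handling of replicated gold messages, can be sketched as below. This is an illustration only: the slides do not say how a Core is chosen for load balancing, so hashing the message UUID is an assumption, and `select_cores` and `Resolver` are hypothetical names.

```python
import uuid
import zlib


def select_cores(message_id, cores, gold=False):
    """Gold: every Core ("and"). Otherwise: one Core picked by hashing the id ("or")."""
    if not cores:
        raise ValueError("no Core instances available")
    if gold:
        return list(cores)
    # Assumption: a stable hash of the UUID spreads messages across Cores.
    index = zlib.crc32(message_id.bytes) % len(cores)
    return [cores[index]]


class Resolver:
    """Idempotent receiver: the first copy of a message wins, later replicas are dropped."""

    def __init__(self):
        self._seen = set()

    def accept(self, message_id):
        if message_id in self._seen:
            return False
        self._seen.add(message_id)
        return True


cores = ["core-a", "core-b"]
message_id = uuid.uuid1()
print(select_cores(message_id, cores))             # exactly one Core
print(select_cores(message_id, cores, gold=True))  # both Cores

resolver = Resolver()
print([resolver.accept(message_id) for _ in select_cores(message_id, cores, gold=True)])
# [True, False]
```

Because the receiver resolves duplicates by UUID, replicas can race each other freely and resending is always safe.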
Problems with Core instances
Complex connectivity: N(N-1) federation + clustering + …
Many services: input, process, resolve, reject, cache_push, …
Hence, problematic to manage
And difficult to autoscale
Possible solution: refactor and simplify
A new component, the Rabbit Router, to focus on connectivity and routing
A New Core, with a focus on services
Everything connected to a Rabbit Router
The “bus” becomes a “star”
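A quick calculation shows why the change of shape matters. The N(N-1) figure comes from the “Problems with Core instances” slide; the star count is an assumption (a single Rabbit Router, with one link in each direction per connected component), since the slides give no number for it.

```python
def mesh_links(n):
    """Directed federation links when each of n Cores federates with every other: N(N-1)."""
    return n * (n - 1)


def star_links(n):
    """Directed links when each of n components connects only to one central router.

    Assumes one link in each direction per component.
    """
    return 2 * n


for n in (2, 5, 10, 20):
    print(f"N={n:2d}  full mesh: {mesh_links(n):3d}  star: {star_links(n):3d}")
```

The mesh grows quadratically while the star grows linearly, so adding an instance to the star touches only its own links to the router.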