air traffic controller - streams processing meetup

Air Traffic Controller Using Samza to manage communications with members By: Cameron Lee and Shubhanshu Nagar

Upload: ed-yakabosky

Post on 14-Apr-2017

Outline

Problem Statement

How ATC Solves it

Implementation

Interesting Features

What problem are we trying to solve?

In the past, LinkedIn provided a poor communications experience to some of its members.

Too much email, low quality email, fired on multiple channels at once

Our goal was to build a system which could apply some common functionality across many different communication types and use cases in order to improve the member experience.

Handle thousands of communications per second

Good understanding of state of members on the site in near-real-time

How does ATC think about creating a delightful member experience?

5 Rights

Right member

Right message

● Useful to member

● Shouldn’t have seen it before

Right frequency

Right channel

Right time

Filtering

Don’t send stale messages

Don’t send spammy messages

Don’t send duplicate messages
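
A minimal sketch of the stale and duplicate checks, for illustration only. The `Message` fields, the `MessageFilter` class, and the 24-hour threshold are assumptions, not ATC's actual data model; the in-memory set stands in for a persistent store with a TTL. The "spammy" check is left out because the slides do not say how it is decided.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    member_id: int
    content_key: str   # hypothetical identifier for "the same message"
    created_at: float  # epoch seconds


class MessageFilter:
    """Drops stale and duplicate messages. Thresholds are illustrative."""

    def __init__(self, max_age_seconds: float = 24 * 3600):
        self.max_age_seconds = max_age_seconds
        self._seen = set()  # stand-in for a persistent store with a TTL

    def should_send(self, msg: Message, now: float = None) -> bool:
        now = time.time() if now is None else now
        if now - msg.created_at > self.max_age_seconds:
            return False  # stale: too old to be useful
        key = (msg.member_id, msg.content_key)
        if key in self._seen:
            return False  # duplicate: this member already got it
        self._seen.add(key)
        return True
```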

Aggregation and Capping

Don’t flood me. Consolidate if you have too much to say.
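
One way to picture capping with aggregation, as a sketch under stated assumptions: each member may receive up to `cap` messages per window, and anything beyond that is held and later consolidated into a single digest. The class name, the cap value, and the digest shape are hypothetical.

```python
from collections import defaultdict


class AggregatingCapper:
    """Sends up to `cap` messages per member per window; holds the rest."""

    def __init__(self, cap: int = 2):
        self.cap = cap
        self._sent = defaultdict(int)   # member_id -> messages sent this window
        self._held = defaultdict(list)  # member_id -> messages over the cap

    def offer(self, member_id, message):
        """Return the message if it may be sent now, else hold it and return None."""
        if self._sent[member_id] < self.cap:
            self._sent[member_id] += 1
            return message
        self._held[member_id].append(message)
        return None

    def flush_digest(self, member_id):
        """At the end of a window, consolidate held messages into one digest."""
        held = self._held.pop(member_id, [])
        self._sent.pop(member_id, None)  # start a fresh window for this member
        if not held:
            return None
        return {"member_id": member_id, "type": "digest", "items": held}
```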

Channel Selection

“Don’t blast all channels at the same time”

Delivery-time Optimization

● Hold on to a message and deliver it at the right moment.

● Ex: Don’t buzz my phone at 2 AM.

● I like to read my daily digests every day after work.
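
The "don't buzz my phone at 2 AM" rule can be sketched as a quiet-hours check. This is an illustration only: the function name and the 22:00 to 08:00 window are assumptions, and `now` is taken to be a naive datetime already in the member's local time.

```python
from datetime import datetime, time, timedelta


def next_delivery_time(
    now: datetime,
    quiet_start: time = time(22, 0),
    quiet_end: time = time(8, 0),
) -> datetime:
    """Return `now` if outside quiet hours, else the next end of quiet hours.

    Assumes the quiet window wraps past midnight (quiet_start > quiet_end).
    """
    t = now.time()
    if quiet_end <= t < quiet_start:
        return now  # daytime: deliver immediately
    wake = now.replace(
        hour=quiet_end.hour, minute=quiet_end.minute, second=0, microsecond=0
    )
    if t >= quiet_start:
        wake += timedelta(days=1)  # late evening: hold until tomorrow morning
    return wake
```

A message that arrives during quiet hours would be handed to a scheduler with the returned time instead of being sent right away.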

How did we build this thing?

Requirements for ATC

● Highly-scalable

● Nearline (but close to real-time!)

● Ingest data from many sources

● Persist some data, but most needs are low TTL

What’s ATC built on?

Ecosystem

[Diagram of the ecosystem. Components shown: Offline apps, Online apps, Relevance scores, User action data, ATC, Message Delivery Service.]

Persistence: RocksDB

Out-of-the-box storage layer

Write-optimized for high performance on SSDs

Changelogs provide fault tolerance and bootstrapping capabilities

[Diagram: re-partitioning of events. Components shown: ATC Repartitioner, ATC Pipeline instance 1 through ATC Pipeline instance n, External services.]

ATC task

[Diagram of an ATC task. Components shown: External Requests, Channel Selection, Message Delivery Service, Scheduler, Filtering, Message Data Tree Generation, Aggregation & Capping. Caption: Hipster Stream Processing.]

Implementation Details

Streaming Technologies

Kafka: publish-subscribe messaging system

Used to send input to ATC to trigger communications

Many actions and signals in the LinkedIn ecosystem are tracked in Kafka events. We can consume these signals to better understand the state of the ecosystem.

Databus: change capture system for databases

Produces an event whenever an entry in a database changes

Host affinity

By default, whenever a Samza app is deployed, the task instances can be moved to any host in the cluster, regardless of where the instances were previously deployed.

If there was any state saved (e.g. RocksDB), then the new instances would have to rebuild that state off of the changelog. This bootstrapping can take some time depending on the amount of data to reload. Task instances can’t process new input until bootstrapping is complete.

We have some use cases which can’t be delayed for the amount of time it takes to bootstrap.

Host affinity (continued)

Host affinity is a Samza feature which allows us to deploy task instances back to the same hosts from the previous deployment, so state does not need to be reloaded.

If an individual instance fails, Samza can fall back to moving the instance elsewhere and bootstrapping off of the changelog.

Multiple datacenters

Samza does not currently support replicating persistent application state (e.g. RocksDB) across multiple clusters which are running the same app.

We need ATC to run in multiple datacenters for redundancy.

We need to have state in each datacenter so that if we have to move processing between datacenters, then we can continue to properly handle input.

Multiple datacenters

We rely on the input streams to replicate the main input so that we can do processing and build up state in all datacenters.

The side effects (e.g. triggering the actual email send) are then emitted by only one of the datacenters. We can dynamically choose where side effects are triggered.
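
As a rough sketch of this idea: every datacenter processes every event and updates its local state, but only the datacenter currently marked active emits the side effect. The class, the `active_datacenter` callable, and the per-member counter are hypothetical stand-ins, not ATC's actual mechanism.

```python
class DatacenterPipeline:
    """Processes all input in every datacenter; emits side effects in only one."""

    def __init__(self, name, active_datacenter):
        self.name = name
        # Callable returning the name of the datacenter that should emit side
        # effects right now, so the choice can change at runtime.
        self._active_datacenter = active_datacenter
        self.send_counts = {}  # stand-in for local state (e.g. a RocksDB store)
        self.emitted = []      # stand-in for calls to the delivery service

    def process(self, event):
        member_id = event["member_id"]
        # State is always updated, so any datacenter can take over later.
        self.send_counts[member_id] = self.send_counts.get(member_id, 0) + 1
        # The side effect is gated on the dynamic routing decision.
        if self._active_datacenter() == self.name:
            self.emitted.append(event)
```

Because both pipelines build identical state, switching the active datacenter does not require copying state between clusters.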

Deployments

When we deploy changes to ATC, we can deploy to a single datacenter at a time in order to test new versions on only a fraction of traffic.

In some cases, we shift all side effects out of a datacenter to do an upgrade. Since we still process all input, we can validate almost all of our functionality and ensure performance doesn’t take an unexpected hit.

Store migrations

In some cases, we need to migrate our system to use a new instance of a store.

For example, when support was added to use RocksDB TTL, we needed to migrate some of our stores.

Since we only needed the last X days of data, we could use the following strategy for the migration:

Write to both the old and new store for X days, but continue to read from the old store.

After X days, read from the new store, but continue writing to both stores so we can fall back if something goes wrong.

After validating that the new store was correct, remove the old store.
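
The migration strategy above can be sketched as a thin wrapper that always writes to both stores and reads from whichever one a flag selects. Plain dicts stand in for the real stores, and the `MigratingStore` name and `read_from_new` flag are hypothetical.

```python
class MigratingStore:
    """Dual-writes to an old and a new store; the read source is switchable."""

    def __init__(self, old_store, new_store, read_from_new=False):
        self.old = old_store
        self.new = new_store
        self.read_from_new = read_from_new  # flip after X days of dual writes

    def put(self, key, value):
        # Write to both stores during the whole migration so either can serve reads.
        self.old[key] = value
        self.new[key] = value

    def get(self, key):
        source = self.new if self.read_from_new else self.old
        return source.get(key)
```

Flipping `read_from_new` back to `False` is the fallback path; once the new store is validated, the wrapper and the old store can be removed.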

Personalization through relevance

We work closely with a relevance team in order to make better decisions about the communications we send out.

e.g. channel selection, delivery time, aggregation thresholds

Every day, scores for different decisions are computed offline (Hadoop) by the relevance team. Those scores are pushed to ATC through Kafka, and then ATC stores the scores in RocksDB.

Scores are generated for each member, so we can personalize the experience.
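
To illustrate how per-member scores might drive a decision such as channel selection, here is a small sketch. The event shape, the channel names, and the "highest score wins" rule are assumptions; a dict stands in for the RocksDB store.

```python
class ScoreStore:
    """Keeps the latest per-member, per-channel score pushed by a scoring job."""

    def __init__(self):
        self._scores = {}  # (member_id, channel) -> score

    def on_score_event(self, event):
        # Called for each score event consumed from the input stream.
        self._scores[(event["member_id"], event["channel"])] = event["score"]

    def pick_channel(self, member_id, channels=("email", "push", "in_app")):
        # Choose a single channel instead of blasting all of them.
        # Members without scores fall back to the first listed channel.
        return max(channels, key=lambda c: self._scores.get((member_id, c), 0.0))
```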

Interesting features

Remote calls

Some data is not readily available on a Kafka stream

We make REST requests to fetch that data

Done at the beginning of pipeline

Extract event

Make remote calls and decorate event

Process decorated event

This makes our Samza job I/O-bound

Remote calls - Efficiently

Use ParSeq

Framework to write asynchronous code in Java

Open source

ParSeq uses a thread pool for making remote calls

Rest of processing happens serially

Checkpointing handled by application

Checkpoint every 10 seconds

Flush all requests in-flight before checkpointing

Plan on integrating with SAMZA-863
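
The pattern described here (remote calls on a thread pool, the rest of processing serial, in-flight requests flushed before each checkpoint) can be sketched as follows. This uses Python's `concurrent.futures` as a stand-in rather than ParSeq or the Samza API; the class and the `fetch` and `process` callables are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


class AsyncDecorator:
    """Runs remote calls on a thread pool; processes decorated events serially."""

    def __init__(self, fetch, process, max_workers=8):
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._fetch = fetch      # remote call: event -> extra data
        self._process = process  # serial step: (event, extra data) -> None
        self._in_flight = []     # (event, future) pairs not yet processed

    def on_event(self, event):
        self._drain(block=False)  # handle any calls that already finished
        self._in_flight.append((event, self._pool.submit(self._fetch, event)))

    def checkpoint(self):
        """Flush every in-flight request, after which it is safe to checkpoint."""
        self._drain(block=True)

    def _drain(self, block):
        remaining = []
        for event, future in self._in_flight:
            if block or future.done():
                # result() waits for the call if it has not completed yet.
                self._process(event, future.result())
            else:
                remaining.append((event, future))
        self._in_flight = remaining
```

Calling `checkpoint()` on a timer (every 10 seconds in the slides) ensures no event is recorded as consumed while its remote call is still outstanding.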

Real-time Processing

Some messages require real-time latency

Tuned Kafka’s batching configuration to achieve sub-second pre-ATC latency

Can be tuned even more aggressively!

ATC/Samza processes most events in 2-3 ms

No remote calls for these messages

Scheduler

[Diagram: scheduled requests (from aggregation, follow-up, etc.) are written to the Scheduler RocksDB store ("schedule"); a periodic window task checks the store ("check") and triggers other processing and the Message Delivery Service ("triggers").]
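
A minimal sketch of the schedule, check, and trigger loop, assuming requests are keyed by due time. A heap stands in for the store here; the class and method names are hypothetical and this is not the Samza window API.

```python
import heapq


class Scheduler:
    """Holds requests until they are due; a periodic check releases them."""

    def __init__(self):
        self._heap = []  # (due_at, sequence, request), ordered by due time
        self._seq = 0    # tie-breaker so requests themselves are never compared

    def schedule(self, due_at, request):
        heapq.heappush(self._heap, (due_at, self._seq, request))
        self._seq += 1

    def window(self, now):
        """Called periodically: pop and return every request due at or before `now`."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[2])
        return due
```

The requests returned by `window` would then be handed to the rest of the pipeline or to the delivery service.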

Questions?