WSO2Con USA 2017: Scalable Real-Time Complex Event Processing at Uber
TRANSCRIPT
Scalable Real-Time Complex Event Processing @Uber
Shuyi Chen, Uber Technologies Inc.
Uber
• 6 continents, 70 countries and 400+ cities
• Transportation as reliable as running water, everywhere, for everyone
Outline
• Motivation
• Architecture
• Limitations
• Challenges
Uber is a data-driven company
Thousands of Kafka topics from different services
We can extract a lot of useful information from this rich set of logs in real-time!
• Window aggregation: multiple logins from the same IP in the last 10 minutes
• Pattern detection: partner accepted a trip → partner calls rider through the Uber app → rider cancels the trip
• Filter: partners reject the second pickup of an UberPOOL trip
Can we use declarative semantics to specify this stream processing logic?
Complex event processing
• Combines data from multiple sources to infer events or patterns that
suggest more complicated circumstances
• CEP is used across many industries for various use cases, including:
  – Finance: trade analysis, fraud detection
  – Airlines: operations monitoring
  – Healthcare: claims processing, patient monitoring
  – Energy and telecommunications: outage detection
• CEP uses a declarative rule/query language to specify event processing logic
WSO2/Siddhi: Complex event processing engine
• Lightweight, extensible, open source, released as a Java library
• Features supported
  – Filter
  – Join
  – Aggregation
  – Group by
  – Window
  – Pattern processing
  – Sequence processing
  – Event tables
  – Event-time processing
  – UDF
  – Extensions
  – Declarative query language: SiddhiQL
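As a sketch of what several of these features look like together (stream and attribute names here are hypothetical, not from the talk), a SiddhiQL query joining two streams over windows might be written as:

```sql
-- Hypothetical streams: trips and the latest known driver rating
define stream TripStream (tripId string, driverId string, city string);
define stream DriverStream (driverId string, rating double);

-- Join trips from the last minute against the most recent driver event
from TripStream#window.time(1 min) as t
  join DriverStream#window.length(1) as d
  on t.driverId == d.driverId
select t.tripId, t.city, d.rating
insert into TripWithRatingStream;
```

The join, window, and projection are all expressed declaratively; no imperative stream processing code is needed.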
How Siddhi works
• Specify processing logic declaratively with SiddhiQL
How Siddhi works
• The query is parsed at runtime into an execution plan runtime
• As events flow in, the execution plan runtime processes them inside the CEP engine according to the query logic
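As an illustration, the earlier "multiple logins from the same IP" use case might be expressed roughly as follows (stream and attribute names are hypothetical; the threshold of 3 is an assumed example value):

```sql
-- Hypothetical login event stream
define stream LoginStream (userId string, ip string);

-- Count logins per IP over a sliding 10-minute window and
-- emit IPs that exceed the threshold
from LoginStream#window.time(10 min)
select ip, count() as loginCount
group by ip
having loginCount > 3
insert into SuspiciousIpStream;
```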
How can we make it scalable at Uber scale?
Apache Samza
• A distributed stream processing framework
  – Distributed and scalable
  – Built-in state management
  – Built-in fault tolerance
  – At-least-once message processing
How can we make the stream processing output useful?
Actions
• Generalize a set of common action templates to make it easy for services and humans to harness the power of real-time stream processing
• Currently we support
  – Make an RPC call
  – Invoke a webhook endpoint
  – Index to Elasticsearch
  – Index to Cassandra
  – Kafka
  – Statsd
  – Chat service
  – Email
  – Push notification
Real-time Scalable Complex Event Processing
Outline
• Motivation
• Architecture
• Limitations
• Challenges
Partitioner
• Re-partition events based on key
• Support predicate pushdown through query analysis
• Support column pruning through query analysis (WIP)
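For instance, given a query with a filter predicate such as the UberPOOL example from the motivation section (stream and attribute names hypothetical), the partitioner can evaluate the predicate before re-partitioning, so non-matching events are dropped early rather than shuffled across the network:

```sql
-- Hypothetical trip event stream
define stream TripStream (tripId string, driverId string,
                          pickupIndex int, status string);

-- Only events matching this predicate need to be re-partitioned;
-- the partitioner can apply the filter first (predicate pushdown)
from TripStream[pickupIndex == 2 and status == 'rejected']
select tripId, driverId
insert into PoolSecondPickupRejectStream;
```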
Query processor
• Parse Siddhi queries into an execution plan runtime
• Process events in the Siddhi execution plan runtime
• Checkpoint state regularly using RocksDB to ensure recovery upon crash/restart
Action processor
• Execute actions upon the complex event processing output
• Support various kinds of actions for easy integration
• Implement an action retry mechanism using RocksDB to provide at-least-once delivery
How do we translate a query into a physical plan that runs?
DAG (Directed Acyclic Graph) generation
• Analyze the Siddhi query to automatically generate the stream processing DAG in Samza using the processors
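As a sketch, the accept → call → cancel sequence from the motivation section might be written as a Siddhi pattern (stream and attribute names are hypothetical). Analyzing a query like this tells the DAG generator that all participating streams must first be partitioned by `tripId` before reaching the pattern-matching query processor:

```sql
-- Hypothetical trip and in-app call event streams
define stream TripEventStream (tripId string, eventType string);
define stream CallStream (tripId string, caller string);

-- Detect: partner accepts, then calls the rider, then the rider cancels
from every a = TripEventStream[eventType == 'accepted']
  -> c = CallStream[caller == 'partner' and tripId == a.tripId]
  -> x = TripEventStream[eventType == 'rider_cancelled' and tripId == a.tripId]
select a.tripId
insert into CancelAfterCallStream;
```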
Filter, transformation
Join, window, pattern
More complicated
No stream processing logic is hard-coded in any of the processors
REST API backend
• All queries and actions are stored externally in a database
• RESTful API for CRUD operations
• If the query/action logic changes
  – Redeploy the Samza DAG if needed
  – Otherwise, the updated queries/actions are loaded at runtime without interruption
Unified management and monitoring
• Every use case
  – shares the same set of processors
  – uses queries and actions to describe its processing logic
• A single monitoring template can be reused across different use cases
Production status
• 100+ production use cases
• 30+ billion messages processed per day
Applications
• Real-time fraud detection
• Real-time anomaly detection
• Real-time marketing campaigns
• Real-time promotions
• Real-time monitoring
• Real-time feedback systems
• Real-time analytics
• Real-time visualizations
• etc.
Outline
• Motivation
• Architecture
• Limitations
• Challenges
Out-of-order event handling
• Not a big concern
  – Events of the same rider/partner are usually seconds apart
• K-slack extension in Siddhi for out-of-order event processing
Auto-scaling
• Manually re-partition Kafka topics to increase parallelism
• Manually tune container memory if needed
• Future
  – Use CPU/memory/IO stats to auto-scale the data pipelines
Outline
• Motivation
• Architecture
• Limitations
• Challenges
Large checkpointing state
• Samza uses Kafka to log state changes
• Siddhi engine snapshots can be large
• Kafka message size is limited to 1MB by default
• Solution: we built logic to slice the state into smaller pieces and checkpoint them
Synchronous checkpointing
• If state is large, checkpointing can take a long time
• Samza uses a single-threaded model, so it is unsafe to checkpoint asynchronously (SAMZA-863)
Exactly once state processing?
• Cannot commit state and offsets atomically
• Therefore, no exactly-once state processing
Custom business logic
• Common logic implemented as Siddhi extensions
• Ad-hoc logic implemented as UDFs in JavaScript or Scala
Intermediate Kafka messages
• Samza uses Kafka as the message queue for intermediate processing output
  – This can create a large load on Kafka if a heavy topic is partitioned multiple times
  – Encode the intermediate messages to reduce the footprint
Multi-tenancy
• Older Siddhi versions process events using a thread pool
  – Bad for multi-tenancy in YARN
  – Consumes more CPU resources than claimed
• Newer versions still use a thread pool for scheduled tasks, but main processing runs in a single thread
  – Good: CPU consumption per YARN container is bounded
Upgrading Samza jobs
• Upgrading Samza jobs requires a full restart, which can take minutes due to
  – Offset checkpointing topic too large → set retention to hours
  – Changelog topic too large → set retention, enable compaction in Kafka, or use host affinity (SAMZA-617)
• To minimize the interruption during upgrades, it would be nice to have
  – Rolling restart
  – Per-container restart
Our solution: non-interrupted handoff
• For critical jobs, we use replication during upgrades
  – Start a shadow job
  – Upgrade the shadow
  – Switch primary and shadow
  – Upgrade the primary
  – Switch back
• Downside: requires 2x capacity during the upgrade
Thank You!