WSO2Con USA 2017: Scalable Real-Time Complex Event Processing at Uber
TRANSCRIPT
Scalable Real-Time Complex Event Processing @Uber
Shuyi Chen, Uber Technologies Inc.
Uber
• 6 continents, 70 countries and 400+ cities
• Transportation as reliable as running water, everywhere, for everyone
Outline
• Motivation
• Architecture
• Limitations
• Challenges
Uber is a data-driven company
Thousands of Kafka topics from different services
We can extract a lot of useful information from this rich set of logs in real-time!
• Window aggregation: multiple logins from the same IP in the last 10 minutes
• Pattern detection: partner accepted a trip → partner calls rider through the Uber app → rider cancels the trip
• Filter: partners reject the second pickup of an UberPOOL trip
Can we use declarative semantics to specify this stream processing logic?
Complex event processing
• Combines data from multiple sources to infer events or patterns that
suggest more complicated circumstances
• CEP is used across many industries for various use cases, including:
  – Finance: trade analysis, fraud detection
  – Airlines: operations monitoring
  – Healthcare: claims processing, patient monitoring
  – Energy and telecommunications: outage detection
• CEP uses a declarative rule/query language to specify event processing logic
WSO2/Siddhi: Complex event processing engine
• Lightweight, extensible, open source, released as a Java library
• Features supported
  – Filter
  – Join
  – Aggregation
  – Group by
  – Window
  – Pattern processing
  – Sequence processing
  – Event tables
  – Event-time processing
  – UDF
  – Extensions
  – Declarative query language: SiddhiQL
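As a sketch of what several of these features look like together (stream and attribute names here are hypothetical, not from the talk), a SiddhiQL query joining two streams over windows might be written as:

```sql
-- Hypothetical streams: trips and the latest known driver rating
define stream TripStream (tripId string, driverId string, city string);
define stream DriverStream (driverId string, rating double);

-- Join trips from the last minute against the most recent driver event
from TripStream#window.time(1 min) as t
  join DriverStream#window.length(1) as d
  on t.driverId == d.driverId
select t.tripId, t.city, d.rating
insert into TripWithRatingStream;
```

The join, window, and projection are all expressed declaratively; no imperative stream processing code is needed.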
How Siddhi works
• Specify processing logic declaratively with SiddhiQL
How Siddhi works
• The query is parsed at runtime into an execution plan runtime
• As events flow in, the execution plan runtime processes them inside the CEP engine according to the query logic
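As an illustration, the earlier "multiple logins from the same IP" use case might be expressed roughly as follows (stream and attribute names are hypothetical; the threshold of 3 is an assumed example value):

```sql
-- Hypothetical login event stream
define stream LoginStream (userId string, ip string);

-- Count logins per IP over a sliding 10-minute window and
-- emit IPs that exceed the threshold
from LoginStream#window.time(10 min)
select ip, count() as loginCount
group by ip
having loginCount > 3
insert into SuspiciousIpStream;
```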
How can we make it scalable at Uber scale?
Apache Samza
• A distributed stream processing framework
  – Distributed and scalable
  – Built-in state management
  – Built-in fault tolerance
  – At-least-once message processing
How can we make the stream processing output useful?
Actions
• Generalize a set of common action templates to make it easy for services and humans to harness the power of real-time stream processing
• Currently we support
  – Make an RPC call
  – Invoke a webhook endpoint
  – Index to Elasticsearch
  – Index to Cassandra
  – Kafka
  – Statsd
  – Chat service
  – Email
  – Push notification
Real-time Scalable Complex Event Processing
Outline
• Motivation
• Architecture
• Limitations
• Challenges
Partitioner
• Re-partition events based on key
• Support predicate pushdown through query analysis
• Support column pruning through query analysis (WIP)
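For instance, given a query with a filter predicate such as the UberPOOL example from the motivation section (stream and attribute names hypothetical), the partitioner can evaluate the predicate before re-partitioning, so non-matching events are dropped early rather than shuffled across the network:

```sql
-- Hypothetical trip event stream
define stream TripStream (tripId string, driverId string,
                          pickupIndex int, status string);

-- Only events matching this predicate need to be re-partitioned;
-- the partitioner can apply the filter first (predicate pushdown)
from TripStream[pickupIndex == 2 and status == 'rejected']
select tripId, driverId
insert into PoolSecondPickupRejectStream;
```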
Query processor
• Parse Siddhi queries into an execution plan runtime
• Process events in the Siddhi execution plan runtime
• Checkpoint state regularly using RocksDB to ensure recovery upon crash/restart
Action processor
• Execute actions upon the complex event processing output
• Support various kinds of actions for easy integration
• Implement an action retry mechanism using RocksDB to provide at-least-once delivery
How do we translate a query into a physical plan that runs?
DAG (Directed Acyclic Graph) generation
• Analyze the Siddhi query to automatically generate the stream processing DAG in Samza using the processors
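As a sketch, the accept → call → cancel sequence from the motivation section might be written as a Siddhi pattern (stream and attribute names are hypothetical). Analyzing a query like this tells the DAG generator that all participating streams must first be partitioned by `tripId` before reaching the pattern-matching query processor:

```sql
-- Hypothetical trip and in-app call event streams
define stream TripEventStream (tripId string, eventType string);
define stream CallStream (tripId string, caller string);

-- Detect: partner accepts, then calls the rider, then the rider cancels
from every a = TripEventStream[eventType == 'accepted']
  -> c = CallStream[caller == 'partner' and tripId == a.tripId]
  -> x = TripEventStream[eventType == 'rider_cancelled' and tripId == a.tripId]
select a.tripId
insert into CancelAfterCallStream;
```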
Filter, transformation
Join, window, pattern
More complicated
No stream processing logic is hard-coded in any of the processors
REST API backend
• All queries and actions are stored externally in a database
• RESTful API for CRUD operations
• If the query/action logic changes
  – Redeploy the Samza DAG if needed
  – Otherwise, the updated queries/actions are loaded at runtime without interruption
Unified management and monitoring
• Every use case
  – shares the same set of processors
  – uses queries and actions to describe its processing logic
• A single monitoring template can be reused across different use cases
Production status
• 100+ production use cases
• 30+ billion messages processed per day
Applications
• Real-time fraud detection
• Real-time anomaly detection
• Real-time marketing campaigns
• Real-time promotions
• Real-time monitoring
• Real-time feedback systems
• Real-time analytics
• Real-time visualizations
• etc.
Outline
• Motivation
• Architecture
• Limitations
• Challenges
Out-of-order event handling
• Not a big concern
  – Events of the same rider/partner are usually seconds apart
• K-slack extension in Siddhi for out-of-order event processing
Auto-scaling
• Manually re-partition Kafka topics to increase parallelism
• Manually tune container memory if needed
• Future
  – Use CPU/memory/IO stats to auto-scale the data pipelines
Outline
• Motivation
• Architecture
• Limitations
• Challenges
Large checkpointing state
• Samza uses Kafka to log state changes
• Siddhi engine snapshots can be large
• Kafka message size is limited to 1MB by default
• Solution: we built logic to slice the state into smaller pieces and checkpoint them
Synchronous checkpointing
• If state is large, checkpointing can take a long time
• Samza uses a single-threaded model, so it is unsafe to checkpoint asynchronously (SAMZA-863)
Exactly once state processing?
• Cannot commit state and offsets atomically
• Therefore, no exactly-once state processing
Custom business logic
• Common logic implemented as Siddhi extensions
• Ad-hoc logic implemented as UDFs in JavaScript or Scala
Intermediate Kafka messages
• Samza uses Kafka as the message queue for intermediate processing output
  – This can create a large load on Kafka if a heavy topic is partitioned multiple times
  – Encode the intermediate messages to reduce the footprint
Multi-tenancy
• Older Siddhi versions process events using a thread pool
  – Bad for multi-tenancy in YARN
  – Consumes more CPU resources than claimed
• Newer versions still use a thread pool for scheduled tasks, but main processing runs in a single thread
  – Good: CPU consumption per YARN container is bounded
Upgrading Samza jobs
• Upgrading Samza jobs requires a full restart, which can take minutes due to
  – Offset checkpointing topic too large → set retention to hours
  – Changelog topic too large → set retention, enable compaction in Kafka, or use host affinity (SAMZA-617)
• To minimize the interruption during upgrades, it would be nice to have
  – Rolling restart
  – Per-container restart
Our solution: non-interrupted handoff
• For critical jobs, we use replication during upgrades
  – Start a shadow job
  – Upgrade the shadow
  – Switch primary and shadow
  – Upgrade the primary
  – Switch back
• Downside: requires 2x capacity during the upgrade
Thank You!