WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber

Scalable Real-Time Complex Event Processing @Uber Shuyi Chen Uber Technology Inc.

Upload: wso2-inc

Post on 22-Jan-2018


Page 1: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber

Scalable Real-Time Complex Event Processing @Uber

Shuyi Chen Uber Technology Inc.

Page 2: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber

● 6 continents, 70 countries and 400+ cities

● Transportation as reliable as running water, everywhere, for everyone

Uber

Page 3: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber

Outline

• Motivation

• Architecture

• Limitations

• Challenges

Page 4: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber

Outline

• Motivation

• Architecture

• Limitations

• Challenges

Page 5: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber

Uber is a data-driven company

Page 6: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber

Thousands of Kafka topics from different services

Page 7: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber

We can extract a lot of useful information from this rich set of logs in real-time!

Page 8: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber

Multiple logins from the same IP in the last 10 minutes

Page 9: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber

Partner accepts a trip → partner calls rider through the Uber app → rider cancels the trip

Page 10: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber

Partners reject the second pickup of an UberPOOL trip

Page 11: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber

Multiple logins from the same IP in the last 10 minutes

Window Aggregation
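
As an illustration of what "window aggregation" means for the login example, here is a minimal Python sketch of a sliding 10-minute count per IP. It is not Siddhi code and not Uber's implementation; the class name `LoginWindow` and the assumption that timestamps arrive in order are hypothetical choices made for the example.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 10 * 60  # "last 10 minutes" from the slide


class LoginWindow:
    """Sliding time window of login timestamps, kept per IP address."""

    def __init__(self, window_seconds=WINDOW_SECONDS):
        self.window_seconds = window_seconds
        self.logins = defaultdict(deque)  # ip -> timestamps still inside the window

    def on_login(self, ip, timestamp):
        """Record a login and return how many logins this IP has in the window."""
        window = self.logins[ip]
        window.append(timestamp)
        # Expire timestamps that fell out of the window (assumes in-order input).
        while window and window[0] <= timestamp - self.window_seconds:
            window.popleft()
        return len(window)


w = LoginWindow()
# Three logins within two minutes, then one 15 minutes after the first.
counts = [w.on_login("10.0.0.1", t) for t in (0, 60, 120, 900)]
```

A rule such as "alert when the count exceeds some threshold" would then be a filter on the returned count.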

Page 12: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber

Partner accepts a trip → partner calls rider through the Uber app → rider cancels the trip

Pattern detection
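
The accept → call → cancel sequence can be pictured as a small state machine kept per trip. The Python sketch below is only an illustration under assumptions: the event names in `PATTERN` are hypothetical, and it ignores unrelated events between steps rather than reproducing Siddhi's exact pattern semantics.

```python
# Hypothetical event type names, chosen for this example only.
PATTERN = ("partner_accepted", "partner_called_rider", "rider_cancelled")


class TripPatternDetector:
    """Tracks, per trip, how far along the expected event sequence we are."""

    def __init__(self, pattern=PATTERN):
        self.pattern = pattern
        self.progress = {}  # trip_id -> index of the next expected step

    def on_event(self, trip_id, event_type):
        """Return True when this event completes the pattern for trip_id."""
        step = self.progress.get(trip_id, 0)
        if event_type == self.pattern[step]:
            step += 1
        # Events that do not match the next step are ignored, so other
        # events may occur in between the steps of the pattern.
        if step == len(self.pattern):
            self.progress.pop(trip_id, None)
            return True
        self.progress[trip_id] = step
        return False


detector = TripPatternDetector()
events = [
    ("t1", "partner_accepted"),
    ("t2", "partner_accepted"),
    ("t1", "partner_called_rider"),
    ("t2", "rider_cancelled"),  # t2 skipped the call, so no match
    ("t1", "rider_cancelled"),  # completes the pattern for t1
]
matches = [trip for trip, event_type in events if detector.on_event(trip, event_type)]
```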

Page 13: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber

Partners reject the second pickup of an UberPOOL trip

Filter

Page 14: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber

Can we use declarative semantics to specify this stream processing logic?

Page 15: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber

Complex event processing

• Combines data from multiple sources to infer events or patterns that suggest more complicated circumstances

• CEP is used across many industries for various use cases, including:

– Finance: Trade analysis, fraud detection

– Airlines: Operations monitoring

– Healthcare: Claims processing, patient monitoring

– Energy and Telecommunications: Outage detection

• CEP uses a declarative rule/query language to specify event processing logic

Page 16: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber

WSO2/Siddhi: Complex event processing engine

• Lightweight, extensible, open source, released as a Java library

• Features supported

– Filter

– Join

– Aggregation

– Group by

– Window

– Pattern processing

– Sequence processing

– Event tables

– Event-time processing

– UDF

– Extensions

– Declarative query language: SiddhiQL

Page 17: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber

How Siddhi works

• Specify processing logic declaratively with SiddhiQL

Page 18: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber

How Siddhi works

• Query is parsed at runtime into an execution plan runtime

• As events flow in, the execution plan runtime processes events inside the CEP engine according to the query logic

Page 19: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber

How can we make it scalable at Uber scale?

Page 20: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber

Apache Samza

• A distributed stream processing framework

– Distributed and scalable

– Built-in state management

– Built-in fault tolerance

– At-least-once message processing

Page 21: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber

How can we make the stream processing output useful?

Page 22: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber

Actions

• Generalize a set of common action templates to make it easy for services and humans to harness the power of real-time stream processing

• Currently we support

– Make an RPC call

– Invoke a Webhook endpoint

– Index to ElasticSearch

– Index to Cassandra

– Kafka

– Statsd

– Chat service

– Email

– Push notification

Page 23: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber

Actions

Real-time Scalable Complex Event Processing

Page 24: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber

Outline

• Motivation

• Architecture

• Limitations

• Challenges

Page 25: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber
Page 26: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber
Page 27: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber

Partitioner

• Re-partition events based on key

• Support predicate pushdown through query analysis

• Support column pruning through query analysis (WIP)
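
To make the partitioner's two main ideas concrete, the Python sketch below routes events to partitions by a key and applies the query's filter before re-partitioning, so that filtered-out events never become intermediate messages. This is a simplified illustration, not Uber's partitioner: the function names, the CRC32 hash, and the event fields are all assumptions.

```python
import zlib


def partition_for(key, num_partitions):
    """Stable hash, so the same key always maps to the same partition."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def repartition(events, key_field, num_partitions, predicate=None):
    """Route events to partitions by key, dropping non-matching events first."""
    partitions = [[] for _ in range(num_partitions)]
    for event in events:
        # Predicate pushdown: evaluate the query's filter here, upstream of
        # the query processor, so less data is written to intermediate topics.
        if predicate is not None and not predicate(event):
            continue
        partitions[partition_for(event[key_field], num_partitions)].append(event)
    return partitions


events = [
    {"rider_id": "r1", "status": "cancelled"},
    {"rider_id": "r2", "status": "completed"},
    {"rider_id": "r1", "status": "cancelled"},
]
parts = repartition(
    events, "rider_id", 4, predicate=lambda e: e["status"] == "cancelled"
)
```

Because both `r1` events land in the same partition, a downstream window or pattern query keyed on `rider_id` sees all of that rider's events in one place.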

Page 28: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber

Query processor

• Parse Siddhi queries into an execution plan runtime

• Process events in the Siddhi execution plan runtime

• Checkpoint state regularly using RocksDB to ensure recovery upon crash/restart

Page 29: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber

Action processor

• Execute actions upon the complex event processing output

• Support various kinds of actions for easy integration

• Implement an action retry mechanism using RocksDB to provide at-least-once delivery
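
A rough Python sketch of how a retry mechanism can give at-least-once delivery: the event is recorded as pending before the action runs and is removed only after the action succeeds. The in-memory dict stands in for a durable store such as RocksDB, and the class name, `max_attempts`, and the flaky webhook are hypothetical.

```python
class ActionProcessor:
    """Executes an action per event and retries failures from a pending store."""

    def __init__(self, action, max_attempts=3):
        self.action = action  # callable that performs the side effect
        self.max_attempts = max_attempts
        self.pending = {}  # event_id -> (event, attempts); stand-in for a durable store

    def submit(self, event_id, event):
        # Record the event before executing, so a crash leaves it pending.
        self.pending[event_id] = (event, 0)
        self._attempt(event_id)

    def _attempt(self, event_id):
        event, attempts = self.pending[event_id]
        try:
            self.action(event)
        except Exception:
            self.pending[event_id] = (event, attempts + 1)
            return False
        # Removed only after success. A crash between the action and this
        # line would cause a duplicate on retry: at-least-once, not exactly-once.
        del self.pending[event_id]
        return True

    def retry_pending(self):
        for event_id in list(self.pending):
            _, attempts = self.pending[event_id]
            if attempts < self.max_attempts:
                self._attempt(event_id)


delivered = []
failures = {"remaining": 1}


def flaky_webhook(event):
    """Fails once, then succeeds, to simulate a transient outage."""
    if failures["remaining"] > 0:
        failures["remaining"] -= 1
        raise ConnectionError("simulated outage")
    delivered.append(event)


processor = ActionProcessor(flaky_webhook)
processor.submit("e1", {"trip": "t1"})  # first attempt fails, event stays pending
processor.retry_pending()  # second attempt succeeds
```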

Page 30: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber

How do we translate a query into a physical plan that runs?

Page 31: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber

DAG (Directed Acyclic Graph) generation

• Analyze Siddhi query to automatically generate the stream processing DAG in Samza using the processors

Filter, transformation

Page 32: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber

Join, window, pattern

Page 33: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber

More complicated

Page 34: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber

No stream processing logic is hard-coded in any of the processors

Page 35: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber

REST API backend

• All queries and actions are stored externally in a database

• RESTful API for CRUD operations

• If query/action logic changes

– Redeploy the Samza DAG if needed

– Otherwise, the updated queries/actions will be loaded at runtime without interruption

Page 36: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber

Unified management and monitoring

• Every use case

– Shares the same set of processors

– Uses queries and actions to describe its processing logic

• A single monitoring template can be reused across different use cases

Page 37: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber

Production status

• 100+ production use cases

• 30+ billion messages processed per day

Page 38: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber

Applications

• Real-time fraud detection

• Real-time anomaly detection

• Real-time marketing campaign

• Real-time promotion

• Real-time monitoring

• Real-time feedback system

• Real-time analytics

• Real-time visualizations

• Etc.

Page 39: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber

Outline

• Motivation

• Architecture

• Limitations

• Challenges

Page 40: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber

Out-of-order event handling

• Not a big concern

– Events of the same rider/partner are usually seconds apart

• K-slack extension in Siddhi for out-of-order event processing
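
The general idea behind K-slack is to hold events in a buffer and release them in timestamp order once they are at least K time units older than the newest timestamp seen. The Python sketch below uses a fixed K as a simplification; it is not the code of the Siddhi extension, and the class and method names are hypothetical.

```python
import heapq


class KSlackBuffer:
    """Reorders events that arrive up to k time units late."""

    def __init__(self, k):
        self.k = k
        self.max_seen = float("-inf")  # newest event timestamp observed so far
        self.heap = []  # min-heap ordered by event timestamp
        self.seq = 0  # tie-breaker so payloads are never compared

    def add(self, timestamp, payload):
        """Buffer an event and return the events that are now safe to emit."""
        self.max_seen = max(self.max_seen, timestamp)
        heapq.heappush(self.heap, (timestamp, self.seq, payload))
        self.seq += 1
        released = []
        while self.heap and self.heap[0][0] <= self.max_seen - self.k:
            ts, _, item = heapq.heappop(self.heap)
            released.append((ts, item))
        return released

    def flush(self):
        """Release everything still buffered, in timestamp order."""
        released = []
        while self.heap:
            ts, _, item = heapq.heappop(self.heap)
            released.append((ts, item))
        return released


buf = KSlackBuffer(k=2)
ordered = []
for ts in (1, 3, 2, 5, 4, 8):  # arrival order, with 2 and 4 arriving late
    ordered.extend(t for t, _ in buf.add(ts, f"event@{ts}"))
ordered.extend(t for t, _ in buf.flush())
```

The trade-off is latency: every event waits up to K before it is emitted, and an event delayed by more than K can still come out of order.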

Page 41: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber

Auto-scaling

• Manually re-partition Kafka topics to increase parallelism

• Manually tune container memory if needed

• Future

– Use CPU/memory/IO stats to auto-scale the data pipelines

Page 42: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber

Outline

• Motivation

• Architecture

• Limitations

• Challenges

Page 43: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber

Large checkpointing state

• Samza uses Kafka to log state changes

• Siddhi engine snapshot can be large

• Kafka limits message size to 1 MB by default

• Solution: we built logic to slice state into smaller pieces and checkpoint them
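
One way to picture slicing state into smaller pieces: split the serialized snapshot into chunks that each fit under the message size limit, tag every chunk with its index and the total count, and reassemble on restore. The Python sketch below is only an assumption-laden illustration; the 900,000-byte budget, the message fields, and the function names are not from the talk.

```python
# Assumed per-message budget, kept below the roughly 1 MB Kafka default.
MAX_CHUNK_BYTES = 900_000


def slice_snapshot(snapshot_id, data, max_chunk_bytes=MAX_CHUNK_BYTES):
    """Split a serialized snapshot into messages that each fit the size limit."""
    chunks = [
        data[i:i + max_chunk_bytes] for i in range(0, len(data), max_chunk_bytes)
    ] or [b""]  # an empty snapshot still produces one (empty) message
    total = len(chunks)
    return [
        {"snapshot_id": snapshot_id, "index": i, "total": total, "payload": chunk}
        for i, chunk in enumerate(chunks)
    ]


def reassemble(messages):
    """Rebuild the snapshot, refusing to restore from an incomplete set."""
    messages = sorted(messages, key=lambda m: m["index"])
    if not messages or len(messages) != messages[0]["total"]:
        raise ValueError("incomplete snapshot")
    return b"".join(m["payload"] for m in messages)


snapshot = bytes(range(256)) * 10_000  # 2,560,000 bytes of stand-in state
messages = slice_snapshot("ckpt-1", snapshot)
restored = reassemble(list(reversed(messages)))  # arrival order does not matter
```

Carrying `total` in every message lets the restore path detect a checkpoint that was only partially written before a crash.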

Page 44: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber

Synchronous checkpointing

• If state is large, time to checkpoint can be long

• Samza uses a single-threaded model, so checkpointing asynchronously is unsafe (SAMZA-863)

Page 45: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber

Exactly once state processing?

• Cannot commit state and offset atomically

• No exactly-once state processing

Page 46: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber

Custom business logic

• Common logic implemented as Siddhi extensions

• Ad-hoc logic implemented as UDFs in JavaScript or Scala

Page 47: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber

Intermediate Kafka messages

• Samza uses Kafka as a message queue for intermediate processing output

– This can create a large load on Kafka if a heavy topic is partitioned multiple times

– Encode the intermediate messages to reduce footprint

Page 48: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber

Multi-tenancy

• Older Siddhi versions process events using a thread pool

– Bad for multi-tenancy in YARN

– Consumes more CPU resources than claimed

• Newer versions still use a thread pool for scheduled tasks, but main processing runs in a single thread

– Good: CPU consumption per YARN container is bounded

Page 49: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber

Upgrading Samza jobs

• Upgrading Samza jobs requires a full restart, which can take minutes due to

– Offset checkpointing topic too large → set retention to hours

– Changelog topic too large → set retention or enable compaction in Kafka, or use host affinity (SAMZA-617)

• To minimize interruption during upgrades, it would be nice to have

– Rolling restart

– Per-container restart

Page 50: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber

Our solution: non-interrupted handoff

• For critical jobs, we use replication during upgrade

– Start a shadow job

– Upgrade shadow

– Switch primary and shadow

– Upgrade primary

– Switch back

• Downside: requires 2x capacity during upgrade

Page 51: WSO2Con USA 2017: Scalable Real-time Complex Event Processing at Uber

Thank You!