Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the Real World – Challenges and...

Near Real-Time Analytics: Challenges & Lessons at Uber Engineering

Upload: flink-forward

Post on 21-Apr-2017

TRANSCRIPT

Near Real-Time Analytics
Challenges & Lessons at Uber Engineering

U B E R | Data

Chinmay Soman @ChinmaySoman

● Staff Software Engineer @ Uber
● Tech Lead on Streaming Platform
● Background in distributed storage and filesystems
● Apache Samza Committer, PMC

Quick Introduction

LAS VEGAS SUMMIT 2015

Messages/day: billions to trillions
Bytes/day: ~ PB

Apache Kafka at Uber

Messages processed / day: billions
Bytes processed / day: 100s of TB - PB

Near Real-Time Analytics at Uber

What is near real-time?

Decision time: seconds to 5 mins

Agenda

● Evolution of Business Needs
● The case for SQL as building block
● New ecosystem using Flink
● The road ahead

Evolution of Business Needs

Case I - Growth Metrics

“How many cars are active right now?”

“What % of trips have been delayed in the last 5 mins?”

“What is the % of Uber X trips taken by Android users?”

Events logged to Kafka

Rider eyeballs

Trip updates

Artemis

Database

KAFKA

Categorize by time

Aggregate value

5 mins
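
The "categorize by time, aggregate value" step can be sketched as a tumbling-window aggregation. This is only an illustration of the idea, not Artemis itself; the `(epoch_seconds, value)` event shape and the function names are assumptions made for the example.

```python
from collections import defaultdict

WINDOW_SECONDS = 5 * 60  # the 5-minute bucket size shown on the slide


def window_start(event_ts, window=WINDOW_SECONDS):
    """Return the start (epoch seconds) of the window containing event_ts."""
    return event_ts - (event_ts % window)


def aggregate_by_window(events):
    """Sum values per 5-minute window.

    `events` is an iterable of (epoch_seconds, value) pairs; this shape is
    a hypothetical stand-in for events read from Kafka.
    """
    totals = defaultdict(int)
    for ts, value in events:
        totals[window_start(ts)] += value
    return dict(totals)


if __name__ == "__main__":
    sample = [(0, 1), (299, 2), (300, 5), (601, 1)]
    print(aggregate_by_window(sample))
```

Each event lands in exactly one bucket, so the per-window totals can be written to the database as soon as a window is considered closed.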

KAFKA

Artemis

Database

Delayed events

Corruption

Backpressure

Duplicate events
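
Two of the problems listed here, duplicate and delayed events, can be illustrated with a small sketch. It assumes each event carries a unique `id` and an event-time `ts` field (both hypothetical), drops repeats by id, and flags events that arrive more than an allowed lateness behind the newest timestamp seen so far.

```python
def clean_stream(events, allowed_lateness=60):
    """Yield (event, is_late) for every event whose id was not seen before.

    `events` is an iterable of dicts with 'id' and 'ts' keys (assumed shape).
    An event is flagged late when its timestamp trails the largest timestamp
    seen so far by more than `allowed_lateness` seconds.
    """
    seen_ids = set()  # grows without bound here; a real job would expire old ids
    max_ts = None
    for event in events:
        if event["id"] in seen_ids:
            continue  # duplicate delivery: skip it
        seen_ids.add(event["id"])
        is_late = max_ts is not None and event["ts"] < max_ts - allowed_lateness
        if max_ts is None or event["ts"] > max_ts:
            max_ts = event["ts"]
        yield event, is_late


if __name__ == "__main__":
    stream = [
        {"id": "a", "ts": 100},
        {"id": "a", "ts": 100},  # duplicate
        {"id": "b", "ts": 500},
        {"id": "c", "ts": 200},  # arrives well after newer data
    ]
    for event, late in clean_stream(stream):
        print(event["id"], "late" if late else "on time")
```

What to do with a late event (drop it, patch an old window, or leave it to a backfill) is a separate design decision that this sketch does not make.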

Artemis

Database

Pre-aggregation

KAFKA

Backfill pipeline

Data Lake

Artemis

Database

Pre-aggregation
Per dimension

KAFKA

Data Explosion
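
A rough way to see why per-dimension pre-aggregation leads to a data explosion is to count the cells that must be materialized. The dimension names and cardinalities below are invented for illustration and do not come from the talk.

```python
from math import prod


def cell_count(cardinalities):
    """Number of distinct value combinations across all dimensions."""
    return prod(cardinalities.values())


def rollup_count(cardinalities):
    """Cells needed if every subset of dimensions is pre-aggregated.

    Each dimension contributes its values plus one extra "all" bucket.
    """
    return prod(c + 1 for c in cardinalities.values())


if __name__ == "__main__":
    # Hypothetical dimensions, per time bucket.
    dims = {"city": 500, "product": 5, "platform": 2}
    print(cell_count(dims), rollup_count(dims))
```

The counts multiply with every dimension added, and the result is per time window, which is why storing every pre-aggregated combination stops scaling.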

Apollo

Intelligent Caching

KAFKA

Query Language

Fast Accurate Scalable

Case II - Event processing

FRAUD

“If # Signups per device look suspicious -> Ban the driver/rider”

Case II - Event processing

INTELLIGENT ALERTS

“Send me an alert if a leased vehicle leaves a geo-fence”
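
A geo-fence alert of this kind can be sketched as a check on consecutive vehicle positions. The sketch assumes a circular fence (center plus radius) and uses the haversine great-circle distance; real fences are often polygons, and none of these names come from the talk.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def left_fence(prev_pos, curr_pos, center, radius_km):
    """True only when the vehicle was inside the fence and is now outside."""
    was_inside = haversine_km(*prev_pos, *center) <= radius_km
    is_inside = haversine_km(*curr_pos, *center) <= radius_km
    return was_inside and not is_inside


if __name__ == "__main__":
    center = (37.7749, -122.4194)   # example fence center
    inside = (37.7800, -122.4200)
    outside = (37.3382, -121.8863)
    if left_fence(inside, outside, center, radius_km=10.0):
        print("ALERT: vehicle left the geo-fence")
```

Alerting on the inside-to-outside transition, rather than on every outside position, avoids sending a new alert for each location update after the vehicle has left.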

Athena platform using Apache Samza

Robust

Ease of operation

No backpressure issues

Built in state management

Fraud Rule Engine

Track the count of sign_ups per device_imei

Ban fraudsters in real-time

Event processing - Apache Samza

KAFKA
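
The fraud rule above boils down to keeping a running count of sign-ups per `device_imei` and reacting when a device crosses a limit. The sketch below is plain Python rather than a Samza job, and the limit of 3 is a made-up value, since the talk does not state what counts as suspicious.

```python
from collections import Counter


def flag_suspicious_devices(signups, max_signups_per_device=3):
    """Return device IMEIs whose sign-up count exceeds the limit.

    `signups` is an iterable of dicts with a 'device_imei' key (assumed
    shape). Each device is reported once, at the moment it first exceeds
    the hypothetical limit.
    """
    counts = Counter()  # plays the role of the job's keyed state
    flagged = []
    for event in signups:
        imei = event["device_imei"]
        counts[imei] += 1
        if counts[imei] == max_signups_per_device + 1:
            flagged.append(imei)
    return flagged


if __name__ == "__main__":
    events = [{"device_imei": "A"}] * 5 + [{"device_imei": "B"}]
    print(flag_suspicious_devices(events))
```

In a stream processor the counter would live in managed state keyed by `device_imei`, which is the "built in state management" the platform slide refers to.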

Case III - OLAP (OnLine Analytical Processing)

A / B Tests

See progress of tests in real-time

Case III - OLAP use case

FORECASTING

“How many first time riders will be dropped off in a given geofence?”

Our integrated platform

● Filter events
● Merge streams
● Decorate with external data
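
The three operations listed above compose naturally as a pipeline. The following is a minimal sketch using Python generators; the event fields (`ts`, `city_id`, `type`) and the city lookup table are assumptions made for the example, not the platform's actual schema.

```python
import heapq


def filter_events(events, predicate):
    """Keep only the events for which predicate(event) is true."""
    return (e for e in events if predicate(e))


def merge_streams(*streams):
    """Merge streams by timestamp; each input must already be sorted by 'ts'."""
    return heapq.merge(*streams, key=lambda e: e["ts"])


def decorate(events, lookup, key, field):
    """Attach external data: copy lookup[event[key]] into event[field]."""
    for e in events:
        yield {**e, field: lookup.get(e[key])}


if __name__ == "__main__":
    trips = [{"ts": 1, "city_id": 1, "type": "trip"},
             {"ts": 4, "city_id": 2, "type": "trip"}]
    eyeballs = [{"ts": 2, "city_id": 1, "type": "eyeball"}]
    cities = {1: "San Francisco"}  # hypothetical external table

    merged = merge_streams(trips, eyeballs)
    city_one = filter_events(merged, lambda e: e["city_id"] == 1)
    for event in decorate(city_one, cities, "city_id", "city_name"):
        print(event)
```

Because every stage is lazy, events flow through one at a time, which mirrors how a streaming job chains operators.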

Are we there yet ?

Artemis

Apollo

Event Processing

OLAP

?

[Chart axes: System Complexity; years 2014, 2015, 2016, 2017]

What’s missing?

● Cumbersome for data scientists / Ops people
● Redundant code
● Custom backfill pipelines

SQL as the building block

SQL + Stream Processing

70-80% of jobs can be implemented via SQL

Intelligent Promotions

SQL + Stream Processing: Powerful abstraction

Rule: “All trips worth > $10 in San Francisco between Friday 5 pm and Sunday 9 pm”

Threshold: > 100

Action: “Give bonus of $500”

Complicated rules

● “If number of hours online > 10 …”
● “If amount earned > 700 in a given week, then …”
● “If # uberPOOL rides > 10, then …”
● “If trip happens over some geo-fence 10 times in a given weekend, then …”

SQL + Stream Processing: Powerful abstraction

Intelligent Promotions

SQL + Stream Processing: Powerful abstraction

Rule: select count(*) from hp_api_created_trips WHERE city_id = 1 AND fare > 10 AND request_at > 1491105600 AND request_at <= 1491177600

Threshold: > 100

Action: trigger_payment()
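
The rule / threshold / action split can be tried out with the slide's query against an in-memory SQLite table. This is only a batch-style sketch of the abstraction: the table columns are inferred from the query, the demo rows are invented, and `run_rule` and `make_demo_db` are hypothetical helpers rather than part of any Uber system.

```python
import sqlite3

# Rule: the query from the slide, unchanged apart from layout.
RULE_SQL = """
SELECT count(*) FROM hp_api_created_trips
WHERE city_id = 1 AND fare > 10
  AND request_at > 1491105600 AND request_at <= 1491177600
"""
THRESHOLD = 100  # "> 100" on the slide


def run_rule(conn, action, rule_sql=RULE_SQL, threshold=THRESHOLD):
    """Evaluate the rule and fire the action if the count exceeds the threshold."""
    (count,) = conn.execute(rule_sql).fetchone()
    if count > threshold:
        action()
        return True
    return False


def make_demo_db(n_trips):
    """Build an in-memory table with n_trips rows that all match the rule."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE hp_api_created_trips "
        "(city_id INTEGER, fare REAL, request_at INTEGER)"
    )
    rows = [(1, 12.5, 1491105600 + 1 + i) for i in range(n_trips)]
    conn.executemany("INSERT INTO hp_api_created_trips VALUES (?, ?, ?)", rows)
    return conn


if __name__ == "__main__":
    # Stand-in for trigger_payment(): just print.
    run_rule(make_demo_db(101), lambda: print("trigger_payment()"))
```

In a streaming setting the count would be maintained continuously as trips arrive rather than recomputed over a static table, but the rule author still only supplies the query, the threshold, and the action.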

Complicated rules

● “If number of hours online > 10 …”
● “If amount earned > 700 in a given week, then …”
● “If # Uber Pool rides > 10, then …”
● “If trip happens over some geo-fence 10 times in a given weekend, then …”

What if we created specific rules for specific driver partners?

SQL + Stream Processing: Powerful abstraction

Can be used for alerts as well:

“If a driver X is outside a geofence, then …”

SQL + Stream Processing: Powerful abstraction

New eco-system: Athena X

Enter Flink

Apache Calcite (SQL) Integration

Easy to manage and scale

No backpressure problem

Built in state management support

HDFS integration

Not dependent on Kafka

Rule Store

KAFKA Database

select count(*) from hp_api_created_trips WHERE city_id = 1 AND fare > 10 AND request_at > 1491105600 AND request_at <= 1491177600

Promotions using Flink
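
The `request_at` bounds in the query are Unix epoch seconds. A small helper, written here only as a reading aid, converts them to UTC datetimes so the promotion window can be checked by eye; mapping that to a local time zone is left out because the talk does not say which zone the rule uses.

```python
from datetime import datetime, timezone

WINDOW_START = 1491105600  # lower bound from the query (exclusive)
WINDOW_END = 1491177600    # upper bound from the query (inclusive)


def to_utc(epoch_seconds):
    """Convert Unix epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


def window_hours(start, end):
    """Length of the window in hours."""
    return (end - start) / 3600


if __name__ == "__main__":
    print(to_utc(WINDOW_START).isoformat())
    print(to_utc(WINDOW_END).isoformat())
    print(window_hours(WINDOW_START, WINDOW_END), "hours")
```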

Promotions using Flink

Rule Store

Database

KAFKA

select count(*) from hp_api_created_trips WHERE city_id = 1 AND fare > 10 AND request_at > 1491105600 AND request_at <= 1491177600

New Eco-system: Athena X

HDFS

Kafka

Alerts

Kafka Streams

Other data destinations

Database Streams

HDFS Cassandra

Rule Store

HTTP

Are we there yet ?

Artemis

Apollo

Event Processing

OLAP

?

[Chart axes: System Complexity; years 2014, 2015, 2016, 2017]

Flink

The road ahead ...

Future Discussions

● To (Apache) Beam or not to Beam?
● Real-time Machine Learning
● Auto scaling

AthenaX - Flink deep dive

Haohui Mai, Bill Liu (11:45 am)

Thank you

For more: eng.uber.com
Twitter: @UberEng