Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the Real World – Challenges and Lessons at Uber


Upload: flink-forward

Post on 21-Apr-2017


Category:

Data & Analytics



TRANSCRIPT

Page 1: Flink Forward SF 2017: Chinmay Soman - Real Time Analytics in the real World – Challenges and Lessons at Uber


Near Real-Time Analytics: Challenges & Lessons at Uber Engineering

Page 2

U B E R | Data

Chinmay Soman @ChinmaySoman

● Staff Software Engineer @ Uber
● Tech Lead on Streaming Platform
● Background in distributed storage and filesystems
● Apache Samza Committer, PMC

Quick Introduction

Page 3

Messages/day: billions to trillions
Bytes/day: ~ PB

Apache Kafka at Uber

Page 4

Messages processed/day: billions
Bytes processed/day: 100s of TB to PB

Near Real-Time Analytics at Uber

Page 5

What is near real-time?

Decision Time

Seconds to 5 mins

Page 6

Agenda

● Evolution of Business Needs
● The case for SQL as building block
● New ecosystem using Flink
● The road ahead

Page 7

Evolution of Business Needs

Page 8

Case I - Growth Metrics

“How many cars are active right now?”

“What % of trips have been delayed in the last 5 mins?”

“What is the % of Uber X trips taken by Android users?”

Page 9

Events logged to Kafka

Rider eyeballs

Trip updates

Page 10

KAFKA

Artemis

Database

Categorize by time

Aggregate value

5 mins
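
The "categorize by time" and "aggregate value" steps on this slide can be illustrated with a fixed 5-minute window. This is a minimal sketch in plain Python, not Artemis code; the function names and the sample events are assumptions made for illustration.

```python
from collections import defaultdict

WINDOW_SECONDS = 5 * 60  # the 5-minute bucket size shown on the slide


def window_start(event_ts, size=WINDOW_SECONDS):
    """Return the start (epoch seconds) of the fixed window containing event_ts."""
    return event_ts - (event_ts % size)


def aggregate_by_window(events):
    """Sum each event's value into the 5-minute bucket its timestamp falls in."""
    totals = defaultdict(int)
    for ts, value in events:
        totals[window_start(ts)] += value
    return dict(totals)


# Hypothetical (timestamp, value) pairs; not taken from the talk.
sample = [(0, 1), (299, 2), (300, 5), (601, 1)]
buckets = aggregate_by_window(sample)
```

Each bucket key is the window's start time, so one row per window can be written to the database.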

Page 11

KAFKA

Artemis

Database

Delayed events

Corruption

Backpressure

Duplicate events
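
Two of the problems listed here, delayed events and duplicate events, can be illustrated with a small sketch. It assumes every event carries a unique id and its own event timestamp, which the slides do not state. Counting by event time puts a late arrival in the window it belongs to, and remembering ids drops repeats. The names are hypothetical.

```python
from collections import defaultdict

WINDOW_SECONDS = 300  # 5-minute windows


def count_unique_events(arrivals):
    """Count events per 5-minute window, by event time, ignoring repeated ids."""
    seen_ids = set()  # assumption: small enough for memory; real systems expire old ids
    counts = defaultdict(int)
    for event_id, event_ts in arrivals:
        if event_id in seen_ids:
            continue  # duplicate delivery, already counted
        seen_ids.add(event_id)
        counts[event_ts - event_ts % WINDOW_SECONDS] += 1
    return dict(counts)


# Hypothetical arrival order: "a" is delivered twice, "d" arrives late.
arrivals = [("a", 10), ("b", 20), ("a", 10), ("c", 310), ("d", 50)]
counts = count_unique_events(arrivals)
```

The late event "d" is still counted in the first window because the bucket is derived from its own timestamp, not from its arrival position.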

Page 12

Artemis

Database

Pre-aggregation

KAFKA

Backfill pipeline

Data Lake

Page 13

Artemis

Database

Pre-aggregation per dimension

KAFKA

Data Explosion
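
A rough way to see why per-dimension pre-aggregation explodes: if an aggregate is stored for every combination of dimensions, the number of stored keys per time bucket is the sum, over every subset of dimensions, of the product of their cardinalities. The sketch below assumes that model; the dimension names and cardinalities are hypothetical, not Uber's.

```python
from itertools import combinations
from math import prod


def preaggregated_keys(cardinalities):
    """Count the keys needed to pre-aggregate every subset of dimensions."""
    dims = list(cardinalities)
    total = 0
    for r in range(len(dims) + 1):
        for subset in combinations(dims, r):
            # The empty subset is the grand total and contributes 1 key.
            total += prod(cardinalities[d] for d in subset)
    return total


# Hypothetical dimensions and cardinalities.
dims = {"city": 500, "product": 5, "platform": 2}
keys_before = preaggregated_keys(dims)
keys_after = preaggregated_keys({**dims, "status": 4})
```

Under this model, adding one 4-value dimension multiplies the key count by 5, and this cost is paid again for every time bucket.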

Page 14

Apollo

Intelligent Caching

KAFKA

Query Language

Fast, Accurate, Scalable

Page 15

Case II - Event processing

FRAUD

“If # Signups per device look suspicious -> Ban the driver/rider”

Page 16

Case II - Event processing

INTELLIGENT ALERTS

“Send me an alert if a leased vehicle leaves a geo-fence”

Page 17

Athena platform using Apache Samza

Robust

Ease of operation

No backpressure issues

Built in state management

Page 18

Fraud Rule Engine

Track the count of sign_ups, grouped by device_imei

Ban fraudsters in real-time

Event processing - Apache Samza

KAFKA
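
The fraud rule on this slide, counting sign_ups per device_imei and acting when the count looks suspicious, can be sketched as a keyed counter. This is plain Python rather than the Apache Samza API, and the limit of 3 and the event shape are assumptions made for illustration.

```python
from collections import Counter

SIGNUP_LIMIT = 3  # hypothetical threshold; the talk does not give a number


def find_suspicious_devices(signups, limit=SIGNUP_LIMIT):
    """Return each device_imei once, at the moment it exceeds the sign-up limit."""
    counts = Counter()  # the per-key state a stream processor would manage
    flagged = []
    for event in signups:
        imei = event["device_imei"]
        counts[imei] += 1
        if counts[imei] == limit + 1:  # first sign-up over the limit
            flagged.append(imei)
    return flagged


# Hypothetical sign-up events.
signups = [{"device_imei": "imei-1"}] * 4 + [{"device_imei": "imei-2"}]
flagged = find_suspicious_devices(signups)
```

The counter plays the role of the built-in state management mentioned earlier: the count per device has to survive across events.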

Page 19

Case III - OLAP (OnLine Analytical Processing)

A / B Tests

See progress of tests in real-time

Page 20

Case III - OLAP use case

FORECASTING

“How many first time riders will be dropped off in a given geofence?”

Page 21

Our integrated platform

● Filter events
● Merge streams
● Decorate with external data
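
The three operations listed above can be sketched with Python generators. This is only an illustration of the ideas, not the platform's API; the function names, event fields, and lookup table are assumptions.

```python
import heapq


def merge_streams(*streams):
    """Merge time-ordered streams into one stream ordered by the 'ts' field."""
    return heapq.merge(*streams, key=lambda e: e["ts"])


def filter_events(events, predicate):
    """Keep only the events for which predicate(event) is true."""
    return (e for e in events if predicate(e))


def decorate(events, lookup, key, field):
    """Attach external data to each event by looking up event[key]."""
    for e in events:
        yield {**e, field: lookup.get(e[key])}


# Hypothetical input streams and lookup table.
trips = [{"ts": 1, "city_id": 1, "type": "trip"}, {"ts": 4, "city_id": 2, "type": "trip"}]
eyeballs = [{"ts": 2, "city_id": 1, "type": "eyeball"}, {"ts": 3, "city_id": 3, "type": "eyeball"}]
city_names = {1: "San Francisco"}

merged = merge_streams(trips, eyeballs)
in_city = filter_events(merged, lambda e: e["city_id"] == 1)
result = list(decorate(in_city, city_names, "city_id", "city_name"))
```

Because each step takes and returns an iterable of events, the steps compose in any order.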

Page 22

Are we there yet?

[Chart: system complexity by year, 2014 to 2017. Labels: Artemis, Apollo, Event Processing, OLAP, ?]

Page 23

What’s missing?

● Cumbersome for data scientists / Ops people
● Redundant code
● Custom backfill pipelines

Page 24

SQL as the building block

Page 25

SQL + Stream Processing

70-80% of jobs can be implemented via SQL

Page 26

Intelligent Promotions

SQL + Stream Processing: Powerful abstraction

Rule: “All trips worth > $10 in San Francisco between Friday 5 pm and Sunday 9 pm”
Threshold: > 100
Action: “Give bonus of $500”

Page 27

Complicated rules

● “If number of hours online > 10 …”
● “If amount earned > 700 in a given week, then …”
● “If # uberPOOL rides > 10, then …”
● “If trip happens over some geo-fence 10 times in a given weekend, then …”

SQL + Stream Processing: Powerful abstraction

Page 28

Intelligent Promotions

SQL + Stream Processing: Powerful abstraction

Rule: select count(*) from hp_api_created_trips WHERE city_id = 1 AND fare > 10 AND request_at > 1491105600 AND request_at <= 1491177600
Threshold: > 100
Action: trigger_payment()
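
The rule, threshold, and action on this slide can be tried end to end with an in-memory SQLite table standing in for the stream. This is a sketch under stated assumptions: SQLite is a stand-in, not the engine used in the talk; the table schema, the sample rows, and the body of trigger_payment are hypothetical. Only the query text, the threshold of 100, and the action name come from the slide.

```python
import sqlite3

# Rule: the query from the slide, unchanged apart from line breaks.
RULE_SQL = """
select count(*) from hp_api_created_trips
WHERE city_id = 1 AND fare > 10
  AND request_at > 1491105600 AND request_at <= 1491177600
"""
THRESHOLD = 100  # fire the action when the count is > 100


def trigger_payment(log):
    """Hypothetical action body: record that a payment was triggered."""
    log.append("payment_triggered")


def evaluate_rule(conn, threshold, action, log):
    """Run the rule query and invoke the action if the count exceeds the threshold."""
    (count,) = conn.execute(RULE_SQL).fetchone()
    if count > threshold:
        action(log)
    return count


conn = sqlite3.connect(":memory:")
# Assumed schema: only the three columns the query touches.
conn.execute(
    "CREATE TABLE hp_api_created_trips (city_id INTEGER, fare REAL, request_at INTEGER)"
)
rows = [(1, 12.5, 1491105600 + i) for i in range(1, 102)]  # 101 matching trips
rows += [(1, 8.0, 1491105700), (2, 30.0, 1491105700)]      # fare too low / other city
conn.executemany("INSERT INTO hp_api_created_trips VALUES (?, ?, ?)", rows)

actions = []
matched = evaluate_rule(conn, THRESHOLD, trigger_payment, actions)
```

Keeping the rule as data (a SQL string plus a threshold and an action name) is what lets new promotions be added without writing a new job.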

Page 29

Complicated rules

● “If number of hours online > 10 …”
● “If amount earned > 700 in a given week, then …”
● “If # Uber Pool rides > 10, then …”
● “If trip happens over some geo-fence 10 times in a given weekend, then …”

What if we created specific rules for specific driver partners?

SQL + Stream Processing: Powerful abstraction

Page 30

Can be used for alerts as well:

“If a driver X is outside a geofence, then …”

SQL + Stream Processing: Powerful abstraction
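
The geofence condition in the alert above needs a test for "outside". A minimal sketch, assuming a circular geofence given by a center and a radius (the talk does not say how geofences are represented), uses the haversine great-circle distance. The coordinates and the 10 km radius are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def outside_geofence(position, center, radius_km):
    """True if position lies farther than radius_km from the geofence center."""
    return haversine_km(*position, *center) > radius_km


# Hypothetical geofence: 10 km around downtown San Francisco.
center = (37.7749, -122.4194)
nearby_driver = (37.7800, -122.4100)
oakland_driver = (37.8044, -122.2712)

alerts = [p for p in (nearby_driver, oakland_driver) if outside_geofence(p, center, 10.0)]
```

A polygon geofence would need a point-in-polygon test instead; the circle is used here only because it keeps the example short.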

Page 31

New eco-system: Athena X

Page 32

Enter Flink

Apache Calcite (SQL) Integration

Easy to manage and scale

No backpressure problem

Built in state management support

HDFS integration

Not dependent on Kafka

Page 33

Rule Store

KAFKA Database

select count(*) from hp_api_created_trips WHERE city_id = 1 AND fare > 10 AND request_at > 1491105600 AND request_at <= 1491177600

Promotions using Flink

Page 34

Promotions using Flink

Rule Store

KAFKA

Database

select count(*) from hp_api_created_trips WHERE city_id = 1 AND fare > 10 AND request_at > 1491105600 AND request_at <= 1491177600

Page 35

New Eco-system: Athena X

HDFS

Kafka

Alerts

Kafka Streams

Other data destinations

Database Streams

HDFS Cassandra

Rule Store

HTTP

Page 36

Are we there yet?

[Chart: system complexity by year, 2014 to 2017. Labels: Artemis, Apollo, Event Processing, OLAP, ?, Flink]

Page 37

The road ahead ...

Page 38

Future Discussions

● To (Apache) Beam or not to Beam?
● Real-time Machine Learning
● Auto scaling

Page 39

AthenaX - Flink deep dive

Haohui Mai

Bill Liu

(11:45 am)

Page 40

Thank you

For more: eng.uber.com
Twitter: @UberEng