scaling machine learning as a service at uber — li erran li at #papis2016

Use Case: UberEATS ETD Prediction

[Diagram labels: HADOOP / YARN (Batch); Hive Feature Store; NETWORK (Realtime); Cassandra Feature Store; Real-time prediction; Training]

Use Case: UberEATS ETD ML Pipeline

[Diagram labels: Hive; Feature store; Model Training; Model; UberEATS App; Model Performance; ETD]

Problems

• Hard to figure out good features

• Hard to build the pipelines to generate features

• Can’t compute some features in real time

Solution: DSL and Feature Store

● Database of curated and crowd-sourced features

● Make it easy to use and transform these features in ML projects

● Make it easy to discover new useful features

● Batch and realtime serving
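
A minimal sketch of the feature store idea, assuming a hypothetical in-memory `FeatureStore` class (the slides do not show the real API, and the feature names and values below are made up for illustration). Features are registered once with a description, can be found by keyword, and are read through the same store for a batch of entities or for a single real-time lookup.

```python
from typing import Dict, Iterable, List, Optional


class FeatureStore:
    """Hypothetical in-memory stand-in for a feature store (not Uber's API)."""

    def __init__(self) -> None:
        # feature name -> {entity id -> value}
        self._values: Dict[str, Dict[str, float]] = {}
        # feature name -> human-readable description, used for discovery
        self._descriptions: Dict[str, str] = {}

    def register(self, name: str, description: str) -> None:
        """Add a curated or contributed feature to the catalog."""
        self._values.setdefault(name, {})
        self._descriptions[name] = description

    def put(self, name: str, entity_id: str, value: float) -> None:
        """Store a precomputed value for one entity (feature must be registered)."""
        self._values[name][entity_id] = value

    def get(self, name: str, entity_id: str) -> Optional[float]:
        """Real-time style lookup: one feature for one entity."""
        return self._values[name].get(entity_id)

    def get_batch(self, name: str, entity_ids: Iterable[str]) -> List[Optional[float]]:
        """Batch style lookup: the same feature for many entities."""
        return [self.get(name, entity_id) for entity_id in entity_ids]

    def search(self, keyword: str) -> List[str]:
        """Discover features whose description mentions the keyword."""
        needle = keyword.lower()
        return sorted(
            name for name, desc in self._descriptions.items() if needle in desc.lower()
        )


# Illustrative (made-up) features and values.
store = FeatureStore()
store.register("restaurant.avg_prep_time_min", "Average food prep time per restaurant")
store.register("restaurant.avg_delivery_time_min", "Average delivery time per restaurant")
store.put("restaurant.avg_prep_time_min", "r1", 12.5)
store.put("restaurant.avg_prep_time_min", "r2", 20.0)
```

Missing entities come back as `None` rather than raising, so a caller can decide how to impute them.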

Data Pipeline For Predictions

[Diagram labels: Data Lake; Spark or SQL; Basis Features; Feature DSL; Transformed Features; ML Model; Predictions]

Data Pipeline For Predictions w/ Feature Palette

[Diagram labels: Feature Store; Data Lake; Spark or SQL; Basis Features; Feature DSL; Transformed Features; ML Model; Predictions]
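
The slides do not show the Feature DSL itself. As a rough illustration of what "basis features in, transformed features out" could look like, here is a sketch built from small composable functions. The helper names (`col`, `log1p`, `ratio`, `apply_transforms`) and the example feature names are assumptions, not part of the talk.

```python
import math
from typing import Callable, Dict

Row = Dict[str, float]
Transform = Callable[[Row], float]


def col(name: str) -> Transform:
    """Read a basis feature by name."""
    return lambda row: row[name]


def log1p(inner: Transform) -> Transform:
    """log(1 + x) of another expression, useful for skewed values such as cost."""
    return lambda row: math.log1p(inner(row))


def ratio(num: Transform, den: Transform, default: float = 0.0) -> Transform:
    """Safe division of two expressions; returns `default` when the denominator is 0."""
    def _ratio(row: Row) -> float:
        d = den(row)
        return num(row) / d if d != 0 else default
    return _ratio


def apply_transforms(row: Row, spec: Dict[str, Transform]) -> Row:
    """Evaluate every named expression against one row of basis features."""
    return {name: transform(row) for name, transform in spec.items()}


# Hypothetical transformed features defined declaratively, then applied to a row.
spec = {
    "log_total_cost": log1p(col("total_cost")),
    "cost_per_item": ratio(col("total_cost"), col("num_items")),
}
basis = {"total_cost": 30.0, "num_items": 4.0}
transformed = apply_transforms(basis, spec)
```

Because the same `spec` can be evaluated in a batch job and in an online service, training and prediction see identically computed features, which is the property the pipeline diagram is after.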

Use Case: UberEATS ETD Model Details

[Diagram labels: Feature store; Model: GBT Regression; UberEATS App; ETD]

● restaurant features

○ location, avg prep-time, avg delivery time, avg demand during lunch ...

● contextual features

○ time of day, day of week, ...

● order features

○ #items, total cost, ...

● near real-time features

○ info about the past N orders, ...

● ...

● Feature store provides aggregate features for real-time prediction

○ These features are time-consuming to compute in real-time
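
To make the split between precomputed and request-time features concrete, here is a sketch under simple assumptions: an offline step aggregates order history into a per-restaurant average, and the online path only does a dictionary lookup and merges it with contextual and order features. The function names, feature names, and data are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Tuple


def precompute_avg_prep_time(orders: List[Tuple[str, float]]) -> Dict[str, float]:
    """Offline step: scan (restaurant_id, prep_minutes) history once."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for restaurant_id, prep_minutes in orders:
        totals[restaurant_id] += prep_minutes
        counts[restaurant_id] += 1
    return {rid: totals[rid] / counts[rid] for rid in totals}


def build_feature_row(
    restaurant_id: str,
    num_items: int,
    total_cost: float,
    now: datetime,
    aggregates: Dict[str, float],
) -> Dict[str, float]:
    """Online step: constant-time lookup plus features known at request time."""
    return {
        "avg_prep_time_min": aggregates[restaurant_id],  # restaurant feature
        "hour_of_day": float(now.hour),                   # contextual feature
        "day_of_week": float(now.weekday()),              # contextual (Monday = 0)
        "num_items": float(num_items),                    # order feature
        "total_cost": total_cost,                         # order feature
    }


# Made-up history and a made-up request.
history = [("r1", 10.0), ("r1", 14.0), ("r2", 20.0)]
aggregates = precompute_avg_prep_time(history)
row = build_feature_row("r1", 3, 27.5, datetime(2016, 10, 10, 12, 30), aggregates)
```

The aggregate costs a full pass over history to compute, so it is done ahead of time; the request path never touches the raw orders.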

Problem

● Often you want to train a model per city

● But hard to train and manage 400+ models for a project

Solution

● Let users define partitioning scheme

● Automatically train model per partition

● Manage and deploy as single logical model

1. Define partition scheme

2. Make train / test split

3. Keep same split for each level

4. Train model for every node

5. Prune bad models

6. At prediction time, use best model for each node

[Diagrams: trees of models (M), one per node, shown after steps 3, 4, and 5]
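
The six steps can be sketched as follows. This is an illustration under stated assumptions, not the system described in the talk: the "model" is a stand-in that predicts the training mean (a real system would fit a GBT regressor per node), and the pruning rule (drop a node's model unless it beats the model it would otherwise fall back to, measured by mean absolute error on that node's test data) is a guess, since the slides do not state the criterion. All names and data are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple

Path = Tuple[str, ...]
# Step 1: the partition scheme is a path such as ("global", "sf");
# every prefix of the path is a node in the hierarchy.
Example = Tuple[Path, float]


def fit_mean(targets: List[float]) -> float:
    """Stand-in model: predicts the training mean."""
    return mean(targets)


def mae(pred: float, targets: List[float]) -> float:
    """Mean absolute error of a constant prediction."""
    return mean(abs(pred - t) for t in targets)


def resolve(kept: Dict[Path, float], path: Path) -> float:
    """Step 6: use the most specific surviving model along the path."""
    for depth in range(len(path), 0, -1):
        if path[:depth] in kept:
            return kept[path[:depth]]
    raise KeyError(path)


def train_partitioned(train: List[Example], test: List[Example]) -> Dict[Path, float]:
    # Steps 2-3: the caller splits once; the same split is reused at every level.
    train_by_node: Dict[Path, List[float]] = {}
    test_by_node: Dict[Path, List[float]] = {}
    for data, grouped in ((train, train_by_node), (test, test_by_node)):
        for path, y in data:
            for depth in range(1, len(path) + 1):
                grouped.setdefault(path[:depth], []).append(y)

    # Step 4: train one model per node.
    models = {node: fit_mean(ys) for node, ys in train_by_node.items()}

    # Step 5: parents first; keep a child only if it beats its fallback.
    kept: Dict[Path, float] = {}
    for node in sorted(models, key=len):
        if len(node) == 1:
            kept[node] = models[node]  # root models are always kept
            continue
        fallback = resolve(kept, node[:-1])
        ys = test_by_node.get(node)
        if ys and mae(models[node], ys) < mae(fallback, ys):
            kept[node] = models[node]
    return kept


# Made-up data: "tiny" has one unrepresentative training example.
train = [(("global", "sf"), 30.0), (("global", "sf"), 32.0),
         (("global", "nyc"), 40.0), (("global", "nyc"), 42.0),
         (("global", "tiny"), 80.0)]
test = [(("global", "sf"), 31.0), (("global", "nyc"), 41.0), (("global", "tiny"), 35.0)]
kept = train_partitioned(train, test)
```

In this toy data the "tiny" city model is pruned because the global model does better on its test example, so requests for "tiny" (or for a city never seen in training) resolve to the global model, while "sf" and "nyc" keep their own. Callers only ever see `resolve`, which is what lets the whole tree be managed as a single logical model.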

Use Case: UberEATS ETD Prediction Performance

● Partitioned GBDT Regression Model

● Latency (measured from client)

○ p50: 7ms

○ p95: 15ms

○ p99: 20ms
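
For readers unfamiliar with the p50 / p95 / p99 notation, the sketch below shows one common way to compute such percentiles from client-side latency samples, using the nearest-rank definition. The talk does not say which definition or tooling was used, and the sample data here is synthetic, not Uber's measurements.

```python
from typing import List


def percentile(samples: List[float], p: int) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # Integer ceiling of p * n / 100, avoiding floating-point rounding.
    rank = max(1, -(-p * len(ordered) // 100))
    return ordered[rank - 1]


# Synthetic latencies in milliseconds: 1.0, 2.0, ..., 100.0.
latencies_ms = [float(i) for i in range(1, 101)]
summary = {f"p{p}": percentile(latencies_ms, p) for p in (50, 95, 99)}
```

Reporting tail percentiles rather than the mean matters for an interactive service, because a small fraction of slow requests is what users notice.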

Conclusion

● We present a scalable ML as a service system

● We focus on the scalability challenges and solutions

○ Feature store key to enable aggregate features for real-time prediction

■ Same API to access feature store for both batch training and real-time prediction

○ Partitioned models greatly simplify model management and selection

■ Per-city model performance is often worse than that of the global model

○ Scalable low latency real-time prediction service enables interactive user experiences

■ Load balancing across containers without global state

■ Fast one button deployment

■ Hot swap model upgrade
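
One way to picture a hot swap model upgrade inside a single serving container is sketched below. It assumes a hypothetical `HotSwapServer` class: each request grabs a reference to the current model under a lock, and an upgrade replaces that reference, so requests already in flight finish on the old model while new ones see the new model. The talk does not describe its actual mechanism, and the lambda "models" are placeholders.

```python
import threading
from typing import Callable, Dict, Tuple

Model = Callable[[Dict[str, float]], float]


class HotSwapServer:
    """Hypothetical per-container prediction server with in-place model upgrade."""

    def __init__(self, model: Model, version: str) -> None:
        self._lock = threading.Lock()
        self._model = model
        self._version = version

    def predict(self, features: Dict[str, float]) -> Tuple[float, str]:
        # Copy the references under the lock, then predict outside it so a
        # concurrent swap cannot change the model in the middle of a request.
        with self._lock:
            model, version = self._model, self._version
        return model(features), version

    def swap(self, model: Model, version: str) -> None:
        # Replace model and version together so they never disagree.
        with self._lock:
            self._model, self._version = model, version


# Placeholder models: v1 returns a constant, v2 depends on the order size.
server = HotSwapServer(lambda f: 30.0, "v1")
before = server.predict({"num_items": 2.0})
server.swap(lambda f: 25.0 + f["num_items"], "v2")
after = server.predict({"num_items": 2.0})
```

The server keeps no state beyond the loaded model, so any container can answer any request, which is what makes load balancing without global state straightforward.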
