Real-Time Analytics at Uber Scale


Apollo

James Burkhart, Staff Engineer, Uber

Agenda

- Motivation

- Ingest

- Storage

- Query

Motivation

- Business Intelligence
- Real-time
- Time series aggregates
- Geospatial

What is Apollo?
- Real-time analytics platform focused on:
  - Recent data (~7 weeks)
  - Immediate visibility (1,500 ms to 3 min p99 ingest latency)
  - Ad-hoc queryability
    - Arbitrary drilldown
    - Geospatial functionality
  - Data correctness/deduplication (exactly-once)
  - Extremely low-latency queries (<100 ms p95, <1 s p99)
- Powering internal data tools at Uber

Real-time operational analytics dashboarding

- Used weekly by the majority of Operations

Apollo Query Builder

- Web UI for Apollo Query Language

- Fully interactive

NYE 2016-2017

Motivation, Functionality Requirements

- Index based on data timestamp, not arrival timestamp
- Out-of-order and late (up to days later) arrival
- Mutability

- Sub-linear performance impact of scaling QPS

Apollo architecture

[Figure: Apollo architecture diagram]

Environment Management (MemSQL cluster sizes; Datacenter 1 and Datacenter 2 are mirrored)
- Production Prime: 33x 256GB
- Production Prime 2: 43x 256GB
- Production Minor: 5x 256GB
- Production Minor 2: 7x 256GB
- Staging/Preprod: 25x 256GB

Ingestion


● Simple transformations
○ e.g., string UUID to binary representation
■ "123e4567-e89b-12d3-a456-426655440000" → 36 B
■ 0x123E4567E89B12D3A456426655440000 → 16 B
● Filters
● Each job reads one input stream and writes to one or more output tables
● Independent job instance per environment
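The UUID transformation above can be sketched with Python's standard `uuid` module. It parses the 36-character canonical string and keeps only its 16 raw bytes, which is what a BINARY(16) column would store. This is an illustration, not Apollo's ingestion code, and the helper names `uuid_to_binary` and `binary_to_uuid` are hypothetical.

```python
import uuid


def uuid_to_binary(text: str) -> bytes:
    """Convert a canonical 36-character UUID string to its 16-byte form."""
    return uuid.UUID(text).bytes


def binary_to_uuid(raw: bytes) -> str:
    """Inverse transformation, e.g. for displaying query results."""
    return str(uuid.UUID(bytes=raw))


s = "123e4567-e89b-12d3-a456-426655440000"
b = uuid_to_binary(s)
print(len(s), len(b))   # 36 16
print(b.hex().upper())  # 123E4567E89B12D3A456426655440000
```

Storing 16 bytes instead of 36 cuts the key size by more than half, which matters for an in-memory rowstore.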

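val inputStream = KafkaInputStream(topic)

job.outputTables.foreach { outputTable =>
  inputStream
    .filter( ... )                              // drop messages this table does not need
    .map( ...transformations -> sql row... )    // e.g. string UUID -> 16-byte binary
    .grouped(outputTable.batchSize)             // batch rows per output table
    .foreach(writeBatchToDatabase)              // upsert each batch into MemSQL
}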

Ingestion

● Upserts: no double counting!
● Async RF=2 MemSQL replication
○ Can lose recent writes during hardware failure
● Solution: every 6 hours, upsert the last 72 hours of data in batch from Hive
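A toy example shows why upserts make the 72-hour Hive backfill safe. The sketch below uses SQLite's `INSERT OR REPLACE` as a stand-in for a MemSQL upsert; in MemSQL/MySQL syntax that would be `INSERT ... ON DUPLICATE KEY UPDATE`. The table layout and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical trips table; the primary key on trip_uuid is what makes writes idempotent.
conn.execute("CREATE TABLE trips (trip_uuid TEXT PRIMARY KEY, status TEXT, fare REAL)")


def upsert(rows):
    """Insert rows, overwriting any existing row with the same trip_uuid."""
    conn.executemany("INSERT OR REPLACE INTO trips VALUES (?, ?, ?)", rows)


# Real-time path (Kafka): t2's final "completed" update was lost in a hardware failure.
upsert([("t1", "completed", 12.5), ("t2", "requested", None)])

# Batch backfill from Hive: replays every row in the window, including ones already present.
upsert([("t1", "completed", 12.5), ("t2", "completed", 8.0)])

count, total = conn.execute("SELECT COUNT(*), SUM(fare) FROM trips").fetchone()
print(count, total)  # 2 20.5: replayed rows overwrite instead of double counting
```

With plain inserts, the replay would either fail on the duplicate key or, without a key, double count t1. Keyed upserts let the batch job overwrite whatever the real-time path left behind.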

Storage

● In-memory rowstore: mutable/recent
● Columnstore: immutable/older

Caching

● Partial, recomposable results
● Sharded MySQL instances
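One way to read "partial, recomposable results" is to cache one partial aggregate per (canonical query, time bucket). A later query over an overlapping range then only computes the buckets it is missing. The sketch below is hypothetical throughout: a list of dicts stands in for the sharded MySQL cache, `shard_for` shows one plausible hash-based shard choice, and `compute` stands in for a MemSQL query. As written, it only composes additive measures such as counts.

```python
import hashlib
import json

NUM_SHARDS = 4                            # hypothetical number of MySQL cache shards
shards = [{} for _ in range(NUM_SHARDS)]  # each dict stands in for one MySQL shard


def canonical(query):
    """Canonical JSON so equivalent AQL queries map to the same cache key."""
    return json.dumps(query, sort_keys=True, separators=(",", ":"))


def shard_for(key):
    """Pick a shard by hashing the cache key (one plausible scheme)."""
    return int(hashlib.sha1(key.encode()).hexdigest(), 16) % NUM_SHARDS


def query_range(query, buckets, compute_bucket):
    """Sum per-bucket partial counts, computing only buckets missing from the cache."""
    total, misses = 0, 0
    for bucket in buckets:
        key = f"{canonical(query)}|{bucket}"
        shard = shards[shard_for(key)]
        if key not in shard:
            shard[key] = compute_bucket(query, bucket)  # e.g. a MemSQL query
            misses += 1
        total += shard[key]
    return total, misses


daily_counts = {"2016-03-20": 5, "2016-03-21": 7, "2016-03-22": 3}  # fake data


def compute(query, bucket):
    return daily_counts[bucket]


q = {"table": "trips", "measures": [{"sqlExpression": "count(*)"}]}
print(query_range(q, ["2016-03-20", "2016-03-21"], compute))  # (12, 2)
print(query_range(q, ["2016-03-21", "2016-03-22"], compute))  # (10, 1)
```

The second query reuses the cached 2016-03-21 bucket and only computes 2016-03-22.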

Apollo Query Language (AQL)

● Custom Analytical Time-Series Query Language
● Goals:
○ Flexibility like SQL
○ Minimal Learning Curve
○ Ease-of-Use
● Features:
○ Canonicalization
○ Ease-of-parsing
○ Error detection
○ Automatic optimization

{
  "table": "trips",
  "joins": [
    {
      "alias": "g",
      "table": "geofences",
      "conditions": ["geography_intersects(request_point, g.shape)"]
    }
  ],
  "dimensions": [
    {
      "sqlExpression": "request_at",
      "timeBucketizer": "day",
      "timeUnit": "millisecond"
    }
  ],
  "measures": [
    {
      "sqlExpression": "count(*)",
      "rowFilters": ["status='completed'"]
    }
  ],
  "rowFilters": ["city_id=1", "g.uuid=0x0A"],
  "timeFilter": {
    "column": "request_at",
    "from": "yesterday",
    "to": "yesterday"
  },
  "timezone": "America/Los_Angeles"
}

Example


Why SQL is hard for time series OLAP

Field Value

Dimension.SQLExpression request_at

Dimension.TimeBucketizer day

Dimension.TimeUnit millisecond

Timezone America/Los_Angeles

Why SQL is hard for time series OLAP
● Date/time functions:

○ ROUND(UNIX_TIMESTAMP(CONVERT_TZ(DATE_FORMAT(CONVERT_TZ(FROM_UNIXTIME(((trips.request_at) - (trips.request_at) % 900000) / 1000), 'GMT', 'America/Los_Angeles'), '%Y-%m-%d'), 'America/Los_Angeles', 'UTC')) / 0.001, 0)

○ Cheap timestamp snapping to 15m
○ Conversion from milliseconds to seconds
○ Conversion from Unix timestamp to SQL time
○ Adding timezone to Unix time
○ Date/time formatting/truncation
○ Timezone conversion
○ Conversion from SQL time to Unix timestamp
○ Conversion from seconds to milliseconds
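The chain of SQL functions above computes "start of the local day, as a UTC epoch in milliseconds". The Python sketch below does the same bucketing with `datetime`, for comparison only; it is not how AQL generates SQL. The function name `local_day_bucket_ms` is hypothetical. It accepts any `tzinfo`, for example `zoneinfo.ZoneInfo("America/Los_Angeles")` on Python 3.9+ with tz data installed. The example uses a fixed UTC-7 offset, which matches Los Angeles on the chosen date.

```python
from datetime import datetime, timedelta, timezone

FIFTEEN_MIN_MS = 15 * 60 * 1000


def local_day_bucket_ms(ts_ms: int, tz) -> int:
    """Start of the local calendar day containing ts_ms, as UTC epoch milliseconds."""
    snapped = ts_ms - ts_ms % FIFTEEN_MIN_MS            # cheap 15-minute snap, as in the SQL
    local = datetime.fromtimestamp(snapped / 1000, tz)  # ms -> s, then UTC -> local time
    midnight = local.replace(hour=0, minute=0, second=0, microsecond=0)  # truncate to day
    return int(midnight.timestamp()) * 1000             # local -> UTC epoch, s -> ms


# 2016-03-22 13:17 UTC is 06:17 in UTC-7 (Pacific Daylight Time on that date).
pdt = timezone(timedelta(hours=-7))
print(local_day_bucket_ms(1458652620000, pdt))  # 1458630000000 = 2016-03-22 07:00 UTC
```

Each step maps to one item in the list: the modulo snap, the millisecond/second conversions, the timezone shift, the day truncation, and the conversion back to a Unix timestamp.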

Field Value

Timezone America/Los_Angeles

Why SQL is hard for time series OLAP
● City/Region/Country based timezone

○ SELECT ROUND(UNIX_TIMESTAMP(CONVERT_TZ(DATE_FORMAT(CONVERT_TZ(FROM_UNIXTIME(((trips.request_at) - (trips.request_at) % 900000) / 1000), 'GMT', __tz__.sub_region_timezone), '%Y-%m-%d'), __tz__.sub_region_timezone, 'UTC')) / 0.001, 0) FROM trips JOIN api_cities AS __tz__ ON trips.city_id = __tz__.id

○ Join with api_cities (which has the timezone of each geographic level) on city_id
○ Use the corresponding timezone column from api_cities

Field Value

Timezone sub_region_timezone(city_id)

Why SQL is hard for time series OLAP
● #completed_trips / #requested_trips

○ SUM(CASE WHEN trips.status='completed' THEN 1 ELSE 0 END) / SUM(CASE WHEN trips.status!='ignored' THEN 1 ELSE 0 END)

○ SELECT …, _1.completed / _2.requested FROM (SELECT …, COUNT(*) AS completed FROM trips WHERE status='completed' GROUP BY ...) AS _1 JOIN (SELECT …, COUNT(*) AS requested FROM trips WHERE status!='ignored' GROUP BY ...) AS _2 ON ...

○ Filters make measures complex

Field Value

Measure[0].SQLExpression count(*)

Measure[0].Filters status='completed'

Measure[0].Alias completed

Measure[1].SQLExpression count(*)

Measure[1].Filters status!='ignored'

Measure[1].Alias requested

Measure[2].SQLExpression completed / requested
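The sketch below shows how filtered measures like those in the table could be compiled into single-pass conditional aggregates (the SUM(CASE WHEN ...) form above) instead of the self-join. It is not AQL's actual compiler. The dictionary keys follow the AQL JSON (`sqlExpression`, `rowFilters`); the `alias` key, the function names, and the naive textual alias substitution are assumptions. Only `count(*)` and `sum(...)` measures are handled.

```python
import re


def compile_measure(sql_expression, row_filters):
    """Compile one filtered measure into a conditional aggregate (count(*), sum(...) only)."""
    if not row_filters:
        return sql_expression
    cond = " AND ".join(f"({f})" for f in row_filters)
    if sql_expression.strip().lower() == "count(*)":
        return f"SUM(CASE WHEN {cond} THEN 1 ELSE 0 END)"
    match = re.fullmatch(r"\s*sum\((.+)\)\s*", sql_expression, flags=re.IGNORECASE)
    if match:
        # No ELSE: rows failing the filter contribute NULL, which SUM ignores.
        return f"SUM(CASE WHEN {cond} THEN {match.group(1)} END)"
    raise ValueError(f"unsupported filtered measure: {sql_expression}")


def compile_measures(measures):
    """Compile measures in order; later measures may reference earlier ones by alias."""
    by_alias, out = {}, []
    for m in measures:
        expr = m["sqlExpression"]
        for alias, alias_sql in by_alias.items():
            replacement = f"({alias_sql})"
            # Naive textual substitution; a real compiler would parse the expression.
            expr = re.sub(rf"\b{re.escape(alias)}\b", lambda _m: replacement, expr)
        sql = compile_measure(expr, m.get("rowFilters", []))
        if "alias" in m:
            by_alias[m["alias"]] = sql
        out.append(sql)
    return out


measures = [
    {"sqlExpression": "count(*)", "rowFilters": ["status='completed'"], "alias": "completed"},
    {"sqlExpression": "count(*)", "rowFilters": ["status!='ignored'"], "alias": "requested"},
    {"sqlExpression": "completed / requested"},
]
for sql in compile_measures(measures):
    print(sql)
```

Because each filter becomes a CASE inside the aggregate, every measure is computed in one scan of trips, and the derived ratio needs no join.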

Why SQL is hard for time series OLAP
● #Trips by geofence for geofences A, B and C
○ SELECT count(*), geofences.uuid FROM trips JOIN geofences ON geography_intersects(trips.request_point, geofences.shape) WHERE geofences.uuid IN (A, B, C) GROUP BY geofences.uuid

● Total #Trips for geofences A, B and C
○ SELECT count(*) FROM trips JOIN geofences ON geography_intersects(trips.request_point, geofences.shape) WHERE geofences.uuid IN (A, B, C)
○ A trip inside two overlapping geofences is counted twice by this JOIN

● Overlapping is OK, overcounting is not!
○ SELECT count(*) FROM trips WHERE EXISTS (SELECT * FROM geofences WHERE geography_intersects(trips.request_point, geofences.shape) AND geofences.uuid IN (A, B, C))
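The overcounting problem is easy to reproduce. The sketch below uses SQLite with hypothetical data. Axis-aligned rectangles stand in for geofence shapes, and a bounding-box test stands in for `geography_intersects`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trips (id INTEGER PRIMARY KEY, x REAL, y REAL);  -- request point
CREATE TABLE geofences (uuid TEXT PRIMARY KEY, x0 REAL, y0 REAL, x1 REAL, y1 REAL);
INSERT INTO trips VALUES (1, 1, 1), (2, 3, 3), (3, 9, 9);
-- Geofences A and B overlap, so trip 2 at (3, 3) falls inside both.
INSERT INTO geofences VALUES ('A', 0, 0, 4, 4), ('B', 2, 2, 6, 6), ('C', 20, 20, 30, 30);
""")

INSIDE = "t.x BETWEEN g.x0 AND g.x1 AND t.y BETWEEN g.y0 AND g.y1"

per_fence = conn.execute(f"""
    SELECT g.uuid, COUNT(*) FROM trips t JOIN geofences g ON {INSIDE}
    WHERE g.uuid IN ('A', 'B', 'C') GROUP BY g.uuid ORDER BY g.uuid""").fetchall()

join_total = conn.execute(f"""
    SELECT COUNT(*) FROM trips t JOIN geofences g ON {INSIDE}
    WHERE g.uuid IN ('A', 'B', 'C')""").fetchone()[0]

exists_total = conn.execute(f"""
    SELECT COUNT(*) FROM trips t WHERE EXISTS (
        SELECT 1 FROM geofences g WHERE {INSIDE} AND g.uuid IN ('A', 'B', 'C'))""").fetchone()[0]

print(per_fence)     # [('A', 2), ('B', 1)]: overlap is fine per geofence
print(join_total)    # 3: trip 2 is counted twice
print(exists_total)  # 2: each trip is counted once
```

The per-geofence breakdown is correct with a plain JOIN because each group counts its own trips. Only the total needs the semi-join that EXISTS provides.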

Bad SQL queries
● SELECT count(*), request_at FROM trips GROUP BY request_at;
○ Time needs to be bucketized! Grouping by milliseconds makes no sense!

● SELECT count(*), fare_total FROM trips GROUP BY fare_total;
○ Some numeric values such as fare need to be bucketized (reported as histograms)!

● SELECT sum(fare_total) FROM trips, other_table WHERE trips.fare_total > 1.0 AND other_table.foo = 'BAR';
○ The join condition is missing; a Cartesian product is bad!

AQL Query Optimization: Date/time function performance issue

● CONCAT(DATE_FORMAT(FROM_UNIXTIME((__d0__) / 1000), '%Y-%m-%d '), LPAD(3 * FLOOR(HOUR(FROM_UNIXTIME((__d0__) / 1000)) / 3), 2, '0'), ':00')

● Runs for every row (trip)!

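Two-stage aggregation

[Figure: one-stage vs. two-stage aggregation]
● One stage: date/time function bucketization of request_at, then count(*), evaluated for every row
● Stage 1: cheap snapping of request_at (t - t % 15m), count(*) as ct
● Stage 2: date/time function bucketization of the stage-1 timestamps, sum(ct)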

Time Series Bucket Splitting

[Figure: splitting a query's time range into aligned buckets, then rolling them up]

Example 1: From: this week, To: now (Now: 2016-03-22 13:17)
- Split 2016-03-21 (partial week) into:
  - 2016-03-21 (day)
  - 2016-03-22 00:00, 01:00, ..., 12:00 (hour)
  - 2016-03-22 13:00 (15m)
  - 2016-03-22 13:15 (15m), itself split into 2016-03-22 13:15 and 13:16 (minute)
- Rollup combines the pieces back into the partial week.

Example 2: From: -20d, To: -12h, BucketSize: week (Now: 2016-03-22 13:17)
- Split 2016-03-02 (partial week) into 2016-03-02, 2016-03-03, ..., 2016-03-06 (day)
- Use 2016-03-07 (week) and 2016-03-14 (week) whole
- Split 2016-03-21 (partial week) into 2016-03-21 (day), 2016-03-22 00:00 (hour) and 2016-03-22 01:00 (hour)
- Rollup combines each partial week's pieces back into its week bucket.
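The splitting can be sketched as a greedy recursion over a ladder of granularities. Cover as much of the range as possible with whole buckets of the coarsest size, then recurse on the leftover edges with finer sizes. Each fragment can be cached independently, and the rollup sums fragments into the requested bucket size. This sketch makes three assumptions: the granularity ladder, a Monday week start, and additive values such as counts.

```python
from datetime import datetime, timedelta

# Hypothetical granularity ladder, coarsest first; each size divides the one above it.
LEVELS = [
    ("week", timedelta(weeks=1)),
    ("day", timedelta(days=1)),
    ("hour", timedelta(hours=1)),
    ("15m", timedelta(minutes=15)),
    ("minute", timedelta(minutes=1)),
]
ANCHOR = datetime(1970, 1, 5)  # a Monday, so week buckets start on Mondays


def floor_to(t, size):
    return t - (t - ANCHOR) % size


def ceil_to(t, size):
    f = floor_to(t, size)
    return f if f == t else f + size


def split(start, end, levels=LEVELS):
    """Cover [start, end) with aligned buckets, using the coarsest size that fits."""
    if start >= end:
        return []
    if not levels:
        raise ValueError("range is not aligned to the finest granularity")
    (name, size), finer = levels[0], levels[1:]
    lo, hi = ceil_to(start, size), floor_to(end, size)
    if lo >= hi:  # not even one whole bucket of this size fits
        return split(start, end, finer)
    whole = [(lo + i * size, name) for i in range((hi - lo) // size)]
    return split(start, lo, finer) + whole + split(hi, end, finer)


def rollup(fragments, values, bucket_size):
    """Sum per-fragment values (e.g. cached counts) into buckets of bucket_size."""
    out = {}
    for (start, _name), value in zip(fragments, values):
        key = floor_to(start, bucket_size)
        out[key] = out.get(key, 0) + value
    return out


# "From: this week, To: now" with now = 2016-03-22 13:17 (a Tuesday).
# Prints 17 fragments: one day, 13 hours, one 15m bucket, two minutes.
for start, name in split(datetime(2016, 3, 21), datetime(2016, 3, 22, 13, 17)):
    print(start, name)
```

The output reproduces the first example above. Older fragments (whole days and hours) stay valid in the cache as "now" advances, so only the minute-level edge has to be recomputed.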

AQL Query Optimization: Aggregate rollups

avg(x) = sum(x) / count(*)

Original function   Stage 1    Stage 2 (rollup)
count               count      sum
sum                 sum        sum
min                 min        min
max                 max        max
count distinct      distinct   count distinct (or HyperLogLog)
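The rollup table can be sketched as a partial-aggregate state that stage 1 produces per bucket and stage 2 merges. avg is carried as (sum, count). count distinct is carried as an exact set of values; in practice a HyperLogLog sketch would replace that set to bound memory. The class and field names are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Partial:
    """Stage-1 state for one bucket; merge() is the stage-2 rollup. Partial() is the identity."""
    count: int = 0
    total: float = 0.0
    lo: float = float("inf")
    hi: float = float("-inf")
    distinct: set = field(default_factory=set)  # exact; HyperLogLog would approximate

    @classmethod
    def of(cls, values):
        return cls(len(values), sum(values), min(values), max(values), set(values))

    def merge(self, other):
        return Partial(self.count + other.count,        # count -> sum
                       self.total + other.total,        # sum -> sum
                       min(self.lo, other.lo),          # min -> min
                       max(self.hi, other.hi),          # max -> max
                       self.distinct | other.distinct)  # distinct -> count distinct

    @property
    def avg(self):
        return self.total / self.count                  # avg(x) = sum(x) / count(*)

    @property
    def count_distinct(self):
        return len(self.distinct)


day1, day2 = Partial.of([10.0, 20.0, 30.0]), Partial.of([30.0])
week = day1.merge(day2)
print(week.avg, week.count_distinct)  # 22.5 3
print((day1.avg + day2.avg) / 2)      # 25.0: averaging averages is wrong
```

Summing per-day distinct counts would also be wrong here: it gives 4, because the value 30.0 appears on both days.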

Contracts

SELECT AVG(fare), ts_15m FROM trips WHERE time >= now() - 1h [AND city = x] GROUP BY ts_15m [, city];

(The bracketed parts mark the two variants: filtering to a single city, or grouping by city.)


Query shape                                        1h       24h     21d (group by 24h)
Single city (where city=x), p95                    50ms     60ms    70ms
One query per city (where city=x), summed         ~9s      ~10s    ~12s
All cities in one query (group by city), p95       200ms    ~1s     ~7s

Contracts

SELECT COUNT(1), AVG(fare), SUM(fare), AVG(eta) FROM trips WHERE ...

SELECT COUNT(1), AVG(fare), SUM(fare), SUM(eta) FROM trips WHERE ...

Contracts

SELECT COUNT(1) FROM trips WHERE City = 'San Francisco' AND State = 'completed' AND Product = 'Uber-X'

SELECT COUNT(1) FROM trips GROUP BY City, State, Product

Dimension combinations: (City, State, Product), (City, State), (City, Product), (City), (State), (State, Product), (Product), (∅)

Geographical breakdowns: World > North America > United States > US West > California > BayArea > SF
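With three dimensions there are 2^3 = 8 possible group-by combinations, which is the list above. The sketch below enumerates them with `itertools`. The dimension and geography names come from the slide. The final count is an assumption: it supposes the City slot may instead be grouped at any one of the seven geographic levels, or omitted.

```python
from itertools import combinations

DIMENSIONS = ("City", "State", "Product")
GEO_LEVELS = ("World", "North America", "United States", "US West",
              "California", "BayArea", "SF")


def grouping_sets(dimensions):
    """All subsets of dimensions, largest first, ending with the empty set."""
    return [combo for r in range(len(dimensions), -1, -1)
            for combo in combinations(dimensions, r)]


sets = grouping_sets(DIMENSIONS)
print(len(sets), sets)  # 8 combinations

# Assumption: the City slot can be any geographic level or absent (8 choices),
# times present/absent for State and Product.
print((len(GEO_LEVELS) + 1) * 2 ** (len(DIMENSIONS) - 1))  # 32
```

Each extra dimension doubles the combinations, and hierarchical dimensions multiply them further.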

Stats

● p80 <= 10ms
● p90 <= 50ms
● p95 <= 100ms
● p99 <= 1000ms
● p99.5 <= 5000ms

● Millions of queries/day
● ~250k distinct queries
● Billions of MySQL writes/day

Future Plans (next 3-6 months)

● Product
○ Self-service onboarding and schema management
○ Schema change management and automation

● Technology
○ Cost accounting
○ Contract automation
○ Query cost estimation

Challenges and Learnings

Schema Challenges

● Many schemas:
○ Ingestion transformations
■ Hive
■ Avro-encoded Kafka
○ MemSQL schema
○ Query layer schema

Ingestion


Metric Spark Golang

Containers 32 4

CPU Cores 160 8

Memory (GB) 226 16

Throughput 36k/s 60k/s

Performance differences for largest job
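Normalizing the table by resources makes the gap clearer. The arithmetic below uses only the numbers in the table and assumes throughput is measured in the same units for both jobs.

```python
spark = {"cores": 160, "memory_gb": 226, "throughput": 36_000}
golang = {"cores": 8, "memory_gb": 16, "throughput": 60_000}

for name, job in (("Spark", spark), ("Golang", golang)):
    per_core = job["throughput"] / job["cores"]
    per_gb = job["throughput"] / job["memory_gb"]
    print(f"{name}: {per_core:,.0f}/s per core, {per_gb:,.0f}/s per GB")

# Spark: 225/s per core, 159/s per GB
# Golang: 7,500/s per core, 3,750/s per GB
print(round((60_000 / 8) / (36_000 / 160)))  # 33: roughly 33x more throughput per core
```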

Questions?

(PS: We’re hiring)

Uber Engineering Blog: eng.uber.com

Uber Open Source: uber.github.io

Uber Eng Twitter: twitter.com/ubereng

These slides: https://tinyurl.com/apollostrata, msql.co/uberscale

Check out ‘Hoodie: Incremental processing on Hadoop at Uber’ Thursday 1:50-2:30 for the next Uber Strata presentation.