data modeling @uber - world wide web consortium · data modeling @uber joshua shinavier, phd...

Data Modeling @UberJoshua Shinavier, PhD

Presentation for the W3C Automotive Working Group

May 21st, 2019

● 200k managed data sets● 10+ billion trips

○ Low thousands of new entities per second● Even more sensor data

○ Use cases for graph stream processing● On-demand, streaming, RPC

○ Each dataset, stream, and service has its own schema

Data and schemas @Uber

• Data sources are not composable• Strong identifiers, weak semantics

• Duplicate types, homonyms, synonyms• Per-language data islands• Diversity of data modeling conventions

Schema integration: challenges

● Controlled vocabularies for all of Uber○ Basic type aliases (URL, UUID, timestamps, etc.)○ Structured types (sensor events, currency values, etc.)○ Entities and relationships (User, Vehicle, Trip, etc.)○ Metadata vocabularies

● Shared logical data model● Basic domain vocabularies

○ E.g. time, geometry and geolocation, addresses and contact info, sensors, money, etc.

● Tooling carries schemas between data representation languages○ Protobuf, Thrift, Avro, RDF, PG, etc.○ Schema and data transformations are composable

Data Standardization

● Need metadata for each dataset at Uber● Data protections and user trust

○ GDPR and other regulations, Uber’s own data policies○ What kind of user data? Where is it?○ Heroic numbers of manual annotations

■ Limited expressivity, limited guarantees■ Inference is required

● In annotating datasets, standardize and compose schemas

Metadata graph

● Common data model for RPC, storage, and KR at Uber● In progress: alignment with the Property Graph Schema Working Group● In progress: “Universal structure” of TinkerPop4

Algebraic Property Graphs

Transformations

Logical

Protocol Buffers

Apache Thrift

Apache Avro

Turtle (OWL)

joshsh@uber.com

Thanks

data modeling @uber - world wide web consortium · data modeling @uber joshua shinavier, phd...

Documents

parcours dsi & data experts€¦ · uber creating an...

case study of uber data in the central london area

tugas matakuliah big data analisis data perjalanan uber

ml and data science at uber - gitpro talk 2017

uber real time data analytics

sun - flash data retention rber uber

big challenges in data modeling: nosql and data modeling

big challenges in data modeling: ethical data modeling

uber japan - uber-static.s3.amazonaws.com · uber japan...

idiots guide to real uber efficient data centers

webinar vastgoed: big data uber alles november 2014

uber & big data a case study. uber - big data... · 2020....

modeling data

chapter 6 modeling the data: conceptual and logical data...

extensible entity data modeling @uber · enable the same...

uber movement - worldsecuresystems.com... · uber movement...

uber technologies, inc. - deal point data

agile data warehouse modeling: introduction to data vault...

data modeling online training | data modeling training | ...

uber real time data analytics bansal.pdf · sr. software...