mazda trio meeting

Trio: A System for Data, Uncertainty, and Lineage Search “stanford trio” http://i.stanford.edu/trio

Post on 13-Sep-2014


TRANSCRIPT

Page 1: Mazda Trio Meeting

Trio: A System for Data, Uncertainty, and Lineage

Search “stanford trio” http://i.stanford.edu/trio

Page 2: Mazda Trio Meeting

People

Current
• Jennifer Widom (faculty)
• Omar Benjelloun (post-doc)
• Parag Agrawal, Anish Das Sarma, Shubha Nabar (PhD)
• Michi Mutsuzaki (MS)
• Tomoe Sugihara (visitor)

Incoming
• Martin Theobald (post-doc)
• Raghu Murthy (MS)
• Ander de Keijzer (visitor)

Alums
• Alon Halevy, Ashok Chandra (visitors)
• Chris Hayworth (MS)

Page 3: Mazda Trio Meeting

Why Uncertainty + Lineage?

Many applications seem to need both.

From a technical standpoint, it turns out that lineage...

1. Enables simple and consistent representation of uncertain data

2. Correlates uncertainty in query results with uncertainty in the input data

3. Can make computation over uncertain data more efficient

Page 4: Mazda Trio Meeting

Trio Components

1. Data Model
ULDBs (Uncertainty-Lineage Databases): Simple extension to relational model

2. Query Language
TriQL: Simple extension to SQL, well-defined semantics and intuitive behavior

3. System
Version 1: Complete system and GUI built on top of conventional DBMS

Page 5: Mazda Trio Meeting

Running Example: Crime-Solving

Saw(witness,car)     // may be uncertain
Drives(person,car)   // may be uncertain

Suspects(person) = π_person(Saw ⋈ Drives)

Page 6: Mazda Trio Meeting

Our Model for Uncertainty

1. Alternatives
2. ‘?’ (Maybe) Annotations
3. Confidences

Page 7: Mazda Trio Meeting

Our Model for Uncertainty

1. Alternatives: uncertainty about value
2. ‘?’ (Maybe) Annotations
3. Confidences

Saw (witness,car)
(Amy, Honda) ∥ (Amy, Toyota) ∥ (Amy, Mazda)

=

witness   car
Amy       { Honda, Toyota, Mazda }

Three possible instances

Page 8: Mazda Trio Meeting

Our Model for Uncertainty

1. Alternatives
2. ‘?’ (Maybe): uncertainty about presence
3. Confidences

Saw (witness,car)
(Amy, Honda) ∥ (Amy, Toyota) ∥ (Amy, Mazda)
(Betty, Acura) ?

Six possible instances

Page 9: Mazda Trio Meeting

Our Model for Uncertainty

1. Alternatives
2. ‘?’ (Maybe) Annotations
3. Confidences: weighted uncertainty

Saw (witness,car)
(Amy, Honda): 0.5 ∥ (Amy, Toyota): 0.3 ∥ (Amy, Mazda): 0.2
(Betty, Acura): 0.6 ?

Six possible instances, each with a probability
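The six instances and their probabilities can be enumerated mechanically. The following is a minimal sketch, not Trio code: the dictionary layout and function names are illustrative assumptions, and it assumes that the two tuples are independent and that a ‘?’ tuple is absent with probability 1 minus the sum of its confidences.

```python
from itertools import product

# Saw relation from the slide. Each tuple lists (alternative, confidence)
# pairs and a flag for the '?' annotation. The layout is an assumption.
saw = [
    {"alts": [(("Amy", "Honda"), 0.5),
              (("Amy", "Toyota"), 0.3),
              (("Amy", "Mazda"), 0.2)],
     "maybe": False},
    {"alts": [(("Betty", "Acura"), 0.6)], "maybe": True},
]

def choices(xtuple):
    """All ways to resolve one tuple: pick one alternative, or none if '?'."""
    options = list(xtuple["alts"])
    if xtuple["maybe"]:
        absent = 1.0 - sum(c for _, c in xtuple["alts"])
        options.append((None, absent))
    return options

def possible_instances(relation):
    """Return (instance, probability) pairs, treating tuples as independent."""
    result = []
    for combo in product(*(choices(x) for x in relation)):
        instance = frozenset(t for t, _ in combo if t is not None)
        prob = 1.0
        for _, c in combo:
            prob *= c
        result.append((instance, prob))
    return result

instances = possible_instances(saw)
for inst, p in instances:
    print(sorted(inst), round(p, 2))
```

Three choices for Amy's tuple times two for Betty's (present or absent) gives the six instances, and the probabilities sum to 1.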

Page 10: Mazda Trio Meeting

Models for Uncertainty

• Our model (so far) is not especially new
• We spent some time exploring the space of models for uncertainty [ICDE 06, journal]
• Tension between understandability and expressiveness
  – Our model is understandable
  – But it is not complete, or even closed under common operations

Page 11: Mazda Trio Meeting

Our Model is Not Closed

Saw (witness,car)
(Cathy, Honda) ∥ (Cathy, Mazda)

Drives (person,car)
(Jimmy, Toyota) ∥ (Jimmy, Mazda)
(Billy, Honda) ∥ (Frank, Honda)
(Hank, Honda)

Suspects = π_person(Saw ⋈ Drives)

Suspects
Jimmy ?
Billy ∥ Frank ?
Hank ?

Does not (CANNOT) correctly capture possible instances in the result

Page 12: Mazda Trio Meeting

Lineage to the Rescue

Lineage
• Captures “where data came from”
• In Trio: A function λ from alternatives to other alternatives (or external sources)

Page 13: Mazda Trio Meeting

Example with Lineage

ID   Saw (witness,car)
11   (Cathy, Honda) ∥ (Cathy, Mazda)

ID   Drives (person,car)
21   (Jimmy, Toyota) ∥ (Jimmy, Mazda)
22   (Billy, Honda) ∥ (Frank, Honda)
23   (Hank, Honda)

ID   Suspects
31   Jimmy ?
32   Billy ∥ Frank ?
33   Hank ?

Suspects = π_person(Saw ⋈ Drives)

λ(31) = (11,2), (21,2)
λ(32,1) = (11,1), (22,1); λ(32,2) = (11,1), (22,2)
λ(33) = (11,1), 23

Correctly captures possible instances in the result
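To see why lineage fixes the result, one can enumerate the possible instances of the base tables and compare two ways of getting Suspects: evaluating the query in each instance, and keeping a derived alternative only when everything in its lineage was chosen. This is a minimal sketch of that check, not Trio's implementation; the data structures and function names are illustrative assumptions, and the single alternative of tuple 23 is written as (23,1).

```python
from itertools import product

# Base data from the slide: tuple ID -> alternatives (1-indexed as on the slide).
saw = {11: [("Cathy", "Honda"), ("Cathy", "Mazda")]}
drives = {
    21: [("Jimmy", "Toyota"), ("Jimmy", "Mazda")],
    22: [("Billy", "Honda"), ("Frank", "Honda")],
    23: [("Hank", "Honda")],
}

# Suspects alternatives with their lineage λ, as (tuple ID, alternative) pairs.
suspects = {
    (31, 1): ("Jimmy", {(11, 2), (21, 2)}),
    (32, 1): ("Billy", {(11, 1), (22, 1)}),
    (32, 2): ("Frank", {(11, 1), (22, 2)}),
    (33, 1): ("Hank", {(11, 1), (23, 1)}),
}

def base_worlds():
    """Every way to pick exactly one alternative per base tuple."""
    base = {**saw, **drives}
    ids = list(base)
    for picks in product(*(range(1, len(base[i]) + 1) for i in ids)):
        yield dict(zip(ids, picks))

def suspects_direct(world):
    """Evaluate the projection of the join in one possible instance."""
    seen = [saw[i][world[i] - 1] for i in saw]
    driven = [drives[i][world[i] - 1] for i in drives]
    return {p for (_, c1) in seen for (p, c2) in driven if c1 == c2}

def suspects_via_lineage(world):
    """A derived alternative exists iff all alternatives in its lineage were picked."""
    chosen = set(world.items())
    return {person for person, lin in suspects.values() if lin <= chosen}

worlds = list(base_worlds())
results = {frozenset(suspects_direct(w)) for w in worlds}
print(sorted(sorted(r) for r in results))
```

Only four distinct Suspects instances arise: empty, {Jimmy}, {Billy, Hank}, and {Frank, Hank}. A table with independent ‘?’ marks would also admit combinations such as {Jimmy, Hank}, which is why the lineage-free representation falls short.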

Page 14: Mazda Trio Meeting

Uncertainty-Lineage Databases (ULDBs)

1. Alternatives
2. ‘?’ (Maybe) Annotations
3. Confidences
4. Lineage

ULDBs are closed and complete [VLDB 06]

Page 15: Mazda Trio Meeting

ULDBs: Lineage

• Conjunctive lineage sufficient for most operations
• Duplicate-elimination: Disjunctive lineage
• Difference: Negative lineage
• General case after multiple operations/queries: Boolean formula
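The lineage forms listed here can all be viewed as Boolean formulas over base alternatives. The sketch below evaluates such a formula against one choice of base alternatives. The nested-tuple encoding and the example formulas are assumptions made for illustration, with IDs borrowed from the Saw/Drives example; this is not how Trio stores lineage.

```python
def holds(formula, chosen):
    """Evaluate a lineage formula against the set of chosen base alternatives."""
    op = formula[0]
    if op == "alt":
        return formula[1] in chosen
    if op == "and":
        return all(holds(f, chosen) for f in formula[1:])
    if op == "or":
        return any(holds(f, chosen) for f in formula[1:])
    if op == "not":
        return not holds(formula[1], chosen)
    raise ValueError(f"unknown operator: {op}")

# Conjunctive (join): Hank is a suspect iff (11,1) and (23,1) are both chosen.
conjunctive = ("and", ("alt", (11, 1)), ("alt", (23, 1)))

# Disjunctive (duplicate elimination): "Honda" appears in the distinct cars
# of Drives if any Honda alternative is chosen.
disjunctive = ("or", ("alt", (22, 1)), ("alt", (22, 2)), ("alt", (23, 1)))

# Negative (difference): a Honda is driven but Cathy did not see a Honda.
negative = ("and", ("alt", (23, 1)), ("not", ("alt", (11, 1))))

# One possible choice of base alternatives.
chosen = {(11, 2), (21, 1), (22, 1), (23, 1)}
print(holds(conjunctive, chosen), holds(disjunctive, chosen), holds(negative, chosen))
```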

Page 16: Mazda Trio Meeting

ULDBs: Interesting Questions

• Data-minimality: extraneous alternatives, extraneous “?”
• Lineage-minimality: harder
• Membership: tuple and table, some-instance and all-instances
• Coexistence: multiple tuples
• Extraction: remove tables, retain possible-instances

Page 17: Mazda Trio Meeting

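Example: Extraneous Data

[Diagram: base tuple (Diane, Mazda) ∥ (Diane, Acura); derived tuples (Diane, Mazda), (Diane, Acura), and Diane; three “?” annotations; one element labeled “extraneous”. The placement of the annotations is not recoverable from the transcript.]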

Page 18: Mazda Trio Meeting

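Example: Coexistence

[Diagram: base tuple (Diane, Mazda) ∥ (Diane, Acura); derived tuples (Diane, Mazda), (Diane, Acura), Mazda, and Acura; four “?” annotations; label “Can’t coexist”. The placement of the annotations is not recoverable from the transcript.]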

Page 19: Mazda Trio Meeting

Querying ULDBs: Semantics

Query Q on ULDB D

D → (possible instances) → D1, D2, …, Dn
D1, D2, …, Dn → (Q on each instance) → Q(D1), Q(D2), …, Q(Dn)
D → (implementation of Q; operational semantics) → D’ (D + Result)
D’: representation of instances Q(D1), Q(D2), …, Q(Dn)

Page 20: Mazda Trio Meeting

Querying ULDBs: TriQL

Basic TriQL: SQL with new semantics
• Obeys commutative diagram for uncertain data
• Tracks lineage
• Query results: new table or on-the-fly

Implemented TriQL: also built-in predicates conf(), lineage(), lineage*()

Page 21: Mazda Trio Meeting

Additional TriQL Constructs
[Language manual on web site]

• “Horizontal subqueries”: Refer to tuple alternatives as a relation
• Unmerged (horizontal duplicates)
• Flatten, GroupAlts
• NoLineage, NoConf, NoMaybe
• Query-specified confidences [done]
• Data modification statements

Page 22: Mazda Trio Meeting

Confidence Computation

• Confidences computed on-demand based on lineage
  – Confidence of alternative A is function of confidences in λ*(A)
  – Permits any query plan for data computation
• Default probabilistic interpretation, but queries can override

SELECT person, min(conf(Saw),conf(Drives)) as conf
FROM Saw, Drives
WHERE Saw.car = Drives.car
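On-demand computation can be sketched as follows: follow lineage transitively down to base alternatives (λ*), then combine their confidences. This is a simplified illustration, not Trio's algorithm. It assumes purely conjunctive lineage over independent base alternatives from distinct tuples, so the default probabilistic result is a product. All confidence values and the second-level alternative (41,1) are hypothetical; the slides give no confidences for this example.

```python
from math import prod

# Hypothetical confidences for the base alternatives of Saw and Drives.
conf = {
    (11, 1): 0.6, (11, 2): 0.4,   # (Cathy, Honda) ∥ (Cathy, Mazda)
    (21, 1): 0.5, (21, 2): 0.5,   # (Jimmy, Toyota) ∥ (Jimmy, Mazda)
    (22, 1): 0.7, (22, 2): 0.3,   # (Billy, Honda) ∥ (Frank, Honda)
    (23, 1): 1.0,                 # (Hank, Honda)
}

# Lineage of derived alternatives; (41, 1) is a hypothetical result of a
# later query over Suspects, included to show transitive traversal.
lineage = {
    (31, 1): [(11, 2), (21, 2)],
    (32, 1): [(11, 1), (22, 1)],
    (32, 2): [(11, 1), (22, 2)],
    (33, 1): [(11, 1), (23, 1)],
    (41, 1): [(31, 1)],
}

def base_lineage(alt):
    """λ*: the base alternatives an alternative ultimately derives from."""
    if alt not in lineage:
        return {alt}
    return set().union(*(base_lineage(src) for src in lineage[alt]))

def conf_probabilistic(alt):
    """Default interpretation under the independence assumption: a product."""
    return prod(conf[b] for b in base_lineage(alt))

def conf_min(alt):
    """Query-specified override in the spirit of min(conf(Saw), conf(Drives))."""
    return min(conf[b] for b in base_lineage(alt))

print(conf_probabilistic((31, 1)), conf_min((31, 1)))
print(conf_probabilistic((41, 1)))
```

Because confidences depend only on lineage, the data itself can be computed with any query plan and the confidences filled in later.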

Page 23: Mazda Trio Meeting

Trio System: Version 1

Clients
• Command-line client
• TrioExplorer (GUI client)

Client functions
• DDL commands
• TriQL queries
• Schema browsing
• Table browsing
• Explore lineage
• On-demand confidence computation

Trio API and translator (Python)
• Issues standard SQL to the DBMS

Standard relational DBMS
• Encoded Data Tables
  – “Verticalize”
  – Shared IDs for alternatives
  – Columns for confidence, “?”
• Lineage Tables
  – One per result table
  – Uses unique IDs
• Trio Metadata
  – Table types
  – Schema-level lineage structure
• Trio Stored Procedures
  – conf()
  – lineage() “==>”
  – lineage*() “==>>”
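As a rough illustration of the encoded data tables, the sketch below stores the Saw relation with one row per alternative, a shared ID for alternatives of the same tuple, and columns for confidence and “?”. The table name, column names, and use of SQLite are assumptions for illustration; the slides do not give Trio's actual schema.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE saw_enc (
        aid     INTEGER PRIMARY KEY,  -- unique ID of this alternative
        xid     INTEGER NOT NULL,     -- shared by alternatives of one tuple
        witness TEXT,
        car     TEXT,
        conf    REAL,                 -- confidence column
        maybe   INTEGER NOT NULL      -- 1 if the tuple carries '?'
    )
""")

# The Saw relation from the confidences slide, one row per alternative.
rows = [
    (1, 1, "Amy", "Honda", 0.5, 0),
    (2, 1, "Amy", "Toyota", 0.3, 0),
    (3, 1, "Amy", "Mazda", 0.2, 0),
    (4, 2, "Betty", "Acura", 0.6, 1),
]
db.executemany("INSERT INTO saw_enc VALUES (?, ?, ?, ?, ?, ?)", rows)

# Plain SQL over the encoding: alternatives per tuple and whether it is a maybe.
summary = db.execute(
    "SELECT xid, COUNT(*), MAX(maybe) FROM saw_enc GROUP BY xid ORDER BY xid"
).fetchall()
print(summary)
```

With an encoding like this, a translator can rewrite TriQL into ordinary SQL over the encoded tables, which is what allows the system to sit on top of a conventional DBMS.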

Page 24: Mazda Trio Meeting

Current & Future Topics

Algorithms: confidence computation, coexistence, extraneous data
• Minimize lineage traversal
• Memoization
• Batch operations

System
• Full query language
• More internal processing ?
  – Storage and indexing
  – Statistics and query optimization

Page 25: Mazda Trio Meeting

Current & Future Topics

• Top-K by confidence
• Extend basic uncertainty model
  – Incomplete relations
  – Continuous uncertainty
  – Correlated uncertainty ?
• External lineage, update lineage, versioning