mazda trio meeting
Post on 13-Sep-2014
Trio: A System for Data, Uncertainty, and Lineage
Search “stanford trio”
http://i.stanford.edu/trio
2
People
Current
• Jennifer Widom (faculty)
• Omar Benjelloun (post-doc)
• Parag Agrawal, Anish Das Sarma, Shubha Nabar (PhD)
• Michi Mutsuzaki (MS)
• Tomoe Sugihara (visitor)
Incoming
• Martin Theobald (post-doc)
• Raghu Murthy (MS)
• Ander de Keijzer (visitor)
Alums
• Alon Halevy, Ashok Chandra (visitors)
• Chris Hayworth (MS)
3
Why Uncertainty + Lineage?
Many applications seem to need both.
From a technical standpoint, it turns out that lineage:
1. Enables simple and consistent representation of uncertain data
2. Correlates uncertainty in query results with uncertainty in the input data
3. Can make computation over uncertain data more efficient
4
Trio Components
1. Data Model: ULDBs (Uncertainty-Lineage Databases), a simple extension to the relational model
2. Query Language: TriQL, a simple extension to SQL with well-defined semantics and intuitive behavior
3. System: Version 1, a complete system and GUI built on top of a conventional DBMS
5
Running Example: Crime-Solving
Saw(witness, car) // may be uncertain
Drives(person, car) // may be uncertain
Suspects(person) = π_person(Saw ⋈ Drives)
6
Our Model for Uncertainty
1. Alternatives
2. ‘?’ (Maybe) Annotations
3. Confidences
7
Our Model for Uncertainty
1. Alternatives: uncertainty about value
2. ‘?’ (Maybe) Annotations
3. Confidences
Saw (witness, car)
(Amy, Honda) ∥ (Amy, Toyota) ∥ (Amy, Mazda)
=
witness   car
Amy       { Honda, Toyota, Mazda }
Three possible instances
8
Our Model for Uncertainty
1. Alternatives
2. ‘?’ (Maybe): uncertainty about presence
3. Confidences
Saw (witness, car)
(Amy, Honda) ∥ (Amy, Toyota) ∥ (Amy, Mazda)
(Betty, Acura) ?
Six possible instances
9
Our Model for Uncertainty
1. Alternatives
2. ‘?’ (Maybe) Annotations
3. Confidences: weighted uncertainty
Saw (witness, car)
(Amy, Honda): 0.5 ∥ (Amy, Toyota): 0.3 ∥ (Amy, Mazda): 0.2
(Betty, Acura): 0.6 ?
Six possible instances, each with a probability
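The six instances and their probabilities can be enumerated directly. The sketch below is illustrative only: the dictionary layout and the function names are hypothetical, and it assumes that tuples are independent and that a ‘?’ tuple is absent with probability 1 minus the sum of its confidences.

```python
from itertools import product

# Saw relation from the slide. Each tuple lists (alternative, confidence) pairs;
# "maybe" marks the '?' annotation. The layout is an assumption for illustration.
saw = [
    {"alts": [(("Amy", "Honda"), 0.5), (("Amy", "Toyota"), 0.3), (("Amy", "Mazda"), 0.2)],
     "maybe": False},
    {"alts": [(("Betty", "Acura"), 0.6)], "maybe": True},
]

def choices(xtuple):
    """All ways to resolve one tuple: pick one alternative, or none if it has '?'."""
    options = list(xtuple["alts"])
    if xtuple["maybe"]:
        # Assumption: the leftover probability mass is the chance the tuple is absent.
        options.append((None, 1.0 - sum(conf for _, conf in xtuple["alts"])))
    return options

def possible_instances(relation):
    """Return (instance, probability) pairs, treating tuples as independent."""
    result = []
    for combo in product(*(choices(t) for t in relation)):
        instance = frozenset(alt for alt, _ in combo if alt is not None)
        prob = 1.0
        for _, conf in combo:
            prob *= conf
        result.append((instance, prob))
    return result

instances = possible_instances(saw)
for inst, prob in instances:
    print(sorted(inst), round(prob, 2))
```

Three choices for Amy times two for Betty (present or absent) gives the six instances named on the slide.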
10
Models for Uncertainty
• Our model (so far) is not especially new
• We spent some time exploring the space of models for uncertainty [ICDE 06, journal]
• Tension between understandability and expressiveness
– Our model is understandable
– But it is not complete, or even closed under common operations
11
Our Model is Not Closed
Saw (witness, car)
(Cathy, Honda) ∥ (Cathy, Mazda)
Drives (person, car)
(Jimmy, Toyota) ∥ (Jimmy, Mazda)
(Billy, Honda) ∥ (Frank, Honda)
(Hank, Honda)
Suspects = π_person(Saw ⋈ Drives)
Suspects
Jimmy ?
Billy ∥ Frank ?
Hank ?
Does not (in fact, CANNOT) correctly capture possible instances in the result
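A brute-force check makes the problem concrete. The sketch below (helper names are hypothetical) runs the query in every possible instance of Saw and Drives, then compares the outcome with the instances allowed by the lineage-free candidate result Jimmy ?, Billy ∥ Frank ?, Hank ?.

```python
from itertools import product

# Base data from the slide: each tuple is a list of mutually exclusive alternatives.
saw = [[("Cathy", "Honda"), ("Cathy", "Mazda")]]
drives = [
    [("Jimmy", "Toyota"), ("Jimmy", "Mazda")],
    [("Billy", "Honda"), ("Frank", "Honda")],
    [("Hank", "Honda")],
]

def instances(relation):
    """All possible instances of a relation that has alternatives but no '?'."""
    return [set(combo) for combo in product(*relation)]

def suspects(saw_inst, drives_inst):
    """Projection on person of Saw joined with Drives on car."""
    return frozenset(p for (_, c1) in saw_inst for (p, c2) in drives_inst if c1 == c2)

# Correct answer: evaluate the query in every possible instance of the input.
true_results = {suspects(s, d) for s in instances(saw) for d in instances(drives)}

# Candidate result without lineage; None stands for "tuple absent" ('?').
candidate = [["Jimmy", None], ["Billy", "Frank", None], ["Hank", None]]
candidate_results = {
    frozenset(x for x in combo if x is not None) for combo in product(*candidate)
}

# Instances the candidate allows although no input instance produces them.
spurious = candidate_results - true_results
print(len(true_results), len(candidate_results), len(spurious))
```

The candidate admits instances such as {Jimmy, Hank}, which would require Cathy to have seen both a Mazda and a Honda.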
12
Lineage to the Rescue
Lineage
• Captures “where data came from”
• In Trio: a function λ from alternatives to other alternatives (or external sources)
13
Example with Lineage
ID   Saw (witness, car)
11   (Cathy, Honda) ∥ (Cathy, Mazda)
ID   Drives (person, car)
21   (Jimmy, Toyota) ∥ (Jimmy, Mazda)
22   (Billy, Honda) ∥ (Frank, Honda)
23   (Hank, Honda)
Suspects = π_person(Saw ⋈ Drives)
ID   Suspects
31   Jimmy ?
32   Billy ∥ Frank ?
33   Hank ?
λ(31) = (11,2), (21,2)
λ(32,1) = (11,1), (22,1); λ(32,2) = (11,1), (22,2)
λ(33) = (11,1), 23
Correctly captures possible instances in the result
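The effect of λ can be checked mechanically. In the sketch below (an illustration, not Trio's implementation), a derived alternative is present exactly when every base alternative in its lineage was chosen. The single-alternative tuple 23 is written as (23, 1) for uniformity, which is a notational assumption.

```python
from itertools import product

# Base tuples from the slide: ID -> number of alternatives.
base = {11: 2, 21: 2, 22: 2, 23: 1}

# Lineage from the slide: derived alternative -> base alternatives (ID, index).
lineage = {
    (31, 1): [(11, 2), (21, 2)],  # Jimmy
    (32, 1): [(11, 1), (22, 1)],  # Billy
    (32, 2): [(11, 1), (22, 2)],  # Frank
    (33, 1): [(11, 1), (23, 1)],  # Hank
}
names = {(31, 1): "Jimmy", (32, 1): "Billy", (32, 2): "Frank", (33, 1): "Hank"}

def suspect_instances():
    """Possible instances of Suspects implied by the base data plus lineage."""
    found = set()
    ids = sorted(base)
    # Choose one alternative per base tuple, then keep the derived alternatives
    # whose entire (conjunctive) lineage is among the chosen alternatives.
    for picks in product(*(range(1, base[i] + 1) for i in ids)):
        chosen = set(zip(ids, picks))
        present = frozenset(
            names[alt] for alt, sources in lineage.items()
            if all(src in chosen for src in sources)
        )
        found.add(present)
    return found

results = suspect_instances()
print(sorted(sorted(r) for r in results))
```

Only {Billy, Hank}, {Frank, Hank}, {Jimmy}, and the empty instance remain, matching a direct evaluation of the query over every input instance.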
14
Uncertainty-Lineage Databases (ULDBs)
1. Alternatives
2. ‘?’ (Maybe) Annotations
3. Confidences
4. Lineage
ULDBs are closed and complete [VLDB 06]
15
ULDBs: Lineage
• Conjunctive lineage sufficient for most operations
• Duplicate-elimination: disjunctive lineage
• Difference: negative lineage
• General case after multiple operations/queries: Boolean formula
16
ULDBs: Interesting Questions
• Data-minimality: extraneous alternatives, extraneous “?”
• Lineage-minimality: harder
• Membership: tuple and table, some-instance and all-instances
• Coexistence: multiple tuples
• Extraction: remove tables, retain possible-instances
17
Example: Extraneous Data
(Diane, Mazda) ∥ (Diane, Acura)
(Diane, Mazda) ?
(Diane, Acura) ?
Diane ? (extraneous)
18
Example: Coexistence
(Diane, Mazda) ∥ (Diane, Acura)
(Diane, Mazda) ?
(Diane, Acura) ?
Mazda ?
Acura ?
Mazda and Acura can’t coexist
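A coexistence test can be sketched by following lineage down to base alternatives. The code below assumes conjunctive lineage only, and it assumes the slide's derivation chain runs from the base tuple through (Diane, Mazda) and (Diane, Acura) to Mazda and Acura. The IDs "t2" to "t5" and the function names are hypothetical.

```python
# Base tuple 1 has alternatives (1, 1) = (Diane, Mazda) and (1, 2) = (Diane, Acura).
# Derived alternatives and their lineage (IDs are made up for illustration).
lineage = {
    ("t2", 1): [(1, 1)],     # (Diane, Mazda) ?
    ("t3", 1): [(1, 2)],     # (Diane, Acura) ?
    ("t4", 1): [("t2", 1)],  # Mazda ?
    ("t5", 1): [("t3", 1)],  # Acura ?
}

def base_lineage(alt):
    """Transitive lineage: follow lineage until only base alternatives remain."""
    if alt not in lineage:
        return {alt}
    found = set()
    for src in lineage[alt]:
        found |= base_lineage(src)
    return found

def can_coexist(*alts):
    """True unless the combined lineage needs two different alternatives of one tuple."""
    chosen = {}
    for alt in alts:
        for tuple_id, index in base_lineage(alt):
            if chosen.setdefault(tuple_id, index) != index:
                return False
    return True

print(can_coexist(("t4", 1), ("t5", 1)))  # Mazda together with Acura
print(can_coexist(("t2", 1), ("t4", 1)))  # (Diane, Mazda) together with Mazda
```

Mazda and Acura trace back to different alternatives of the same base tuple, so the first check fails while the second succeeds.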
19
Querying ULDBs: Semantics
Query Q on ULDB D
D → (possible instances) → D1, D2, …, Dn
D1, D2, …, Dn → (Q on each instance) → Q(D1), Q(D2), …, Q(Dn)
D → (implementation of Q, operational semantics) → D’ = D + Result
D’ → (representation of instances) → Q(D1), Q(D2), …, Q(Dn)
20
Querying ULDBs: TriQL
Basic TriQL: SQL with new semantics
• Obeys commutative diagram for uncertain data
• Tracks lineage
• Query results: new table or on-the-fly
Implemented TriQL: also built-in predicates conf(), lineage(), lineage*()
21
Additional TriQL Constructs
[Language manual on web site]
• “Horizontal subqueries”: refer to tuple alternatives as a relation
• Unmerged (horizontal duplicates)
• Flatten, GroupAlts
• NoLineage, NoConf, NoMaybe
• Query-specified confidences [done]
• Data modification statements
22
Confidence Computation
• Confidences computed on-demand based on lineage
– Confidence of alternative A is a function of confidences in λ*(A)
– Permits any query plan for data computation
• Default probabilistic interpretation, but queries can override
SELECT person, min(conf(Saw), conf(Drives)) as conf
FROM Saw, Drives
WHERE Saw.car = Drives.car
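The two interpretations can be contrasted on one derived alternative. This is a sketch under stated assumptions: the confidence values are made up, the lineage is conjunctive and already expanded to base alternatives, and the probabilistic default is modeled as a product over independent base alternatives. None of the names below are Trio APIs.

```python
from math import prod

# Hypothetical confidences for two base alternatives (not taken from the slides).
base_conf = {("Saw", 11, 1): 0.6, ("Drives", 23, 1): 0.9}

# Conjunctive lineage of one Suspects alternative, expanded to base alternatives.
lineage_star = {("Suspects", 33, 1): [("Saw", 11, 1), ("Drives", 23, 1)]}

def default_conf(alt):
    """Probabilistic reading: multiply confidences of the distinct base alternatives."""
    return prod(base_conf[b] for b in set(lineage_star[alt]))

def min_conf(alt):
    """Query-specified override in the spirit of min(conf(Saw), conf(Drives))."""
    return min(base_conf[b] for b in lineage_star[alt])

hank = ("Suspects", 33, 1)
print(round(default_conf(hank), 2), min_conf(hank))
```

Because both functions read only the lineage and the base confidences, the data itself can be computed by any query plan and the confidence filled in later on demand.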
23
Trio System: Version 1
Clients: command-line client, TrioExplorer (GUI client)
• DDL commands
• TriQL queries
• Schema browsing
• Table browsing
• Explore lineage
• On-demand confidence computation
Trio API and translator (Python)
Standard SQL
Standard relational DBMS
Encoded Data Tables
• “Verticalize”
• Shared IDs for alternatives
• Columns for confidence, “?”
Lineage Tables
• One per result table
• Uses unique IDs
Trio Metadata
• Table types
• Schema-level lineage structure
Trio Stored Procedures
• conf()
• lineage() “==>”
• lineage*() “==>>”
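One way to picture the encoded data tables is a plain relational table with one row per alternative. The schema below is a guess for illustration only: the table and column names (saw_enc, aid, xid, conf, maybe) are hypothetical and not Trio's actual encoding, and SQLite stands in for the conventional DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE saw_enc (
    aid     INTEGER PRIMARY KEY,  -- unique ID of this alternative
    xid     INTEGER NOT NULL,     -- ID shared by all alternatives of one tuple
    witness TEXT,
    car     TEXT,
    conf    REAL,                 -- confidence column
    maybe   INTEGER NOT NULL      -- 1 if the tuple carries '?'
);
""")

# "Verticalized" Saw relation: every alternative becomes its own row.
rows = [
    (1, 1, "Amy", "Honda", 0.5, 0),
    (2, 1, "Amy", "Toyota", 0.3, 0),
    (3, 1, "Amy", "Mazda", 0.2, 0),
    (4, 2, "Betty", "Acura", 0.6, 1),
]
conn.executemany("INSERT INTO saw_enc VALUES (?, ?, ?, ?, ?, ?)", rows)

# Standard SQL is enough to regroup the rows into tuples with alternatives.
grouped = conn.execute(
    "SELECT xid, COUNT(*), MAX(maybe) FROM saw_enc GROUP BY xid ORDER BY xid"
).fetchall()
print(grouped)
```

With a layout like this, the translator can express its work as ordinary SQL over the encoded tables, and unique alternative IDs give a lineage table something stable to reference.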
24
Current & Future Topics
Algorithms: confidence computation, coexistence, extraneous data
• Minimize lineage traversal
• Memoization
• Batch operations
System
• Full query language
• More internal processing?
– Storage and indexing
– Statistics and query optimization
25
Current & Future Topics
• Top-K by confidence
• Extend basic uncertainty model
– Incomplete relations
– Continuous uncertainty
– Correlated uncertainty?
• External lineage, update lineage, versioning