LIVE: A lineage-supported, versioned DBMS

Anish Das Sarma, Martin Theobald, Jennifer Widom

Post on 07-Jan-2016


Page 1: LIVE

A lineage-supported, versioned DBMS

Anish Das Sarma, Martin Theobald, Jennifer Widom

Page 2: Agenda

ULDB Data Model and the Trio System
  Uncertainty & Lineage
LIVE Data Model (LDM)
  Uncertainty, Lineage & Versioning
Data Modifications
  Insert/Delete Tuples, Update Values, Update Confidences
Query Evaluation
  Valid-At vs. Snapshot Queries, Interval Computations, Confidence Computations, Complexity
Experiments/Conclusions

20.04.23 LIVE - A lineage-supported, versioned DBMS

Page 3: ULDB Data Model

Different types of uncertainty:
1. Tuple Alternatives
2. ‘?’ (Maybe) Annotations
3. Confidences

Implementation of the ULDB data model: Trio System
  TriQL query language
  TrioExplorer browser frontend, trioplus client, API
  Enhanced PostgreSQL backend (SPI)
  Search for “Stanford Trio”

Page 4: ULDBs – Alternatives

1. Alternatives: uncertainty about attribute values
2. ‘?’ (Maybe) Annotations
3. Confidences

Saw (witness, color, car)
Amy    red, Honda ∥ red, Toyota ∥ orange, Mazda

Three possible worlds

Page 5: ULDBs – Maybe Annotations

1. Alternatives
2. ‘?’ (Maybe): uncertainty about tuple presence
3. Confidences

Saw (witness, color, car)
Amy    red, Honda ∥ red, Toyota ∥ orange, Mazda
Betty  blue, Acura  ?

Six possible worlds

Page 6: ULDBs – Confidences

1. Alternatives
2. ‘?’ (Maybe) Annotations
3. Confidences: weighted uncertainty

Saw (witness, color, car)
Amy    red, Honda 0.5 ∥ red, Toyota 0.3 ∥ orange, Mazda 0.2
Betty  blue, Acura 0.6  ?

Six possible worlds, each with a probability
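The six worlds and their probabilities can be enumerated directly. The following is a minimal sketch, not Trio code: it assumes the two x-tuples are independent and that any confidence mass missing from 1.0 is the probability that the tuple is absent (the ‘?’ case). The names `saw`, `choices`, and `possible_worlds` are made up for this illustration.

```python
from itertools import product

# x-tuples of Saw(witness, color, car) from the slide: each witness maps to a
# list of (alternative, confidence) pairs. Confidences that sum to less than 1
# leave room for the tuple being absent, which is what '?' expresses.
saw = {
    "Amy": [(("red", "Honda"), 0.5), (("red", "Toyota"), 0.3), (("orange", "Mazda"), 0.2)],
    "Betty": [(("blue", "Acura"), 0.6)],
}

def choices(alternatives):
    """All ways to instantiate one x-tuple, including 'absent' (None) if possible."""
    options = list(alternatives)
    missing = 1.0 - sum(conf for _, conf in alternatives)
    if missing > 1e-9:
        options.append((None, missing))
    return options

def possible_worlds(relation):
    """Yield (world, probability) pairs, assuming the x-tuples are independent."""
    names = list(relation)
    for combo in product(*(choices(relation[n]) for n in names)):
        world = {n: alt for n, (alt, _) in zip(names, combo) if alt is not None}
        prob = 1.0
        for _, conf in combo:
            prob *= conf
        yield world, prob

worlds = list(possible_worlds(saw))
for world, prob in worlds:
    print(f"{prob:.2f}  {world}")
```

Amy contributes three choices and Betty two (present or absent), which gives the six worlds; their probabilities sum to 1.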

Page 7: ULDBs – Closure

Saw (witness, car)
Cathy    Mazda ∥ Honda

Drives (person, car)
Jimmy, Toyota ∥ Jimmy, Mazda
Billy, Honda ∥ Frank, Honda
Hank, Honda

Suspects = π_person(Saw ⋈ Drives)

Suspects
Jimmy  ?
Billy ∥ Frank  ?
Hank  ?

This representation does not, and CANNOT, correctly capture the possible worlds in the result!

Page 8: ULDBs – Lineage

ID  Saw (witness, car)
11  Cathy    Honda ∥ Mazda

ID  Drives (person, car)
21  Jimmy, Toyota ∥ Jimmy, Mazda
22  Billy, Honda ∥ Frank, Honda
23  Hank, Honda

Suspects = π_person(Saw ⋈ Drives)

ID  Suspects
31  Jimmy  ?
32  Billy ∥ Frank  ?
33  Hank  ?

λ(31) = (11,2) ∧ (21,2)
λ(32,1) = (11,1) ∧ (22,1); λ(32,2) = (11,1) ∧ (22,2)
λ(33) = (11,1) ∧ 23
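A small enumeration shows what the lineage adds. This is a sketch under the assumption that every base alternative is chosen independently and that no base tuple is optional; the dictionary layout and variable names are invented for the example, and the single alternative of tuple 23 is written as (23, 1).

```python
from itertools import product

# Base data from the slide: tuple ID -> list of alternatives (1-indexed).
saw = {11: [("Cathy", "Honda"), ("Cathy", "Mazda")]}
drives = {
    21: [("Jimmy", "Toyota"), ("Jimmy", "Mazda")],
    22: [("Billy", "Honda"), ("Frank", "Honda")],
    23: [("Hank", "Honda")],
}

# Lineage of each Suspects alternative: a conjunction of (tuple ID, alternative).
lineage = {
    (31, "Jimmy"): [(11, 2), (21, 2)],
    (32, "Billy"): [(11, 1), (22, 1)],
    (32, "Frank"): [(11, 1), (22, 2)],
    (33, "Hank"): [(11, 1), (23, 1)],
}

base = {**saw, **drives}
ids = sorted(base)

result_worlds = set()
for picks in product(*(range(1, len(base[i]) + 1) for i in ids)):
    chosen = dict(zip(ids, picks))  # which alternative each base tuple takes
    suspects = frozenset(
        person
        for (_, person), conjunction in lineage.items()
        if all(chosen[tid] == alt for tid, alt in conjunction)
    )
    result_worlds.add(suspects)

for world in sorted(result_worlds, key=sorted):
    print(sorted(world))
```

Only four distinct result worlds survive: the empty set, {Jimmy}, {Billy, Hank}, and {Frank, Hank}. Treating the three Suspects tuples as independent maybe-tuples would instead admit 2 × 3 × 2 = 12 combinations, including worlds such as {Jimmy, Hank} that can never occur.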

Page 9: ULDBs – Summary

Uncertainty-Lineage Databases (ULDBs)

1. Alternatives
2. ‘?’ (Maybe) Annotations
3. Confidences
4. Lineage

ULDBs are closed and complete

Page 10: Lineage & Confidences

Lineage alone suffices to compute the confidence of a result tuple.
  #P-complete for general Boolean formulas
  Approximation algorithms: Luby-Karp, etc.

ID  Saw(witness, car)
11  (Mary, Honda) : 0.8
12  (Susan, Honda) : 0.9
13  (Betty, Honda) : 0.5

Select distinct car from Saw;

ID  SuspectCars(car)
21  Honda : 0.99

λ(21) = (11 ∨ 12 ∨ 13)
P(21) = 1 − (1 − 0.8) × (1 − 0.9) × (1 − 0.5) = 0.99
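The arithmetic for tuple 21 can be checked in two ways. The sketch below assumes the three base tuples are independent; `confidence_of_disjunction` is the closed form for a pure disjunction, while `confidence_by_enumeration` is a brute-force evaluation over all truth assignments, whose exponential cost hints at why the general problem is hard. Both function names are illustrative only.

```python
from itertools import product

# Confidences of the base tuples in Saw, keyed by tuple ID.
conf = {11: 0.8, 12: 0.9, 13: 0.5}

def confidence_of_disjunction(ids):
    """P(t1 or t2 or ...) for independent tuples: 1 minus P(all absent)."""
    p_none = 1.0
    for i in ids:
        p_none *= 1.0 - conf[i]
    return 1.0 - p_none

def confidence_by_enumeration(formula, ids):
    """Sum the weights of all presence/absence assignments satisfying formula."""
    total = 0.0
    for bits in product([False, True], repeat=len(ids)):
        present = dict(zip(ids, bits))
        if formula(present):
            weight = 1.0
            for i in ids:
                weight *= conf[i] if present[i] else 1.0 - conf[i]
            total += weight
    return total

p_closed = confidence_of_disjunction([11, 12, 13])
p_brute = confidence_by_enumeration(lambda p: p[11] or p[12] or p[13], [11, 12, 13])
print(round(p_closed, 4), round(p_brute, 4))
```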

Page 11: Versioning (LDM Data Model)

Version intervals for tuples
  Contiguous version numbers 0, 1, …
  Database has current version vD
  Tuples have validity intervals [s, e]

ID  Photo(Number,Name), version 2
11  (1, Amy) [0,1] : 1.0
12  (1, Bob) [0,∞] : 0.6
13  (2, Carl) [0,1] : 0.3
14  (3, Dale) [1,1] : 0.1

Valid-At Queries: Select * from Photo valid-at 2;

ID  Photo(Number,Name), version 2
12  (1, Bob) [0,∞] : 0.6

Snapshot Queries: View Photo at 2;

ID  Photo@2(Number,Name)
12  (1, Bob) : 0.6

Possible Worlds: LDM databases encode lists of sets of possible worlds.
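The difference between the two query forms can be sketched over the Photo table. This is an illustration, not the Trio/LIVE implementation: the `VTuple` type and both functions are hypothetical, and an open-ended interval is modelled with `math.inf`.

```python
import math
from typing import NamedTuple

class VTuple(NamedTuple):
    tid: int
    values: tuple
    start: int
    end: float   # math.inf stands for an open-ended interval
    conf: float

photo = [
    VTuple(11, (1, "Amy"), 0, 1, 1.0),
    VTuple(12, (1, "Bob"), 0, math.inf, 0.6),
    VTuple(13, (2, "Carl"), 0, 1, 0.3),
    VTuple(14, (3, "Dale"), 1, 1, 0.1),
]

def valid_at(relation, version):
    """Tuples valid at the given version; version intervals are kept."""
    return [t for t in relation if t.start <= version <= t.end]

def snapshot(relation, version):
    """The relation as of the given version; version intervals are dropped."""
    return [(t.tid, t.values, t.conf) for t in valid_at(relation, version)]

print(valid_at(photo, 2))
print(snapshot(photo, 2))
```

At version 2 only tuple 12 qualifies in both cases; the valid-at result still carries its interval, the snapshot does not.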

Page 12: Data Modifications – Insert

Insert Tuple: Insert t with version [vD+1,∞]

commit; Increase vD

ID  People(Name, State, Job), versions 0 to 2
21  (Bob, NY, Analyst) [0,∞] : 1.0
22  (Carl, IL, Teacher) [0,∞] : 1.0
23  (David, PA, Manager) [0,∞] : 0.6
24  (Frank, CA, Eng.) [1,∞] : 0.3    (inserted; version 0 → 1)
25  (David, PA, CEO) [2,∞] : 0.3    (inserted; version 1 → 2)

Page 13: Data Modifications – Delete

Insert Tuple: Insert t with version [vD+1,∞]
Delete Tuple: Set end(t) to vD

commit; Increase vD

ID  People(Name, State, Job), version 2 → 3
21  (Bob, NY, Analyst) [0,∞] : 1.0
22  (Carl, IL, Teacher) [0,∞] → [0,2] : 1.0    (deleted)
23  (David, PA, Manager) [0,∞] : 0.6
24  (Frank, CA, Eng.) [1,∞] : 0.3
25  (David, PA, CEO) [2,∞] : 0.3

Page 14: Data Modifications – Update

Insert Tuple: Insert t with version [vD+1,∞]
Delete Tuple: Set end(t) to vD
Update Value: Set end(t) to vD
  Insert t’ with version [vD+1,∞]

commit; Increase vD

ID  People(Name, State, Job), version 3 → 4
21  (Bob, NY, Analyst) [0,∞] → [0,3] : 1.0    (old value closed)
22  (Carl, IL, Teacher) [0,2] : 1.0
23  (David, PA, Manager) [0,∞] : 0.6
24  (Frank, CA, Eng.) [1,∞] : 0.3
25  (David, PA, CEO) [2,∞] : 0.3
21  (Bob, CA, Student) [4,∞] : 0.3    (new value)

Page 15: Data Modifications – Update

Insert Tuple: Insert t with version [vD+1,∞]
Delete Tuple: Set end(t) to vD
Update Value: Set end(t) to vD
  Insert t’ with version [vD+1,∞]
Update Probability: Set end(t) to vD
  Insert t’=t with probability p’ and version [vD+1,∞]

commit; Increase vD

ID  People(Name, State, Job), version 4 → 5
21  (Bob, NY, Analyst) [0,3] : 1.0
22  (Carl, IL, Teacher) [0,2] : 1.0
23  (David, PA, Manager) [0,∞] : 0.6
24  (Frank, CA, Eng.) [1,∞] : 0.3
25  (David, PA, CEO) [2,∞] : 0.3
21  (Bob, CA, Student) [4,∞] → [4,4] : 0.3    (old probability closed)
21  (Bob, CA, Student) [5,∞] : 0.7    (new probability)

Page 16: Data Modifications – Summary

Insert Tuple: Insert t with version [vD+1,∞]
Delete Tuple: Set end(t) to vD
Update Value: Set end(t) to vD
  Insert t’ with version [vD+1,∞]
Update Probability: Set end(t) to vD
  Insert t’=t with probability p’ and version [vD+1,∞]

Possible worlds: Updates may create duplicate worlds, which are merged (at any version v).

ID  People(Name, State, Job), version 4 → 5
21  (Bob, NY, Analyst) [0,3] : 1.0
22  (Carl, IL, Teacher) [0,2] : 1.0
23  (David, PA, Manager) [0,∞] : 0.6
24  (Frank, CA, Eng.) [1,∞] : 0.3
25  (David, PA, CEO) [2,∞] : 0.3
21  (Bob, CA, Student) [4,∞] → [4,4] : 0.3
21  (Bob, CA, Student) [5,∞] : 0.7
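The four modification rules can be replayed on the People example with a small in-memory model. This is a sketch only: the `VersionedTable` class and its method names are hypothetical, every modification is assumed to commit on its own, the initial tuples are loaded at version 0, and `math.inf` stands for an open-ended interval.

```python
import math

class VersionedTable:
    """Toy versioned relation; rows are dicts with tid, values, start, end, conf."""

    def __init__(self, initial_rows):
        self.version = 0  # current database version vD
        self.rows = [
            {"tid": tid, "values": values, "start": 0, "end": math.inf, "conf": conf}
            for tid, values, conf in initial_rows
        ]

    def _live(self, tid):
        # The live row of a tuple is the one whose interval is still open-ended.
        return next(r for r in self.rows if r["tid"] == tid and r["end"] == math.inf)

    def _commit(self):
        self.version += 1  # commit; increase vD

    def insert(self, tid, values, conf):
        # Insert t with version [vD+1, inf]
        self.rows.append({"tid": tid, "values": values,
                          "start": self.version + 1, "end": math.inf, "conf": conf})
        self._commit()

    def delete(self, tid):
        # Set end(t) to vD
        self._live(tid)["end"] = self.version
        self._commit()

    def update(self, tid, values=None, conf=None):
        # Close the old row at vD, then insert the changed copy at [vD+1, inf].
        old = self._live(tid)
        old["end"] = self.version
        self.rows.append({"tid": tid,
                          "values": old["values"] if values is None else values,
                          "start": self.version + 1, "end": math.inf,
                          "conf": old["conf"] if conf is None else conf})
        self._commit()

people = VersionedTable([
    (21, ("Bob", "NY", "Analyst"), 1.0),
    (22, ("Carl", "IL", "Teacher"), 1.0),
    (23, ("David", "PA", "Manager"), 0.6),
])
people.insert(24, ("Frank", "CA", "Eng."), 0.3)                 # version 0 -> 1
people.insert(25, ("David", "PA", "CEO"), 0.3)                  # version 1 -> 2
people.delete(22)                                               # version 2 -> 3
people.update(21, values=("Bob", "CA", "Student"), conf=0.3)    # version 3 -> 4
people.update(21, conf=0.7)                                     # version 4 -> 5

bob_history = [(r["start"], r["end"], r["conf"]) for r in people.rows if r["tid"] == 21]
print(people.version, bob_history)
```

Nothing is ever overwritten: each change closes an interval and appends a row, so Bob ends up with three rows covering [0,3], [4,4], and [5,∞].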

Page 17: Query Evaluation

1) Data Computation (regular SQL, including lineage)
2) Interval Computation (stored procedure)

[Diagram: the database D is an encoding of possible worlds, with one set of worlds per version: D1, D2, …, Dn1 @ (0); D1, D2, …, Dn2 @ (1); …; D1, D2, …, Dnv @ (vD). Operational semantics: Q on each world yields Q(D1), Q(D2), …, Q(Dn). The implementation of Q takes D to D + Result.]

Page 18: Lineage, Confidences & Versions

Lineage alone suffices to compute the confidence of any result tuple.

Lineage alone suffices to compute the version interval of any result tuple.

Page 19: Version Interval Computation

Positive Lineage (disjunctions & conjunctions)
In the lineage formula λ(t):
  Replace every tuple t’ by its version interval
  Replace every ∨ with ∪ and every ∧ with ∩

ID  Saw(witness, car), version 3
11  (Mary, Honda) [1,∞] : 0.8
12  (Susan, Honda) [2,∞] : 0.9
13  (Betty, Honda) [3,∞] : 0.5

Select distinct car from Saw;

ID  SuspectCars(car), version 3
21  (Honda) [1,∞] : 0.99

λ(21) = (11 ∨ 12 ∨ 13)
Interval(21) = [1,∞] ∪ [2,∞] ∪ [3,∞] = [1,∞]
P(21) = 1 − (1 − 0.8) × (1 − 0.9) × (1 − 0.5) = 0.99
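The replacement rule for positive lineage (tuple to interval, disjunction to union, conjunction to intersection) can be written as a short recursive evaluator. The sketch assumes integer version numbers, so adjacent intervals such as [0,3] and [4,4] are merged, and it represents a formula as nested tuples like `("or", 11, 12, 13)`; that encoding and all function names are invented for the example.

```python
import math

def union(a, b):
    """Union of two lists of closed integer intervals, merged and sorted."""
    merged = []
    for start, end in sorted(a + b):
        if merged and start <= merged[-1][1] + 1:  # overlapping or adjacent versions
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def intersect(a, b):
    """Pairwise intersection of two lists of closed intervals."""
    out = []
    for s1, e1 in a:
        for s2, e2 in b:
            lo, hi = max(s1, s2), min(e1, e2)
            if lo <= hi:
                out.append((lo, hi))
    return sorted(out)

def version_intervals(formula, interval_of):
    """Evaluate a positive lineage formula to a list of version intervals."""
    if isinstance(formula, (int, str)):        # a base tuple ID
        return [interval_of[formula]]
    op, *args = formula                        # ("or", ...) or ("and", ...)
    parts = [version_intervals(arg, interval_of) for arg in args]
    result = parts[0]
    for part in parts[1:]:
        result = union(result, part) if op == "or" else intersect(result, part)
    return result

saw = {11: (1, math.inf), 12: (2, math.inf), 13: (3, math.inf)}
print(version_intervals(("or", 11, 12, 13), saw))

# A conjunction, as produced by a join of two tuples valid over [0,10] and [5,15].
print(version_intervals(("and", "r", "s"), {"r": (0, 10), "s": (5, 15)}))
```

Each tuple and operator is visited once, in line with the linear bound stated for positive lineage, although this naive `intersect` is quadratic in the number of interval fragments.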

Page 20: Version & Confidence Computation

Positive Lineage (disjunctions & conjunctions)
In the lineage formula λ(t):
  Replace every tuple t’ by its version interval
  Replace every ∨ with ∪ and every ∧ with ∩

ID  Saw(witness, car), version 3
11  (Mary, Honda) [1,∞] : 0.8
12  (Susan, Honda) [2,∞] : 0.9
13  (Betty, Honda) [3,∞] : 0.5

Select distinct car from Saw;

ID  SuspectCars(car), version 3
21  (Honda) [1,∞] : 0.99

Select distinct car from Saw valid-at 2;

ID  SuspectCars(car), version 2
21  (Honda) [1,∞] : 0.98

λ(21) = (11 ∨ 12)
P(21) = 1 − (1 − 0.8) × (1 − 0.9) = 0.98
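The valid-at case differs from the current-version case only in which base tuples enter the lineage. A minimal sketch, assuming independent base tuples and a hypothetical helper `distinct_car_confidence` that handles just this single-group DISTINCT query:

```python
import math

# Saw at version 3: tuple ID -> (start, end, confidence).
saw = {
    11: (1, math.inf, 0.8),
    12: (2, math.inf, 0.9),
    13: (3, math.inf, 0.5),
}

def distinct_car_confidence(relation, version):
    """Confidence of the single Honda result when evaluated valid-at a version."""
    p_absent = 1.0
    for start, end, conf in relation.values():
        if start <= version <= end:      # only tuples valid at that version count
            p_absent *= 1.0 - conf
    return 1.0 - p_absent

for v in (1, 2, 3):
    print(v, round(distinct_car_confidence(saw, v), 2))
```

At version 2 tuple 13 is not yet valid, so the lineage shrinks to (11 ∨ 12) and the confidence drops from 0.99 to 0.98.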

Page 21: Interval Computations & Query Plans

Can decouple interval computation from data computation
Or: push interval computation into query plans only when there is no negation.

Select R.A from R EXCEPT ( Select R.A from R EXCEPT Select S.A from S );

  t=(a)[0,10]
    r=(a)[0,10]    u=(a)[0,10]
                     r=(a)[0,10]    s=(a)[5,15]

Select R.A from R, S Where R.A=S.A;

  t=(a)[5,10]
    r=(a)[0,10]    s=(a)[5,15]

Page 22: Complexity Results

Positive Lineage (disjunctions & conjunctions)
  Version interval computation: PTIME (linear)
  Confidence computation: #P-complete

Arbitrary Lineage (including negation)
  Version interval computation: PTIME (linear) if all confidences are known; NP-hard if confidences are not known (need to check for idempotence of negated tuples)
  Confidence computation: #P-complete

Page 23: Experiments – Setup

Probabilistic & versioned TPC-H setting
  Queries over Lineitem, Orders tables with varying join selectivity from 0.1% to 1% (6,000-60,000 and 1,500-15,000 tuples for Lineitem & Orders)
  Update 0.1% to 1% of the input data
  Assign probabilities within [0,1] uniform-randomly to tuples

Additional indexes for versioning
  Two B+-trees on (start, end) and end points of intervals
  Rewrite valid-at & snapshot queries using WHERE (start ≤ v ≤ end) predicates
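The predicate rewrite can be tried out with Python's built-in `sqlite3` module standing in for the PostgreSQL backend. Everything concrete here is an assumption of the sketch: the column names `vstart` and `vend`, the `OPEN_END` sentinel for open-ended intervals, and the `valid_at` helper. SQLite's ordinary indexes play the role of the two B+-trees.

```python
import sqlite3

OPEN_END = 2**31 - 1  # sentinel for an open-ended interval (assumption of this sketch)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE photo (tid INTEGER, number INTEGER, name TEXT,
                        vstart INTEGER, vend INTEGER, conf REAL);
    -- One index on (start, end) and one on the interval end points.
    CREATE INDEX photo_start_end ON photo (vstart, vend);
    CREATE INDEX photo_end ON photo (vend);
""")
conn.executemany(
    "INSERT INTO photo VALUES (?, ?, ?, ?, ?, ?)",
    [(11, 1, "Amy", 0, 1, 1.0), (12, 1, "Bob", 0, OPEN_END, 0.6),
     (13, 2, "Carl", 0, 1, 0.3), (14, 3, "Dale", 1, 1, 0.1)],
)

def valid_at(version):
    """Rewrite of 'Select * from Photo valid-at v' as a plain range predicate."""
    return conn.execute(
        "SELECT tid, number, name, vstart, vend, conf FROM photo "
        "WHERE vstart <= ? AND ? <= vend ORDER BY tid",
        (version, version),
    ).fetchall()

print(valid_at(2))
```

A snapshot query would use the same predicate and simply omit the two interval columns from the select list.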

Page 24: Experiments – Results (I)

[Figure: Join query. Overhead of versioned system vs. non-versioned system (versions not computed).]

[Figure: Join query. Overhead of computing versions (versioned system).]

Axis unit: (%)

Page 25: Experiments – Results (II)

[Figure: Join query. Progressive data updates (overwrite multiple times).]

[Figure: Join query. Valid-at queries vs. full version computation.]

Page 26: Experiments – Results (III)

[Figure: Overhead of version computation, different query types (1% data modified).]

Page 27: Conclusions

LDMs are closed and complete
Generalizes to full ULDB data model (including value alternatives & maybe (?) annotations)
Can employ lineage also for update propagations
Supports all of INSERT/DELETE/UPDATE with INTERSECT/UNION/EXCEPT set operations

[Diagram labels: Lineage, Uncertainty, Versioning, DBMS]