managing uncertain data

Managing Uncertain Data Anish Das Sarma Stanford University May 11, 2022 1 Anish Das Sarma


Page 1: Managing Uncertain Data

Managing Uncertain Data

Anish Das SarmaStanford University

April 21, 2023 1Anish Das Sarma

Page 2: Managing Uncertain Data

What is Uncertain Data?

(Certain) Data                            Uncertain Data
Temperature is 74.634589 F                Sensor reported 75 ±0.5 F
Bob works for Yahoo                       Bob works for either Yahoo or Microsoft
Mary sighted a Finch                      Mary sighted either a Finch (80%) or a Sparrow (20%)
It will rain in Stanford tomorrow         There is a 60% chance of rain in Stanford tomorrow
Yahoo stock will be at 100 in a month     Yahoo stock will be between 60 and 120 in a month
John’s age is 23                          John’s age is in [20,30]

Page 3: Managing Uncertain Data

Why Does It Arise?


Precision of devices

Lack of information

Uncertainty about the future

Anonymization

Page 4: Managing Uncertain Data

Applications: Information Extraction

Restaurant         Zip
Hard Rock Cafe     94111, 94133, 94109

Page 5: Managing Uncertain Data


Applications: Information Integration

name,hPhone,oPhone,hAddr,oAddr

name,phone,address

Combined View

Page 6: Managing Uncertain Data

Applications: Deduplication

Name

John Doe

J. Doe? 80% match

Page 7: Managing Uncertain Data

Applications: Scientific & Medical Experiments

Probably not cancer

Page 8: Managing Uncertain Data

How Do Database Management Systems (DBMS) Handle Uncertainty?

They don’t


Page 9: Managing Uncertain Data

What Do (Most) Applications Do?

• Clean: turn into data that DBMSs can handle

Uncertain data:
Observer   Bird-1
Mary       Finch: 80%, Sparrow: 20%
Susan      Dove: 70%, Sparrow: 30%
Jane       Hummingbird: 65%, Sparrow: 35%

Cleaned data:
Bird-1
Finch
Dove
Hummingbird

(1) Loss of information (2) Errors compound insidiously
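The information loss can be made concrete with a small sketch. It assumes the three observers are independent (the slide does not say so) and uses hypothetical helper names `clean` and `prob_any`. Cleaning keeps the most likely species per observer, so no Sparrow survives, even though a Sparrow sighting by at least one observer is more likely than not.

```python
# Sightings from the slide: each observer reports a distribution over species.
sightings = {
    "Mary": {"Finch": 0.80, "Sparrow": 0.20},
    "Susan": {"Dove": 0.70, "Sparrow": 0.30},
    "Jane": {"Hummingbird": 0.65, "Sparrow": 0.35},
}

def clean(sightings):
    """Keep only the most likely species per observer (the 'cleaning' step)."""
    return {obs: max(dist, key=dist.get) for obs, dist in sightings.items()}

def prob_any(sightings, species):
    """P(at least one observer saw `species`), ASSUMING independent observers."""
    p_none = 1.0
    for dist in sightings.values():
        p_none *= 1.0 - dist.get(species, 0.0)
    return 1.0 - p_none

cleaned = clean(sightings)
sparrow_in_cleaned = "Sparrow" in cleaned.values()
p_sparrow = prob_any(sightings, "Sparrow")
print(cleaned)
print(sparrow_in_cleaned, round(p_sparrow, 3))
```

Under these assumptions the cleaned table contains no Sparrow, while the uncertain data gives a probability of about 0.636 that a Sparrow was seen.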

Page 10: Managing Uncertain Data

Outline of The Talk

• Part 1: Managing Uncertainty in a DBMS (theory, systems)

• Part 2: Handling Uncertainty in Data Integration (systems, theory)

• Other Research (trailer)

• Future Plans

Page 11: Managing Uncertain Data

Part 1: Managing Uncertain Data

• Primarily in the context of the Trio project
  1) Data
  2) Uncertainty
  3) Lineage

• Today’s focus: how lineage helps

Page 12: Managing Uncertain Data

Uncertain Data


Sensor reported 75 ±0.5 F

Bob works for either Yahoo or Microsoft

Mary sighted either a Finch (80%) or a Sparrow (20%)

There is a 60% chance of rain in Stanford tomorrow

• An uncertain database represents a set of possible instances (or, possible worlds)

• Our work: finite sets of possible instances

Page 13: Managing Uncertain Data

Representing Uncertain Data

• 20+ years of work (mostly theoretical)

• Appears to be a fundamental trade-off between expressiveness & intuitiveness

• We spent some time exploring the space of models for uncertainty

Page 14: Managing Uncertain Data

Hierarchy of Models [ICDE 06]

[Figure: hierarchy of models built from the following constructs]
R: relations
A: or-sets
?: maybe-tuples
2: 2-clauses
prop: full propositional logic
sets: tuple-sets

One end of the hierarchy: + Expressive, - Complex
Other end of the hierarchy: + Intuitive, - Inexpressive

Next
1. Consider a model M
2. Isolate inexpressiveness
3. Solve problem with lineage

Page 15: Managing Uncertain Data


Running Example: Crime-Solver

• Saw (witness, color, car) // may be uncertain

• Drives (person, color, car) // may be uncertain

• Suspects (person) = πperson(Saw ⋈ Drives)


Page 16: Managing Uncertain Data

Simple Model M

1. Alternatives: uncertainty about value
2. ‘?’ (Maybe) Annotations

Saw (witness, color, car)
Amy      red, Honda ∥ red, Toyota ∥ orange, Mazda

Three possible instances

Page 17: Managing Uncertain Data

Simple Model M

1. Alternatives
2. ‘?’ (Maybe): uncertainty about presence

Saw (witness, color, car)
Amy      red, Honda ∥ red, Toyota ∥ orange, Mazda
Betty    blue, Acura    ?

Six possible instances
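The count of six follows from three alternatives for Amy's tuple times two choices (present or absent) for Betty's maybe-tuple. A minimal sketch that enumerates the instances, using a hypothetical encoding of a Model-M relation as a list of (alternatives, maybe-flag) pairs:

```python
from itertools import product

# Each entry: (list of alternatives, maybe flag). Values follow the slide.
saw = [
    ([("Amy", "red", "Honda"), ("Amy", "red", "Toyota"), ("Amy", "orange", "Mazda")], False),
    ([("Betty", "blue", "Acura")], True),  # '?' means the tuple may be absent
]

def possible_instances(relation):
    """Enumerate every possible instance of a Model-M relation."""
    choices = []
    for alternatives, maybe in relation:
        options = list(alternatives)
        if maybe:
            options.append(None)  # None stands for "tuple absent"
        choices.append(options)
    return [frozenset(t for t in combo if t is not None)
            for combo in product(*choices)]

instances = possible_instances(saw)
print(len(instances))
```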

Page 18: Managing Uncertain Data

Review: Relational Queries

Database D:
Saw (witness, color, car)
Amy, red, Honda
Betty, blue, Acura

Query Q: πwitness(σcolor=red(Saw))

Result:
W (witness)
Amy

Page 19: Managing Uncertain Data

Queries on Uncertain Data

D → (possible instances) → I1, I2, …, In
I1, I2, …, In → (Q on each instance) → J1, J2, …, Jm
J1, J2, …, Jm → (rep. of instances; the up-arrow) → D′
D → (direct implementation) → D′

Closure: the up-arrow always exists

Completeness: All sets of possible instances can be represented

Page 20: Managing Uncertain Data

Model M is Not Closed

Saw (witness, car)
Cathy, Honda ∥ Mazda

Drives (person, car)
Jimmy, Toyota ∥ Jimmy, Mazda
Billy, Honda ∥ Frank, Honda
Hank, Honda

Suspects = πperson(Saw ⋈ Drives)

Suspects
Jimmy
Billy ∥ Frank
Hank

???

Does not (and CANNOT) correctly capture the possible instances in the result
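A brute-force sketch of why this fails. It evaluates the query in every possible world of the base data, then compares against one candidate Model-M encoding of the result. That encoding (all three result tuples marked '?') is an assumption made for illustration, since the slide only shows "???".

```python
from itertools import product

# Base data from the slide: each entry lists the alternatives of one tuple.
saw = [[("Cathy", "Honda"), ("Cathy", "Mazda")]]
drives = [
    [("Jimmy", "Toyota"), ("Jimmy", "Mazda")],
    [("Billy", "Honda"), ("Frank", "Honda")],
    [("Hank", "Honda")],
]

def suspects(saw_inst, drives_inst):
    """pi_person(Saw JOIN Drives) on one ordinary (certain) instance."""
    cars_seen = {car for _, car in saw_inst}
    return frozenset(p for p, car in drives_inst if car in cars_seen)

# True result: evaluate the query in every possible world of the base data.
true_results = {
    suspects(s, d)
    for s in product(*saw)
    for d in product(*drives)
}

# ASSUMED Model-M encoding of the result: three maybe-tuples,
# Jimmy ?, (Billy || Frank) ?, Hank ?  (None = tuple absent).
model_m = [["Jimmy", None], ["Billy", "Frank", None], ["Hank", None]]
model_m_results = {
    frozenset(x for x in combo if x is not None) for combo in product(*model_m)
}

spurious = model_m_results - true_results
print(sorted(map(sorted, true_results)))
print(len(model_m_results), len(spurious))
```

The query has four possible results, but the assumed encoding admits twelve, including instances such as {Jimmy, Hank} that can never occur because Jimmy requires the Mazda sighting and Hank requires the Honda sighting.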

Page 21: Managing Uncertain Data

Lineage to the Rescue

Model M + Lineage = Completeness


Page 23: Managing Uncertain Data

Example with Lineage

ID   Saw (witness, car)
11   Cathy, Honda ∥ Mazda

ID   Drives (person, car)
21   Jimmy, Toyota ∥ Jimmy, Mazda
22   Billy, Honda ∥ Frank, Honda
23   Hank, Honda

Suspects = πperson(Saw ⋈ Drives)

ID   Suspects
31   Jimmy            ?
32   Billy ∥ Frank    ?
33   Hank             ?

λ(31) = (11,2) ∧ (21,2)
λ(32,1) = (11,1) ∧ (22,1); λ(32,2) = (11,1) ∧ (22,2)
λ(33) = (11,1) ∧ 23

Correctly captures possible instances in the result
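One way to read the lineage function is as a presence condition: a derived alternative appears in a world exactly when the base alternatives in its lineage were chosen. The sketch below uses a hypothetical dictionary encoding (not Trio's storage format) and enumerates the result instances this way.

```python
from itertools import product

# Base tuples keyed by ID; each maps alternative index -> value (from the slide).
base = {
    11: {1: ("Cathy", "Honda"), 2: ("Cathy", "Mazda")},
    21: {1: ("Jimmy", "Toyota"), 2: ("Jimmy", "Mazda")},
    22: {1: ("Billy", "Honda"), 2: ("Frank", "Honda")},
    23: {1: ("Hank", "Honda")},
}

# Derived alternatives: (tuple ID, alt) -> (value, lineage as {base ID: alt}).
derived = {
    (31, 1): ("Jimmy", {11: 2, 21: 2}),
    (32, 1): ("Billy", {11: 1, 22: 1}),
    (32, 2): ("Frank", {11: 1, 22: 2}),
    (33, 1): ("Hank", {11: 1, 23: 1}),
}

def result_instances(base, derived):
    """Result instances: a derived value is present iff its lineage holds."""
    ids = sorted(base)
    worlds = set()
    for picks in product(*(base[i] for i in ids)):  # one alternative per base tuple
        world = dict(zip(ids, picks))
        present = frozenset(
            value for value, lineage in derived.values()
            if all(world[i] == alt for i, alt in lineage.items())
        )
        worlds.add(present)
    return worlds

instances = result_instances(base, derived)
print(sorted(map(sorted, instances)))
```

With lineage attached, only the four instances that the query can actually produce are represented.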

Page 24: Managing Uncertain Data

Trio’s Data Model

1. Alternatives
2. ‘?’ (Maybe) Annotations
3. Confidence values (next)
4. Lineage

Uncertainty-Lineage Databases (ULDBs)

Theorem: ULDBs are closed and complete [VLDB 06]

Formally studied properties like minimization, equivalence, approximation and membership. [VLDB 06, VLDB J. 08]

Page 25: Managing Uncertain Data

Confidence Values in Trio

• Confidence values supplied with base data
  – Default probabilistic interpretation

• Problem: Compute confidence values on result data [ICDE 08]

• 5-minute DBClip
  – Search “confidence computation” on YouTube.

Page 26: Managing Uncertain Data

Problem Description

ID   Saw (witness, car)
11   (Amy, Honda) : 0.5
12   (Betty, Acura) : 0.6

ID   Drives (person, car)
21   (Jimmy, Honda) : 0.9
22   (Billy, Honda) : 0.8
23   (Hank, Acura) : 1.0

Cars = πcar(Saw ⋈ Drives)

ID   Cars
41   Honda : ?
42   Acura : ?

Page 27: Managing Uncertain Data

Operator-by-Operator

Saw and Drives are as in the Problem Description.

Step 1: Saw ⋈ Drives
ID   Saw ⋈ Drives            Confidence
31   (Amy, Jimmy, Honda)     0.5 * 0.9 = 0.45
32   (Amy, Billy, Honda)     0.4
33   (Betty, Hank, Acura)    0.6

Step 2: πcar
ID   Cars     Confidence
41   Honda    0.45 + 0.4 - (0.45 * 0.4) = 0.67
42   Acura

Wrong!!

Page 28: Managing Uncertain Data

Operator-by-Operator

0.45 + 0.4 - (0.45 * 0.4) treats join tuples 31 (Amy, Jimmy, Honda) and 32 (Amy, Billy, Honda) as independent.

Not independent! Both are derived from Saw tuple 11 (Amy, Honda).
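The discrepancy can be checked by brute force. The sketch below assumes the base tuples 11, 21, and 22 are independent events with the confidences from the slides. It compares the operator-by-operator value with the exact probability obtained by summing over all worlds in which Honda appears in the result.

```python
from itertools import product

# Base tuples and confidences from the slides (ASSUMED independent events).
conf = {11: 0.5, 21: 0.9, 22: 0.8}

# Operator-by-operator: multiply at the join, then combine as if independent.
c31 = conf[11] * conf[21]
c32 = conf[11] * conf[22]
naive = c31 + c32 - c31 * c32

# Exact: sum the probability of every world in which Honda is in the result,
# i.e. tuple 11 is present and at least one of 21, 22 is present.
ids = sorted(conf)
exact = 0.0
for present in product([True, False], repeat=len(ids)):
    world = dict(zip(ids, present))
    p = 1.0
    for i in ids:
        p *= conf[i] if world[i] else 1.0 - conf[i]
    if world[11] and (world[21] or world[22]):
        exact += p

print(round(naive, 2), round(exact, 2))
```

The naive plan yields 0.67 while the exact value is 0.49; the gap comes entirely from counting tuple 11 twice.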

Page 29: Managing Uncertain Data

Database Query Processing 101

Query Q → execution plans → pick and execute best plan (using statistics, indexes)

Page 30: Managing Uncertain Data

Operator-by-Operator Confidence Computation

Query Q → plans: the plan space can be much smaller or empty

Page 31: Managing Uncertain Data

Decouple Data and Confidence Computation

Query Q → plans:
1. Compute data
2. Use lineage to compute confidences (on demand)

Theorem: Arbitrary improvement. [ICDE 08]

Page 32: Managing Uncertain Data

Our Approach

Saw and Drives are as in the Problem Description.

ID   Cars     Lineage                   Confidence
41   Honda    λ(41) = 11 ∧ (21 ∨ 22)    0.5 * (0.9 + 0.8 - 0.9*0.8) = 0.49
42   Acura    λ(42) = 12 ∧ 23           0.6

Correct!!
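Assuming the base tuples are independent events, the arithmetic on this slide follows step by step from the lineage formulas. Here P(i) denotes the confidence of base tuple i, a notation introduced only for this sketch:

```latex
\begin{aligned}
P(\lambda(41)) &= P(11)\cdot P(21 \lor 22) \\
               &= P(11)\cdot\bigl(P(21) + P(22) - P(21)\,P(22)\bigr) \\
               &= 0.5 \cdot (0.9 + 0.8 - 0.72) = 0.5 \cdot 0.98 = 0.49, \\
P(\lambda(42)) &= P(12)\cdot P(23) = 0.6 \cdot 1.0 = 0.6 .
\end{aligned}
```

The first step is valid because tuple 11 occurs only once in the formula, so it is independent of the disjunction over 21 and 22.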

Page 33: Managing Uncertain Data

Algorithm

[Figure: lineage of a result tuple t in R over tuples t1, t2, t4, t5, t6, t7, with λ(t) = f(t4, t5, t6, t7); confidence labels shown: 0.7, 0.9, 1.0, 0.4, 0.823]

1. Expand lineage to base data

2. Get confidence of base data

3. Evaluate the probability λ(t)

Detecting independence

Memoization

Batch computation
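The three steps can be sketched as follows. This is an illustrative brute-force version, not Trio's implementation: the formula encoding, the names `expand` and `probability`, and the use of the Cars example as input are all assumptions. Only memoization of lineage expansion is shown; independence detection and batch computation are omitted, and step 3 enumerates truth assignments, which is exponential in the number of base tuples.

```python
from functools import lru_cache
from itertools import product

# Hypothetical lineage encoding for this sketch: a formula is a base-tuple ID
# (int), ("and", f, g, ...), ("or", f, g, ...), or ("ref", derived_id).
lineage = {
    31: ("and", 11, 21),
    32: ("and", 11, 22),
    41: ("or", ("ref", 31), ("ref", 32)),
}
confidence = {11: 0.5, 21: 0.9, 22: 0.8}

@lru_cache(maxsize=None)  # memoize expansions of shared intermediate tuples
def expand(tuple_id):
    """Step 1: rewrite a derived tuple's lineage in terms of base tuples only."""
    def walk(f):
        if isinstance(f, int):
            return f
        if f[0] == "ref":
            return expand(f[1])
        return (f[0],) + tuple(walk(g) for g in f[1:])
    return walk(lineage[tuple_id])

def base_ids(f):
    """Collect the base-tuple IDs mentioned in an expanded formula."""
    if isinstance(f, int):
        return {f}
    return set().union(*(base_ids(g) for g in f[1:]))

def holds(f, world):
    """Evaluate an expanded formula under a truth assignment to base tuples."""
    if isinstance(f, int):
        return world[f]
    op = all if f[0] == "and" else any
    return op(holds(g, world) for g in f[1:])

def probability(tuple_id):
    """Steps 2-3: look up base confidences, then sum over satisfying worlds."""
    f = expand(tuple_id)
    ids = sorted(base_ids(f))
    total = 0.0
    for bits in product([True, False], repeat=len(ids)):
        world = dict(zip(ids, bits))
        if holds(f, world):
            p = 1.0
            for i in ids:
                p *= confidence[i] if world[i] else 1.0 - confidence[i]
            total += p
    return total

print(expand(41), round(probability(41), 2))
```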

Page 34: Managing Uncertain Data

Some Other Trio Work

Modifications and Versioning [TR 08]
- Stored derived relations
- Modifications, versions

Indexes and Statistics [MUD 08]
- Specialized indexes, histograms

Functional Dependencies & Schema Design [TR 07]
- Definitions, sound and complete axiomatization of FDs
- Lossless decomposition
- FD testing, finding, and inference

Page 35: Managing Uncertain Data

Related Work (sample)

• Modeling Uncertainty: Plenty, covered in textbooks

• Systems: Avatar, BayesStore, MayBMS, MYSTIQ, ORION, PrDB, ProbView, Trio, others?

Page 36: Managing Uncertain Data

Part 2: Data Integration

• Reboot! (or, wake up!)

Page 37: Managing Uncertain Data

Traditional Data Integration: Setup

Sources D1, D2, D3, D4, D5, with schemas such as:
Bib(title, authors, conf, year)
Author(aid, name), Paper(pid, title, year), AuthoredBy(aid, pid)

1. Mediated Schema
Publication(title, author, conf, year)

2. Schema Mappings
Mapping:
SELECT P.title AS title, A.name AS author, NULL AS conf, P.year AS year
FROM Author AS A, Paper AS P, AuthoredBy AS B
WHERE A.aid = B.aid AND P.pid = B.pid

3. Query Answering
Who authored the most SIGMOD papers in the 90’s? Mike Carey

Significant up-front effort
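To see what the schema mapping produces, the sketch below runs it in an in-memory SQLite database using Python's standard `sqlite3` module. The sample rows are invented for illustration, and an ORDER BY clause is added only to make the output deterministic.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Author(aid INTEGER, name TEXT);
    CREATE TABLE Paper(pid INTEGER, title TEXT, year INTEGER);
    CREATE TABLE AuthoredBy(aid INTEGER, pid INTEGER);
    -- Hypothetical sample rows, invented for this sketch.
    INSERT INTO Author VALUES (1, 'A. Author'), (2, 'B. Author');
    INSERT INTO Paper VALUES (10, 'Paper X', 1995), (11, 'Paper Y', 1998);
    INSERT INTO AuthoredBy VALUES (1, 10), (2, 10), (1, 11);
""")

# The mapping from the slide: source tables -> Publication(title, author, conf, year).
mapping = """
    SELECT P.title AS title, A.name AS author, NULL AS conf, P.year AS year
    FROM Author AS A, Paper AS P, AuthoredBy AS B
    WHERE A.aid = B.aid AND P.pid = B.pid
    ORDER BY title, author
"""
rows = con.execute(mapping).fetchall()
for row in rows:
    print(row)
```

Each source row pair becomes one Publication tuple, with `conf` left NULL because this source has no conference attribute.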

Page 38: Managing Uncertain Data

“Pay-As-You-Go” Data Integration

1. Automated best-effort integration from the outset
2. Further improve the system over time with feedback

How advanced a starting point can we provide?

Page 39: Managing Uncertain Data

Uncertainty to the Rescue

• Automatic integration
  – Make guesses
  – Model probabilities

• Specifically
  – Probabilistic schema mappings
  – Probabilistic mediated-schema

>90% accuracy in automatically integrating 50-800 data sources for several domains [SIGMOD 08]

Page 40: Managing Uncertain Data

Next

1. Probabilistic mediated schemas
2. Probabilistic schema mappings
3. Experimental results

Page 41: Managing Uncertain Data

Mediated Schema

S1(name, email, phone-num, address) S2(person-name,phone,mailing-addr)

Med-S (name, email, phone, addr)

{name, person-name}

{phone-num, phone}

{address,mailing-addr}

{email}

A mediated schema is a clustering of a subset of the set of all attributes appearing in source schemas.
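Read literally, the definition asks for pairwise-disjoint, non-empty clusters whose attributes all come from the source schemas. A minimal sketch of that check, with a hypothetical function name and a set-based encoding chosen for illustration:

```python
def is_mediated_schema(clusters, source_schemas):
    """Check that `clusters` is a clustering of a subset of the source attributes."""
    all_attrs = set().union(*source_schemas)
    seen = set()
    for cluster in clusters:
        # Reject empty clusters, unknown attributes, and overlapping clusters.
        if not cluster or not cluster <= all_attrs or cluster & seen:
            return False
        seen |= cluster
    return True

# Source schemas and Med-S from the slide.
s1 = {"name", "email", "phone-num", "address"}
s2 = {"person-name", "phone", "mailing-addr"}
med_s = [
    frozenset({"name", "person-name"}),
    frozenset({"phone-num", "phone"}),
    frozenset({"address", "mailing-addr"}),
    frozenset({"email"}),
]
print(is_mediated_schema(med_s, [s1, s2]))
```

Because the definition says "a subset", a schema that leaves some source attributes out of every cluster still passes this check.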


Page 42: Managing Uncertain Data

Med1 ({name}, {phone, hPhone, oPhone}, {address, hAddr, oAddr})

Example

S1(name, hPhone, oPhone, hAddr, oAddr) S2(name,phone,address)

?

Q: SELECT name, hPhone, oPhone FROM Med

Page 43: Managing Uncertain Data

S1(name, hPhone, oPhone, hAddr, oAddr) S2(name,phone,address)

Med1 ({name}, {phone, hPhone, oPhone}, {address, hAddr, oAddr})

Med2 ({name}, {phone, hPhone}, {oPhone}, {address, oAddr}, {hAddr})

Q: SELECT name, phone, address FROM Med

Example

Page 44: Managing Uncertain Data

Med3 ({name}, {phone, hPhone}, {oPhone}, {address, hAddr}, {oAddr})

S1(name, hPhone, oPhone, hAddr, oAddr) S2(name,phone,address)

Med1 ({name}, {phone, hPhone, oPhone}, {address, hAddr, oAddr})

Med2 ({name}, {phone, hPhone}, {oPhone}, {address, oAddr}, {hAddr})

Q: SELECT name, phone, address FROM Med

Example

Page 45: Managing Uncertain Data

Med4 ({name}, {phone, oPhone}, {hPhone}, {address, oAddr}, {hAddr})

Med3 ({name}, {phone, hPhone}, {oPhone}, {address, hAddr}, {oAddr})

S1(name, hPhone, oPhone, hAddr, oAddr) S2(name,phone,address)

Med1 ({name}, {phone, hPhone, oPhone}, {address, hAddr, oAddr})

Med2 ({name}, {phone, hPhone}, {oPhone}, {address, oAddr}, {hAddr})

Q: SELECT name, phone, address FROM Med

Example

Page 46: Managing Uncertain Data

Med5 ({name}, {phone}, {hPhone}, {oPhone}, {address}, {hAddr}, {oAddr})

Med3 ({name}, {phone, hPhone}, {oPhone}, {address, hAddr}, {oAddr})

S1(name, hPhone, oPhone, hAddr, oAddr) S2(name,phone,address)

Med1 ({name}, {phone, hPhone, oPhone}, {address, hAddr, oAddr})

Med2 ({name}, {phone, hPhone}, {oPhone}, {address, oAddr}, {hAddr})

Med4 ({name}, {phone, oPhone}, {hPhone}, {address, oAddr}, {hAddr})

Q: SELECT name, phone, address FROM Med

Example


Page 48: Managing Uncertain Data

Probabilistic Mediated Schema

S1(name, hPhone, oPhone, hAddr, oAddr) S2(name,phone,address)

Med3 ({name}, {phone, hPhone}, {oPhone}, {address, hAddr}, {oAddr})    Pr=0.5

Med4 ({name}, {phone, oPhone}, {hPhone}, {address, oAddr}, {hAddr})    Pr=0.5

• Probabilistic Mediated Schema (p-med-schema) is a set M = {(M1,Pr(M1)), …, (Mk,Pr(Mk))} where
  • Mi is a med-schema; i≠j => Mi≠Mj
  • Pr(Mi) ∈ (0,1]; ΣPr(Mi) = 1
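The definition translates directly into a validity check. The sketch below uses a hypothetical encoding (a list of (clusters, probability) pairs) and a small tolerance for the floating-point sum; both are assumptions for illustration.

```python
import math

def is_p_med_schema(p_med, tol=1e-9):
    """Validate the conditions in the definition of a p-med-schema."""
    schemas = [frozenset(m) for m, _ in p_med]
    distinct = len(set(schemas)) == len(schemas)        # i != j => Mi != Mj
    in_range = all(0.0 < pr <= 1.0 for _, pr in p_med)   # Pr(Mi) in (0, 1]
    sums_to_one = math.isclose(sum(pr for _, pr in p_med), 1.0, abs_tol=tol)
    return distinct and in_range and sums_to_one

# Med3 and Med4 from the slide, each a list of attribute clusters.
med3 = [frozenset({"name"}), frozenset({"phone", "hPhone"}), frozenset({"oPhone"}),
        frozenset({"address", "hAddr"}), frozenset({"oAddr"})]
med4 = [frozenset({"name"}), frozenset({"phone", "oPhone"}), frozenset({"hPhone"}),
        frozenset({"address", "oAddr"}), frozenset({"hAddr"})]
p_med = [(med3, 0.5), (med4, 0.5)]
print(is_p_med_schema(p_med))
```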

Page 49: Managing Uncertain Data

P-Mappings

[Figure: two p-mappings from S1(name, hP, oP, hA, oA).
PM1 maps S1 to Med3 (name, hPP, oP, hAA, oA) through four candidate mappings with Pr = .64, .16, .16, .04.
PM2 maps S1 to Med4 (name, oPP, hP, oAA, hA) through four candidate mappings with Pr = .64, .16, .16, .04.]

Page 50: Managing Uncertain Data

Expressive Power of P-Med-Schema & P-Mapping

Theorem 1. For one-to-many mappings: (p-med-schema + p-mappings) = (mediated schema + p-mapping) > (p-med-schema + mappings)

Theorem 2. When restricted to one-to-one mappings: (p-med-schema + p-mappings) = (p-med-schema + mappings) > (mediated schema + p-mapping)


Page 51: Managing Uncertain Data

Next

• Creating p-med-schemas (briefly)
• Creating p-mappings (briefly)
• Experimental Results

Page 52: Managing Uncertain Data

P-med-schema Creation

1. Certain/uncertain edges

[Figure: weighted graph over the attributes name, address, email-address, pname, home-address of sources S1 and S2; edge weights 1, .6, .6, .2]

Page 53: Managing Uncertain Data

P-med-schema Creation

2. Clustering

[Figure: four candidate clusterings of the attributes name, address, email-address, pname, home-address of sources S1 and S2]

Page 54: Managing Uncertain Data

P-med-schema Creation

3. Assign probabilities

[Figure: the four candidate clusterings of the attributes name, address, email-address, pname, home-address of sources S1 and S2, with Pr=1/6, Pr=1/6, Pr=1/3, Pr=1/3]
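Steps 1 and 2 can be sketched as follows, under explicit assumptions. Only the edge weights (1, .6, .6, .2) come from the slides; the edge endpoints, the threshold band (`low`, `high`), and the rule "take connected components for every subset of uncertain edges" are illustrative guesses about the procedure. Step 3 (probability assignment) is not reproduced here because the slides do not state the rule.

```python
from itertools import combinations

def components(nodes, edges):
    """Connected components via a small union-find."""
    parent = {n: n for n in nodes}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for a, b in edges:
        parent[find(a)] = find(b)
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return frozenset(frozenset(g) for g in groups.values())

def candidate_clusterings(nodes, weighted_edges, low, high):
    """One clustering per subset of uncertain edges added to the certain ones."""
    certain = [(a, b) for a, b, w in weighted_edges if w >= high]
    uncertain = [(a, b) for a, b, w in weighted_edges if low <= w < high]
    result = set()
    for k in range(len(uncertain) + 1):
        for kept in combinations(uncertain, k):
            result.add(components(nodes, certain + list(kept)))
    return result

nodes = ["name", "pname", "address", "home-address", "email-address"]
# Edge endpoints and thresholds are ASSUMPTIONS; only the weights are from the slide.
edges = [("name", "pname", 1.0), ("address", "home-address", 0.6),
         ("address", "email-address", 0.6), ("name", "email-address", 0.2)]
clusterings = candidate_clusterings(nodes, edges, low=0.5, high=0.7)
print(len(clusterings))
```

With two uncertain edges this produces four candidate clusterings, which matches the number of candidates shown in the figures.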

Page 55: Managing Uncertain Data

P-mapping Creation

S=(num, pname, home-addr, office-addr)

T=(name, mailing-addr)

Correspondence weights shown in the figure: 0.8, 0.9, 0.9, 0.2

Goal: find a p-mapping that is consistent with a set of weighted correspondences

Theorem: A p-mapping consistent with a set of weighted correspondences exists if and only if, for every source/target attribute a, the sum of the weights of all correspondences that involve a is at most 1.
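The condition in the theorem is easy to test mechanically. In the sketch below the function name, the tolerance, and both sets of correspondences are hypothetical; the slide does not say which attribute pairs carry which weights.

```python
from collections import defaultdict

def admits_consistent_p_mapping(correspondences, tol=1e-9):
    """Check the theorem's condition: per-attribute weight sums are at most 1."""
    totals = defaultdict(float)
    for source_attr, target_attr, weight in correspondences:
        totals[("S", source_attr)] += weight  # tag by side to keep names apart
        totals[("T", target_attr)] += weight
    return all(total <= 1.0 + tol for total in totals.values())

# Hypothetical correspondences between S and T, invented for this sketch.
ok = [("pname", "name", 0.8), ("home-addr", "mailing-addr", 0.6),
      ("office-addr", "mailing-addr", 0.4)]
too_heavy = [("pname", "name", 0.8), ("home-addr", "mailing-addr", 0.9),
             ("office-addr", "mailing-addr", 0.9)]
print(admits_consistent_p_mapping(ok), admits_consistent_p_mapping(too_heavy))
```

In the second set, `mailing-addr` accumulates weight 1.8, so by the theorem no consistent p-mapping exists unless the weights are first rescaled.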

Page 56: Managing Uncertain Data

Experiments

Data: tables extracted from HTML tables on the web

Domain   #Sources   Search Keywords
Movie    161        movie, year
Car      817        make, model
People   49         job/title, organization/company/employer
Course   647        course/class, instructor/teacher/lecturer, subject/department/title
Bib      649        author, title, year, journal/conference

Page 57: Managing Uncertain Data

Experiments

• Gold standard: manual
• Approximate standard: semi-automatic
• Precision, recall, F-measure for several SQL queries varying attributes, selectivities

Page 58: Managing Uncertain Data

Quality of Query Answering

Domain   Precision   Recall   F-measure

Golden Standard
People   1           .849     .918
Course   1           .852     .92

Approximate Golden Standard
Movie    .95         1        .924
Car      1           .917     .957
People   .958        .984     .971
Course   1           1        1
Bib      1           .955     .977
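The slides do not define the F-measure; assuming it is the usual harmonic mean of precision and recall, a small sketch reproduces individual rows such as People (golden standard) and Car. Reported values may have been averaged per query, so not every row need match this formula applied to the averaged precision and recall.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

people = f_measure(1.0, 0.849)  # People row, golden standard
car = f_measure(1.0, 0.917)     # Car row, approximate golden standard
print(round(people, 3), round(car, 3))
```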

Page 59: Managing Uncertain Data

Comparison with Other Approaches

Keyword search obtained low precision and low recall.

Querying the sources directly or considering only the highest probability mapping obtained low recall.

We obtained highest F-measure in all domains.


Page 60: Managing Uncertain Data

Comparison with Other Mediated-Schema Generation Methods

Using p-med-schema obtained highest F-measure in all domains.


Page 61: Managing Uncertain Data

System Setup Time (one domain)


Page 62: Managing Uncertain Data

Brief Related Work

• Approximate schema mappings [Magnani et al. 2007], [Gal 2007], [Dong et al. 2007]

• Automatic generation of mediated schemas [He et al. 2003]

• More (see paper)

Page 63: Managing Uncertain Data

Finally…

• Other Research
  – Data Integration (2)
  – Deduplication (2)
  – Quality Estimation of Sensor/RFID Streams [IQIS 06]

• Future Plans

Page 64: Managing Uncertain Data

Data Integration

Problem: Foundations for integration of uncertain data
Solution [TR 08]:
- Define open- and closed-containment for uncertain data
- Algorithms, complexity of consistency checking and finding maximally-correct query answers

Problem: Dependencies in web-data integration (e.g., deep-web, plagiarism)
Solution [TR 08]: Algorithms, complexity of fundamental problems: coverage estimation, cost minimization and coverage maximization, and source ordering

Page 65: Managing Uncertain Data

Deduplication

[SIGMOD 07]
- Leveraging real-world constraints for deduplication
- Tractable optimal solution and experiments over DBLP and ACM publication data

[WWW 07]
- Detecting near-duplicate web-pages for crawling
- Efficient indexing scheme supporting crawling speeds over web-scale data

Page 66: Managing Uncertain Data

Future Work

Short & Medium-Term
1. View management over uncertain databases: materialized view updates, versioning, partial materialization, …
2. More applications of uncertain data
3. More on lineage: internal/external lineage, approximate lineage, uncertain lineage, …

Page 67: Managing Uncertain Data

Future Work

Long-term
1. Applying uncertainty to other data management problems: query optimization? cloud computing?
2. Improve quality of data through conflict resolution and feedback
3. Web-data management: Handling huge amounts of data that is conflicting, uncertain, redundant, dependent, …

Page 68: Managing Uncertain Data

Thanks!

Anish Das Sarma

http://i.stanford.edu/~anishds (or search “Anish Das Sarma”)