

Page 1: On the Origin of Data Daniel Deutch Blavatnik School of Computer Science, Raymond and Beverly Sackler Faculty of Exact Sciences

On the Origin of Data

Daniel Deutch

Blavatnik School of Computer Science, Raymond and Beverly Sackler Faculty of Exact Sciences


Data Evolvement

• This is the era of data.
  – Databases, text, blogs, social data, …
  – Huge volumes

• Evolving Through Automatic Tools

• Sent Between Applications and Users


Provenance


Data Provenance

• Understanding how and why data has evolved is of fundamental importance
  – For authentication
    • Both origin and propagators of data should be trustworthy
  – For access control
    • Confidentiality constraints interplay with the transformation
  – For hypothetical reasoning
    • What if we change a piece of data?
    • How can we optimally affect data evolvement?


Example

• Alice posted photos with David

• David is worried about Eve seeing his photos

[Figure: provenance expression whose operands were images on the original slide: _ OR ( ( _ OR _ ) AND NOT _ )]


Tracking Provenance

• The logic is already implemented (e.g. to decide what photos to show)

• We develop tools to “instrument” applications with provenance tracking.

• Simply maintaining an “activity log” is not good enough.

– We also want the possible “reasons” for activities

– E.g. “not blacklisted” is not an activity

• Instead, we create formulas in generic algebraic constructions based on semirings

• We also develop tools that use the provenance information for analysis.
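A minimal sketch of the difference between an activity log and a recorded "reason" might look as follows. Everything here is hypothetical and not taken from the talk: the rule `can_see_photo`, its parameter names, and the tiny expression classes. The point it illustrates is that the returned formula keeps "NOT blacklisted" as part of the reason, even though no activity ever corresponds to it.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Not:
    arg: "Expr"


@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Or:
    left: "Expr"
    right: "Expr"


Expr = Union[Var, Not, And, Or]


def show(e: Expr) -> str:
    """Render a provenance formula as text."""
    if isinstance(e, Var):
        return e.name
    if isinstance(e, Not):
        return f"NOT {show(e.arg)}"
    op = "AND" if isinstance(e, And) else "OR"
    return f"({show(e.left)} {op} {show(e.right)})"


def can_see_photo(friend_of_owner: bool, friend_of_tagged: bool, blacklisted: bool):
    """Hypothetical visibility rule, instrumented to return its reason as a formula."""
    decision = (friend_of_owner or friend_of_tagged) and not blacklisted
    reason = And(
        Or(Var("friend_of_owner"), Var("friend_of_tagged")),
        Not(Var("blacklisted")),
    )
    return decision, reason


decision, reason = can_see_photo(False, True, False)
formula = show(reason)
print(decision, formula)
```

The formula is kept symbolic on purpose, so that it can later be evaluated under different interpretations of its variables.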


Generic Expression

[Generic expression; operands were images on the original slide:] _ OR ( ( _ OR _ ) AND NOT _ )

Trust: False OR ( ( True OR True ) AND NOT False ) = True

Number of paths (if Alice and Eve are not friends): 0 + ( ( 1 + 1 ) × 1 ) = 2

Latency: min ( ( 0:05 min 0:08 ) + 0:00 ) = 0:05
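The three evaluations on this slide can be read as one expression interpreted in three different semirings. The sketch below is an illustration under assumptions, not the talk's implementation: the `Semiring` class and `evaluate` function are hypothetical, the negated operand is passed in already valued (True, 1 and 0:00 respectively), and the first operand of the latency case, which is not legible in the transcript, is assumed to be infinity (the neutral element of min). Latencies 0:05 and 0:08 are written as the plain numbers 5 and 8.

```python
import math
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    plus: Callable[[Any, Any], Any]   # interprets OR
    times: Callable[[Any, Any], Any]  # interprets AND


def evaluate(s: Semiring, a, b, c, not_d):
    """Evaluate a OR ((b OR c) AND NOT d), with NOT d already valued as not_d."""
    return s.plus(a, s.times(s.plus(b, c), not_d))


boolean = Semiring(plus=lambda x, y: x or y, times=lambda x, y: x and y)
counting = Semiring(plus=lambda x, y: x + y, times=lambda x, y: x * y)
tropical = Semiring(plus=min, times=lambda x, y: x + y)

trust = evaluate(boolean, False, True, True, True)
paths = evaluate(counting, 0, 1, 1, 1)
latency = evaluate(tropical, math.inf, 5, 8, 0)  # math.inf is an assumption

print(trust, paths, latency)
```

Only the pair of operations changes between the three calls; the shape of the expression stays fixed.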


Provenance for SQL Queries

• Amsterdamer, D., Tannen, Provenance for Aggregate Queries [PODS ’11]

• Amsterdamer, D., Tannen, On the Limitations of Provenance for Queries with Difference [TaPP ’11]

• D., Milo, Roy, Tannen, Circuits for Datalog Provenance [ICDT ’14]

• Amsterdamer, D., Green, Karvounarakis, Tannen, Semiring-based Provenance for SQL Queries (In preparation)

• D., Moskovitch, Provenance for Relational Updates (In preparation)

Emps

Dep.    Emp     Prov.
Eng.    Alice   S
Eng.    Bob     T
Sales   Carol   S

GoodEmps

Emp     Prov.
Alice   C
Bob     S
Carol   T

πDep(Emps ⨝ GoodEmps)

Dep.    Prov.
Eng.    S·C + T·S = S + T = S
Sales   S·T = T
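The annotated join and projection on this slide can be reproduced with a short sketch. The slide does not say what S, T and C stand for; the code assumes they are ordered clearance levels (C below S below T), with · taking the more restrictive level and + the less restrictive one, which is consistent with the arithmetic shown. The function and variable names are hypothetical.

```python
# Assumed ordering of annotations: C < S < T (not stated on the slide).
LEVEL = {"C": 0, "S": 1, "T": 2}


def times(x: str, y: str) -> str:
    """Joint use of two tuples: the more restrictive annotation wins."""
    return max(x, y, key=LEVEL.__getitem__)


def plus(x: str, y: str) -> str:
    """Alternative derivations: the less restrictive annotation wins."""
    return min(x, y, key=LEVEL.__getitem__)


# Annotated relations from the slide.
emps = [("Eng.", "Alice", "S"), ("Eng.", "Bob", "T"), ("Sales", "Carol", "S")]
good_emps = {"Alice": "C", "Bob": "S", "Carol": "T"}


def project_dep_of_join(emps, good_emps):
    """Compute the projection on Dep of Emps joined with GoodEmps, with provenance."""
    result = {}
    for dep, emp, prov in emps:
        if emp in good_emps:
            joined = times(prov, good_emps[emp])  # join multiplies annotations
            # projection adds the annotations of tuples that collapse together
            result[dep] = plus(result[dep], joined) if dep in result else joined
    return result


dep_prov = project_dep_of_join(emps, good_emps)
print(dep_prov)
```

Swapping `times` and `plus` for another semiring's operations would give a different reading of the same query result without touching the query logic.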


Provenance for Social and Web Data

• Bienvenu, D., Suchanek, Provenance for Web 2.0 Data [Secure Data Management ’12]

• Abiteboul, Bienvenu, D., Deduction in the Presence of Distribution and Contradictions [WebDB ’12]

• Abiteboul, D., Vianu, Deduction with Contradictions in Datalog [ICDT ’14]

• Amarilli, D., Senellart, Provenance for Order-Aware Transformations (In preparation)


PROPOLIS: Provenance for Process Analysis

• D., Moskovitch, Tannen, PROPOLIS: Provisioned Analysis of Data-Centric Processes [VLDB ’13]

• D., Moskovitch, Tannen, A Provenance Framework for Data-Dependent Process Analysis (Submitted)

• D., Moskovitch, Provenance for Distributed Processes (In preparation)


Thank you!