On the Origin of Data
Daniel Deutch
Blavatnik School of Computer Science, Raymond and Beverly Sackler Faculty of Exact Sciences
Data Evolvement
• This is the era of Data.
– Databases, text, blogs, social data,…
– Huge volumes
• Evolving Through Automatic Tools
• Sent Between Applications and Users
Provenance
Data Provenance
• Understanding how and why data has evolved is of fundamental importance
– For authentication
  • Both origin and propagators of data should be trustworthy
– For access control
  • Confidentiality constraints interplay with the transformation
– For hypothetical reasoning
  • What if we change a piece of data?
  • How can we optimally affect data evolvement?
Example
• Alice posted photos with David
• David is worried about Eve seeing his photos
[Figure: provenance expression of the form □ OR ( ( □ OR □ ) AND NOT □ )]
Tracking Provenance
• The logic is already implemented (e.g. to decide what photos to show)
• We develop tools to “instrument” applications with provenance tracking.
• Simply maintaining an “activity log” is not good enough.
– We also want the possible “reasons” for activities
– E.g. “not blacklisted” is not an activity
• Instead we create formulas in generic algebraic constructions based on semirings
• We also develop tools that use the provenance information for analysis.
Generic Expression
[Figure: provenance expression of the form □ OR ( ( □ OR □ ) AND NOT □ )]
Trust:
False OR ( ( True OR True ) AND NOT False ) = True
Number of paths (if Alice and Eve are not friends):
0 + ( ( 1 + 1 ) × 1 ) = 2
Latency:
∞ min ( ( 0:05 min 0:08 ) + 0:00 ) = 0:05
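The three evaluations above can be reproduced by writing the expression once and interpreting it in different semirings. The following is a minimal sketch under stated assumptions, not the tooling described in the talk: the `Semiring` tuple, the operand names `d`, `a1`, `a2`, `b`, and the reading of NOT (the semiring "one" if its argument is the semiring "zero", otherwise "zero") are illustrative choices. The latencies 0:05 and 0:08 are represented as the plain numbers 5 and 8, and the leading operand of the latency line is taken to be ∞, the min-plus counterpart of False and 0.

```python
from collections import namedtuple
import math

# Hypothetical helper: a semiring given as (zero, one, plus, times).
Semiring = namedtuple("Semiring", "zero one plus times")

# Trust: OR / AND over truth values.
BOOLEAN = Semiring(False, True, lambda a, b: a or b, lambda a, b: a and b)
# Number of paths: alternatives add up, joint use multiplies.
COUNTING = Semiring(0, 1, lambda a, b: a + b, lambda a, b: a * b)
# Latency (min-plus): alternatives take the faster one, joint use adds delays.
LATENCY = Semiring(math.inf, 0, min, lambda a, b: a + b)


def negate(sr, x):
    """Assumed reading of NOT: 'one' if x is the semiring zero, else 'zero'."""
    return sr.one if x == sr.zero else sr.zero


def evaluate(sr, d, a1, a2, b):
    """Evaluate  d OR ((a1 OR a2) AND NOT b)  in the semiring sr."""
    return sr.plus(d, sr.times(sr.plus(a1, a2), negate(sr, b)))


trust = evaluate(BOOLEAN, False, True, True, False)
paths = evaluate(COUNTING, 0, 1, 1, 0)
latency = evaluate(LATENCY, math.inf, 5, 8, math.inf)
print(trust, paths, latency)  # True 2 5
```

Only the semiring passed to `evaluate` changes between the three calls; the expression itself is shared, which is the sense in which it is generic.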
Provenance for SQL Queries
• Amsterdamer, D., Tannen, Provenance for Aggregate Queries [PODS ‘11]
• Amsterdamer, D., Tannen, On the Limitations of Provenance for Queries with Difference [TaPP ‘11]
• D., Milo, Roy, Tannen, Circuits for Datalog Provenance [ICDT ‘14]
• Amsterdamer, D., Green, Karvounarakis, Tannen, Semiring-based Provenance for SQL Queries (In preparation)
• D., Moskovitch, Provenance for Relational Updates (In preparation)
Emps
Dep.    Emp     Prov.
Eng.    Alice   S
Eng.    Bob     T
Sales   Carol   S

GoodEmps
Emp     Prov.
Alice   C
Bob     S
Carol   T

πDep(Emps ⨝ GoodEmps)
Dep.    Prov.
Eng.    S·C + T·S = S + T = S
Sales   S·T = T
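The annotations of the query result (S for Eng., T for Sales) can be recomputed mechanically. The sketch below assumes, since the slide does not state it, that C, S and T are ordered clearance levels with C < S < T, that joint use of two tuples (·) requires the stricter level, and that alternative derivations (+) need only the more permissive one. Under that reading the simplifications S·C + T·S = S + T = S and S·T = T come out as on the slide. All function and variable names are illustrative.

```python
# Assumed ordering of clearance levels: C < S < T.
LEVEL = {"C": 0, "S": 1, "T": 2}


def times(a, b):
    """Joint use of two tuples: the stricter (higher) level is required."""
    return max(a, b, key=LEVEL.get)


def plus(a, b):
    """Alternative derivations: the more permissive (lower) level suffices."""
    return min(a, b, key=LEVEL.get)


emps = [("Eng.", "Alice", "S"), ("Eng.", "Bob", "T"), ("Sales", "Carol", "S")]
good_emps = [("Alice", "C"), ("Bob", "S"), ("Carol", "T")]


def project_dep_of_join(emps, good_emps):
    """Annotated evaluation of the projection on Dep of Emps joined with GoodEmps."""
    good = dict(good_emps)  # Emp -> annotation (employee names assumed unique)
    result = {}
    for dep, emp, prov in emps:
        if emp not in good:
            continue
        joined = times(prov, good[emp])  # join multiplies annotations
        # projection adds the annotations of tuples that collapse together
        result[dep] = plus(result[dep], joined) if dep in result else joined
    return result


result = project_dep_of_join(emps, good_emps)
print(result)  # {'Eng.': 'S', 'Sales': 'T'}
```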
Provenance for Social and Web Data
• Bienvenu, D., Suchanek, Provenance for Web 2.0 Data [Secure Data Management ‘12]
• Abiteboul, Bienvenu, D., Deduction in the Presence of Distribution and Contradictions [WebDB ‘12]
• Abiteboul, D., Vianu, Deduction with Contradictions in Datalog [ICDT ‘14]
• Amarilli, D., Senellart, Provenance for Order-Aware Transformations (In preparation)
PROPOLIS: Provenance for Process Analysis
• D., Moskovitch, Tannen, PROPOLIS: Provisioned Analysis of Data-Centric Processes [VLDB ’13]
• D., Moskovitch, Tannen, A Provenance Framework for Data-Dependent Process Analysis (Submitted)
• D., Moskovitch, Provenance for Distributed Processes (In preparation)
Thank you!