a sightseeing tour of prov and some of its extensions
TRANSCRIPT
1
A Sightseeing Tour of PROV and Some of its
Extensions Khalid Belhajjame
LAMSADE, Université Paris-Dauphine
16/03/16 MADICS: ReProVirtuFlow
MADICS: ReProVirtuFlow 2
Why do we care about provenance …
Help explain results and outliers Assess trust and quality Promote systems transparency: users are
able to determine whether a particular use of information is appropriate under a set of rules.
Assist in debugging Promote reuse and reproducibility
16/03/16
MADICS: ReProVirtuFlow 3
A bit of HistoryProvenance is not a new topic. There has been a lot of provenance work in:
Databases, Workflows, Information retrieval, …. By 2009, there have been a number of
models/vocabularies for expressing provenance information Open Provenance Model (OPM), Proof Markup Language (PML), Provenance Vocabulary, PREservation Metadata : Implementation
Strategies (PREMIS), Semantic Web Applications in Neuromedicine
(SWAN) Ontology, Dublin Core, ….
16/03/16
MADICS: ReProVirtuFlow 4
A bit of History 2009-2010: W3C Provenance Provenance Incubator
Group Objective: provides a state of the art and possible
recommendations for standardization efforts
2011: W3C Provenance Working Group Objective: To define a standard vocabulary primarily
for the semantic Web
2013: The W3C Provenance Working Group published a number of PROV recommendations and notes: PROV-DM, PROV-O, …
Since then a number of models and vocabularies have extended and/or defined mapping rules to PROV16/03/16
MADICS: ReProVirtuFlow 5
Family of PROV documents
16/03/16
MADICS: ReProVirtuFlow 6
Family of PROV documents
16/03/16
MADICS: ReProVirtuFlow 7
Provenance The W3C Provenance Working Group defined provenance as:
Provenance is defined as a record that describes the people, institutions, entities, and activities involved in
producing, influencing, or delivering a piece of data or a thing.
16/03/16
MADICS: ReProVirtuFlow 8
PROV…is not a recommendation for representing and collecting provenance information that should be adopted internally by all systems.
That is not realistic, and won’t happen any time soon
Instead, the aim to facilitate and promote interoperability between domains and applications that adopt their specific representations of provenance.
More pragmatic, and thus likely to happen.
16/03/16
MADICS: ReProVirtuFlow 9
Example
16/03/16
MADICS: ReProVirtuFlow 10
PROV Core Structures
16/03/16
MADICS: ReProVirtuFlow 11
Entity An entity is a physical, digital, conceptual, or
other kind of thing with some fixed aspects; entities may be real or imaginary.
Example: An entity may be the document at IRI http://www.bbc.co.uk/news/science-environment-17526723, a file in a file system, a car, or an idea.
16/03/16
MADICS: ReProVirtuFlow 12
Activity An activity is something that occurs over a
period of time and acts upon or with entities; it may include consuming, processing, transforming, modifying, relocating, using, or generating entities.
Example: An activity may be the publishing of a document on the Web, sending a twitter message, extracting metadata embedded in a file, driving a car from Paris to Lyon, etc.
16/03/16
MADICS: ReProVirtuFlow 13
Agent An agent is something that bears some form
of responsibility for an activity taking place, for the existence of an entity, or for another agent's activity.
Example: A site selling books on the Web and the companies hosting them can be seen as agents.
16/03/16
MADICS: ReProVirtuFlow 14
Usage and Generation
Usage is the beginning of utilizing an entity by an activity. Before usage, the activity had not begun to utilize this entity and could not have been affected by the entity. Example: A program beginning to read an
input file
Generation is the completion of production of a new entity by an activity. This entity did not exist before generation and becomes available for usage after this generation. Example: the completed creation of a file by a
program16/03/16
MADICS: ReProVirtuFlow 15
Derivation Derivation is a transformation of an entity
into another, an update of an entity resulting in a new one, or the construction of a new entity based on a pre-existing entity.
Example: The transformation of a relational table into a linked data set
16/03/16
MADICS: ReProVirtuFlow 16
Association and Attribution
An activity association is an assignment of responsibility to an agent for an activity, indicating that the agent had a role in the activity Example: the workflow system is responsible
for the enactment of a workflow execution
Attribution is the ascribing of an entity to an agent. Example: A blog post can be attributed to an
author, a mobile phone to its manufacturer.
16/03/16
MADICS: ReProVirtuFlow 17
PROV Core Structures
16/03/16
MADICS: ReProVirtuFlow 18
W3C PROV Implementations: Preliminary Analysis
16/03/16
Source: https://khalidbelhajjame.wordpress.com/2013/04/04/w3c-prov-implementations/
MADICS: ReProVirtuFlow 19
PROV Compliant Vocabularies
This is by no mean complete ….
PROV
ProvONE
wfprov wfdescc
DC
PAV
extends
extendsc
extends
mapsTo
mapsTo
16/03/16
MADICS: ReProVirtuFlow 20
Prospective provenance
Retrospective provenance
ProvONE: A PROV Extension Data Model for Scientific Workflow
Provenance
16/03/16
MADICS: ReProVirtuFlow 21
PAV ontology: provenance, authoring and versioning
16/03/16
MADICS: ReProVirtuFlow 22
PAV ontology: provenance, authoring and versioning
16/03/16
MADICS: ReProVirtuFlow 23
Acknowledgements W3C Provenance Working Group DataONE Workflow and Provenance Interest
Group PAV’s friends: Paolo Ciccarese, Stian Soiland-
Reyes, Alasdair JG Gray, Carole Goble and Tim Clark
16/03/16
MADICS: ReProVirtuFlow 24
A Sightseeing Tour of PROV and Some of its
Extensions Khalid Belhajjame
LAMSADE, Université Paris-Dauphine
16/03/16