works 11 presentation

17
Date: 14/11/2011 A new Approach for Publishing Workflows: Abstractions, Standards and Linked Data Daniel Garijo Ontology Engineering Group, Departamento de Inteligencia Artificial. Universidad Politécnica de Madrid Yolanda Gil Information Sciences and Institute University of Southern California, Marina del Rey

Upload: dgarijo

Post on 27-Jun-2015

744 views

Category:

Technology


0 download

DESCRIPTION

Presentation for the paper: A new Approach for Publishing Workflows: Abstractions, Standards and Linked Data

TRANSCRIPT

Page 1: WORKS 11 Presentation

Date: 14/11/2011

A new Approach for Publishing Workflows: Abstractions,

Standards and Linked Data

Daniel Garijo

Ontology Engineering Group, Departamento de Inteligencia Artificial. Universidad Politécnica de Madrid

Yolanda GilInformation Sciences and Institute

University of Southern California, Marina del Rey

Page 2: WORKS 11 Presentation

1

Index of contents

Index:

1. Background

2. Limitations of existing approaches to workflow publication

3. Features of our approach

• Publishing abstract workflows and specific workflows

• OPMW Ontology

• Linked Data Publication

4. Workflow querying and Linked Data consumption

5. Conclusions

Page 3: WORKS 11 Presentation

2

Background

Text:Narrative of method,

software packages used

Software:scripted codes + manual steps +

notes/emails

Workflow: Workflow/scripts describing

dataflow, codes, and parameters

Data:Key datasets and figures/plots

Typical Published Article

Text:Narrative of method,

software packages used

Data:Key datasets and figures/plots

Reproducible Article: Weaver, GenePattern GRRD, etc.

NOT published, loosely recorded:

Page 4: WORKS 11 Presentation

3

Current issues with existing publication approaches

Only executable workflow is published:1. Must have the same codes to re-execute

the workflow, but:– Codes become unavailable

• Eg: eHits was proprietary and replaced by AutodockVina

– Different labs prefer different codes • Eg: R vs Matlab• Eg: viz in Citoscape vs yEd

2. Must have the same workflow framework to re-execute the workflow– Must have R for Weaver

3. Must import files to local file system and workflow framework– Must import bundle of workflow/data/code

files to reproduce

Workflow: Workflow/scripts describing

dataflow, codes, and parameters

Text:Narrative of method,

software packages used

Data:Key datasets and figures/plots

Reproducible Article: Weaver, GenePattern GRRD, etc.

Page 5: WORKS 11 Presentation

4

Key Features of our approach

• Publish an abstract workflow in addition to executable workflow– Description of workflow that is independent of the codes executed– Maps to the codes executed (the “executable workflow”)

• Publish both abstract and executable workflow using the OPM standard – OPM (Open Provenance Model) is independent of workflow framework and is

widely implemented– Other groups can import to their own workflow framework

• Publish data and workflows as Linked Data on the Web– All workflows and related files are web-accessible– Simple mechanism to share across local file systems

Page 6: WORKS 11 Presentation

5

What is Linked Data

1. Use URIs as names for things.

2. Use HTTP URIs so that people can look up those names.

3. When someone looks up a URI, provide useful information.

4. Include links to other URIs.

“Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”

Page 7: WORKS 11 Presentation

6

High level architecture

Interactive Browsing

(Pubby frontend)

Programatic access(external apps)

Wings workflow generation

OPMconversion Publication Share Reuse

Core

Portal

WINGS on local laptopWorkflow Template

WorkflowInstance

OPMexport

Core

Portal

WINGS on shared hostWorkflow Template

WorkflowInstance

OPMexport

Core

Portal

WINGS on web serverWorkflow Template

WorkflowInstance

OPMexport

LinkedData

Publication

Users

Other workflow environments

Page 8: WORKS 11 Presentation

7

Publishing the abstract workflow

Comparison of Dissimilar protein

structures workflow

Page 9: WORKS 11 Presentation

8

OPMW Ontology

opmo:account

opmo:account

opmo: account

opmw:ProcessTemplatetemplateNode1

opmw:WorkflowTemplate

template1

opmw:ArtifactTemplate

artifact1

opmw:ArtifactTemplateoutputArtifact1

ac:AbstractComponentabsComp1

opmw:ProcessInstanceexecutionNode1

opmw:ArtifactInstance

execInput1

opmw:ArtifactInstace

executionOutput1

ac:SpecificComponentspecComp1

opmw:ExecutionAccount

account1

Abstract Workflow Executable Workflow

opmv:Agentuser1 opmo:account

opmo:hasArtifact

opmo:hasArtifact

opmw:hasWorkflowTemplate

opmw:hasArtifactTemplate

opmw:hasProcessTemplate

opmw: hasArtifactTemplate

opmv:wasGeneratedBy opmv:wasGeneratedBy

opmv:used opmv:usedopmv:wasControlledBy

opmw:hasSpecificComponentopmw:hasTemplateComponent

opmo:hasProcess

opmv:Process

opmv:Artifact

opmv:Artifact

opmo:Account

opmo:OPMGraph opmv:Process

opmv:Artifact

opmv:Artifact

Page 10: WORKS 11 Presentation

9

Publication of Workflows as Linked Data

RDFTriple store

RDFTriple store

Permanentweb-

accessiblefile

store

Permanentweb-

accessiblefile

store

RDF Upload Interface

RDF Upload Interface

SPARQL EndpointSPARQL

Endpoint

Linked Data publication

Web accessible

WorkflowData,

Components, etc.

OPMconversion

Other workflow frameworks

OPMimport

Wings

Web browser

OPMconversion

Abstract Workflow

(OPM)

ExecutableWorkflow

(OPM)

Page 11: WORKS 11 Presentation

10

Searching/Browsing Workflows as Linked Data

Properties

Autocomplete search bar

Resource URI (Process instance)

Specific component for this process instance

Types ofsearch

Page 12: WORKS 11 Presentation

11

Searching/Browsing Workflows as Linked Data

Component Name

Component Inputs

Component Outputs

Code Implementations

Template additional metadata

Record of the different executions of this workflow

Page 13: WORKS 11 Presentation

12

Conclusions

1. Publication of an abstract workflow that represents the computational method in an execution-independent manner.

2. Publication of the abstract workflow and the executed workflow using the OPM standard that is independent of the execution environment used.

3. Publication of the workflows, components, codes and datasets as Linked Data on the web.

Page 14: WORKS 11 Presentation

13

Future work

• Extensions to abstract workflow publication– Be able to provide abstractions on several steps.– Incomplete provenance.

• Create an OPMV/W3C PROV-O profile for common workflow representation.– Increase interoperability with other workflow representation systems.

• Workflow reuse in different workflow systems.– Import and execute workflows in other workflow frameworks.

Page 15: WORKS 11 Presentation

14

References

• WINGS workflow system: http://seagull.isi.edu/marbles/

•The Open Provenance Model Specification: http://openprovenance.org/

• OPMO: http://openprovenance.org/model/opmo

•OPMV: http://open-biomed.sourceforge.net/opmv/ns.html

• TB Drugome Wiki (Evolution of this work): http://seagull.isi.edu/wings-drugome/index.php/Main_Page

•W3C PROV-O current ontology (draft): http://www.w3.org/2011/prov/wiki/PIL_OWL_Ontology

•Principles of Linked Data: http://www.w3.org/DesignIssues/LinkedData.html

Page 16: WORKS 11 Presentation

15

Acknowledgements

•UCSD people:

• Li Xie

• Lei Xie

• Sarah Kinnings

• Phil Bourne

•ISI people:

• Varun Ratnakaar

•OEG people:

• Oscar Corcho

Page 17: WORKS 11 Presentation

Date: 14/11/2011

A new Approach for Publishing Workflows: Abstractions,

Standards and Linked Data

Daniel Garijo

Ontology Engineering Group, Departamento de Inteligencia Artificial. Universidad Politécnica de Madrid

Yolanda GilInformation Sciences and Institute

University of Southern California, Marina del Rey