SysMO-DB: A Community-Based Approach to Data Sharing

Dr Katy Wolstencroft, University of Manchester




SysMO-DB

A data access, model handling and data integration platform for Systems Biology

• A web-based resource that promotes shared understanding
• Using a common platform and common technologies

Started July 2008


SysMO-DB Dev Team

Institutions: University of Manchester, UK; University of Stellenbosch, South Africa; Heidelberg Institute for Theoretical Studies (HITS), Germany

Team: Jacky Snoep, Olga Krebs, Wolfgang Müller, Sergejs Aleksejevs, Carole Goble, Stuart Owen, Katy Wolstencroft, Finn Bacall, Franco B du Preez


SysMO: Systems Biology of Microorganisms (http://www.sysmo.net)

• Pan-European collaboration: eleven individual projects, 89 institutes
• Different research outcomes
• A cross-section of microorganisms, including bacteria, archaea and yeast
• Record and describe the dynamic molecular processes occurring in microorganisms in a comprehensive way
• Present these processes in the form of computerised mathematical models
• Pool research capacities and know-how
• Running since April 2007; projects run for 3-5 years
• This year, 2 new projects join and 6 leave


Challenges

• Heterogeneous data and models
• Distributed groups of researchers
• Modellers and experimentalists have different skills, training and experience
• Scientists want to remain in control

Social and technical challenges


Social Challenge: Focus Group

DB team ↔ Focus Group ↔ Projects

• DB team to Focus Group: show what is there; suggest what is possible; ask for requirements
• Focus Group to DB team: give requirements; tell priorities; rate outcomes; suggest improvements
• Focus Group and Projects: double check; transmit; disseminate; collect answers


Focus Group: the SysMO-DB PALs

• 21 postdocs and PhD students: modellers, experimentalists and bioinformaticians
• Design and technical collaboration team
• Intense collaboration
• UK and Continental PALs chapters
• Audits and sharing: methods, data, models, standards, software, schemas, spreadsheets, SOPs...
• 20 questions
• Deployment into projects


Technical Challenge

• Rapid and incremental development
• "Just enough and just in time", not "just in case"
• No reinvention
• Driven by the PALs
• Sustainable and extensible
• Migrate to standards
• Fitting in with normal lab practices


What do we share

Methods + Data + Results = all SysMO assets

Protocols follow the Nature Protocols template: Title, Authors, Keywords, Abstract, Materials (Reagents, Reagent Set Up, Equipment), Time Taken, Procedure, Troubleshooting, Critical Steps, Anticipated Results, References


Protocols for Models

Model protocols use a similar template: Title, Authors, Keywords, Description, Assumptions, Equations, Numerical Methods/Algorithms, Computational Tools, Parameter Estimation Techniques, Limitations, References

What do we share

Methods + Data + Results + Models = all SysMO assets


A Tree View of Assets

Investigation → Studies → Assays (example nodes: Construction, Validation), with SOPs linked to the assets

ISA infrastructure provides a directory structure for experiments: http://isatab.sourceforge.net/
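The Investigation, Study, Assay (ISA) hierarchy can be pictured as a simple tree. The sketch below is a minimal illustration, assuming that each assay links to its SOPs by title; the class names, fields and example titles are hypothetical and are not the SEEK or ISA-Tab data model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    title: str
    sops: List[str] = field(default_factory=list)  # titles of linked SOPs (hypothetical)


@dataclass
class Study:
    title: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    title: str
    studies: List[Study] = field(default_factory=list)


def tree_lines(inv: Investigation) -> List[str]:
    """Render the Investigation > Study > Assay > SOP hierarchy as indented lines."""
    lines = [f"Investigation: {inv.title}"]
    for study in inv.studies:
        lines.append(f"  Study: {study.title}")
        for assay in study.assays:
            lines.append(f"    Assay: {assay.title}")
            for sop in assay.sops:
                lines.append(f"      SOP: {sop}")
    return lines


# Hypothetical example; the titles are illustrative, not real SysMO assets.
inv = Investigation(
    "Hypothetical glycolysis investigation",
    [Study("Model study", [Assay("Construction", ["SOP-A"]), Assay("Validation", ["SOP-B"])])],
)
print("\n".join(tree_lines(inv)))
```

Keeping SOPs attached at the assay level mirrors the slide's idea that the tree doubles as a directory for finding the methods behind each experiment.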


[Figure: "Expertise, tools" and "Coordinates, data"]


How do we share

The “Just Enough Results Model” (JERM) asks:
• What type of data is it? (microarray, growth curve, enzyme activity...)
• What was measured? (gene expression, OD, metabolite concentration...)
• What do the values in the datasets mean? (units, time series, repeats...)

Based on:
• Minimum information models, e.g. MIAME, MIAPE, MIRIAM
• Biological ontologies, e.g. Gene Ontology, MGED, SBO

The BioPortal web service is used in SysMO-SEEK for concept lookup and visualisation.
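The three JERM questions can be read as a small metadata record attached to each data file. The sketch below is a hypothetical illustration, assuming one field per question; the field names are not the actual JERM schema. It reports which questions are still unanswered rather than rejecting the record.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class JermRecord:
    """Hypothetical 'just enough' description of one data file."""
    data_type: Optional[str] = None        # What type of data is it? e.g. "growth curve"
    measured_item: Optional[str] = None    # What was measured? e.g. "OD"
    unit: Optional[str] = None             # What do the values mean? e.g. "mM"
    is_time_series: Optional[bool] = None  # Are the values a time series?
    replicates: Optional[int] = None       # How many repeats?


def missing_fields(record: JermRecord) -> List[str]:
    """Return the names of fields that have not been filled in yet."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]


growth = JermRecord(data_type="growth curve", measured_item="OD", is_time_series=True)
print(missing_fields(growth))  # prints ['unit', 'replicates']
```

Flagging gaps instead of refusing uploads fits the "we recommend formats but we do not enforce them" stance of SysMO-DB.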


How do we share

• Share JERM templates developed by SysMO-DB, the PALs and the consortium: spreadsheet templates, database schemas
• Encourage uptake throughout SysMO: transcriptomics, metabolomics, proteomics, etc.


Tools to help manage data: annotation standards by stealth

Controlled vocabulary plug-in: BioPortal


JERM Model

SysMO JERM: a ‘MIBBI’ for SysMO-SEEK

• What do we need to help you find stuff? Title, person, filename, class
• What is experiment specific?
• What is experiment specific, but helps us map between experiments? Common biological elements: chemicals, genes, proteins, organisms, strains


Identifying Biological Objects

• What do you have in your data? Proteins/enzymes, genes/expression levels, metabolites
• Where/how do these objects interact? Pathways, flux, experimental conditions
• What models describe these interactions?

Possible when using common frameworks, naming schemes and controlled vocabularies
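Mapping between experiments through common biological elements only works once different names for the same thing are reconciled. The sketch below assumes a tiny hand-written synonym table standing in for a controlled vocabulary (a real deployment would draw on ontologies, for example via BioPortal); the table, function names and file names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical synonym table standing in for a controlled vocabulary.
SYNONYMS: Dict[str, str] = {
    "pfk": "phosphofructokinase",
    "phosphofructokinase": "phosphofructokinase",
    "e. coli": "Escherichia coli",
    "escherichia coli": "Escherichia coli",
}


def normalise(term: str) -> str:
    """Map a free-text name to its preferred term, if the vocabulary knows it."""
    return SYNONYMS.get(term.strip().lower(), term.strip())


def link_by_element(assets: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Index assets (data files, models, SOPs) by the normalised elements they mention."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for asset, elements in assets.items():
        for element in elements:
            index[normalise(element)].add(asset)
    return dict(index)


assets = {
    "enzyme_assay.xls": ["PFK", "E. coli"],
    "glycolysis_model.xml": ["phosphofructokinase", "Escherichia coli"],
}
index = link_by_element(assets)
print(sorted(index["phosphofructokinase"]))  # prints ['enzyme_assay.xls', 'glycolysis_model.xml']
```

Without the normalisation step, "PFK" and "phosphofructokinase" would land under different keys and the data file would never be linked to the model.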


Following Standards

We recommend formats, but we do not enforce them:
• Protocols and SOPs: Nature Protocols
• Data: JERM models and community minimum information models
• Models: SBML and related standards
• Publications: PubMed and DOI

If you follow the prescribed formats, you get more out; if you don’t, you can still participate.

Lowering the adoption barrier


Access Permissions

Just Enough Sharing

...we don’t talk about security


Just Enough Sharing

Project data stores (COSMIC, SysMOLab, MOSES, Alfresco, wikis, another datastore) share SOPs and data with SEEK in two ways:
• Fetch on request
• Direct upload
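The two routes can be modelled as an asset that either carries its own content (direct upload) or only a reference back to the project's data store (fetch on request). The sketch below is a hypothetical illustration, not SEEK's implementation; the fetch function is passed in, so a plain dictionary stands in for the remote store and no network access is needed.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Asset:
    """Uploaded directly (content set) or registered by reference (url set)."""
    title: str
    content: Optional[bytes] = None
    url: Optional[str] = None


def get_content(asset: Asset, fetch: Callable[[str], bytes]) -> bytes:
    """Return stored bytes, or fetch them from the project's own store on request."""
    if asset.content is not None:
        return asset.content      # direct upload: the hub holds a copy
    if asset.url is not None:
        return fetch(asset.url)   # fetch on request: data stays in the project store
    raise ValueError(f"asset {asset.title!r} has neither content nor a URL")


# Hypothetical project data store, standing in for e.g. a wiki or Alfresco instance.
project_store = {"https://store.example.org/sop-7": b"SOP text"}

uploaded = Asset("Uploaded SOP", content=b"local copy")
linked = Asset("Linked SOP", url="https://store.example.org/sop-7")
print(get_content(uploaded, project_store.__getitem__))  # prints b'local copy'
print(get_content(linked, project_store.__getitem__))    # prints b'SOP text'
```

Fetch on request lets scientists keep data in their existing systems, which matches the aim of fitting in with normal lab practices.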


When do People Share

• Data collection: your own group and maybe your project, for collaboration
• Pre-publication: the project and maybe the consortium, for discussion and criticism
• Post-publication: the consortium and the wider community, for advertising

Barriers:
• Suspicion and fear of scooping
• Reputation

SysMO aims: sharing sooner
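The stages of sharing describe widening circles of access. The sketch below encodes them as ordered levels, assuming (hypothetically) that each stage defaults to the narrower option on the slide and that "sharing sooner" means widening access by one circle; none of these names come from SEEK's permission system.

```python
from enum import IntEnum


class Audience(IntEnum):
    """Widening circles of access; a higher value includes everyone below it."""
    GROUP = 1
    PROJECT = 2
    CONSORTIUM = 3
    PUBLIC = 4


# Hypothetical defaults read off the slide: who typically sees data at each stage.
DEFAULT_AUDIENCE = {
    "data collection": Audience.GROUP,
    "pre-publication": Audience.PROJECT,
    "post-publication": Audience.PUBLIC,
}


def can_view(viewer: Audience, shared_with: Audience) -> bool:
    """viewer is the innermost circle the viewer shares with the data owner."""
    return viewer <= shared_with


def share_sooner(level: Audience) -> Audience:
    """Widen access by one circle, capped at PUBLIC."""
    return Audience(min(level + 1, Audience.PUBLIC))


print(can_view(Audience.CONSORTIUM, DEFAULT_AUDIENCE["pre-publication"]))  # prints False
print(share_sooner(DEFAULT_AUDIENCE["pre-publication"]).name)              # prints CONSORTIUM
```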


Incentives for sharing

• Safe haven for data
• Credit and attribution
• Help with exporting to public repositories (e.g. one-click export to ArrayExpress, PRIDE, etc.)
• A repository for “supplementary materials” in publications
• Linking publications and data
• Access to other resources through a SEEK gateway


SEEK as a Gateway

JWS Online plugin:
• online simulator that runs in SysMO-SEEK
• upload models in SBML format
• SBGN schemas, with annotations and external links


Incentives for sharing

• Credit and attribution: SEEK records who owns what. If data, models or protocols are reused, scientists get recognition.
• Accountability: SEEK records who owns what. If you take credit for others’ work, they will see it.
• Data citation: formal credit for data published in SEEK


Data Citation

• Persistent identifiers and URLs for the data
• Linking people to the data
• Safe haven for the data
• Guarantees of sustainability

Data MUST be uploaded and archived. If cited, it must be public.
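The citation rules amount to a checklist that an asset must pass before it can be cited. The sketch below encodes them as a function returning every unmet condition; the record fields and the identifier string are hypothetical and do not reflect how SEEK stores identifiers.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataAsset:
    identifier: str  # hypothetical persistent identifier; empty if none assigned
    uploaded: bool   # file held by SEEK rather than only linked
    archived: bool
    public: bool


def citation_problems(asset: DataAsset) -> List[str]:
    """List the reasons an asset cannot yet be cited under the rules above."""
    problems = []
    if not asset.identifier.strip():
        problems.append("no persistent identifier")
    if not asset.uploaded:
        problems.append("not uploaded")
    if not asset.archived:
        problems.append("not archived")
    if not asset.public:
        problems.append("not public")
    return problems


draft = DataAsset("example-id-001", uploaded=True, archived=False, public=False)
print(citation_problems(draft))  # prints ['not archived', 'not public']
```

Returning all problems at once, rather than stopping at the first, tells the data owner everything that still has to happen before the data can be cited.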


SEEK as a Safe Haven

• HITS can archive SysMO data for 10 years
• All SysMO software is open source and available
• Distinction between sustaining the service and sustaining the software


Governance and Policy

• What is required of SysMO members?
• When should they share during their projects?
• How long after the project can they keep data private to finish publications?
• If their data is stored locally, what is the archive process?

Policy comes from the DMG and the funding agencies, NOT from SysMO-DB.


Governance and Policy

Proposals under discussion:
• All data registered in SEEK should be uploaded and archived at the end of a SysMO project
• All data from finished projects should be shared. How long after the end? 1 day, 6 months, 1 year?
• Scientists can invoke “creator’s privilege” on SysMO assets produced near the end of the project: extra time to write up and publish before release to the general public, respecting publication cycles


SysMO So Far…

• People ARE sharing: over 300 assets in SEEK (SOPs: 102, Models: 17, Data files: 95, Investigations: 13, Studies: 26, Assays: 53)
• PALs: a network of young SysBio researchers
• Training and education in data and metadata management spreading through the consortium
• Modellers and experimentalists communicating


SysMO Methods Spreading

• Virtual Liver (Müller, via HITS)
• LungSys, SBCancer, ERASysBio+

New areas: eukaryotic organisms, interactions between host and pathogen, human disease, multi-scale modelling


Why it works for us

• A solution that fits in with current practices
• Start simple, show benefits, add more
• Engage with the people actually doing the work: PhD students, postdocs
• Build to the PALs’ requirements
• Respect publication cycles
• Respect cultural differences
• Scientists stay in control


Acknowledgements

SysMO-DB Team, SysMO-PALs

myGrid, HITS and JWS Online, EMBL-EBI, MCISB

http://www.sysmo-db.org