data cycle microbes

DATA LIFE CYCLE MICROBES: CONSTRAINTS, IMMEDIATE AND LONG TERM, 26TH OCTOBER, EMBL-ABR

Upload: jyotikhadake

Post on 14-Apr-2017

TRANSCRIPT

Page 1: Data cycle microbes

DATA LIFE CYCLE MICROBES: CONSTRAINTS, IMMEDIATE AND LONG TERM

26TH OCTOBER, EMBL-ABR

Page 2: Data cycle microbes

Challenges for data

• Management
• Annotation
• Analysis
• Storage
• Sharing

Page 3: Data cycle microbes

Long term planning and maintenance

• Project Funding and continuity of data availability

• Funder/institutional requirements: data from public resources must be in the public domain.

• Availability of storage facilities
• Availability of analysis facilities

Data management plan (DMP)

Page 4: Data cycle microbes

Data sharing: going beyond what is required by
• Granting authorities
• Journal requirements

Facilitation by:
• Institutional sharing
• Availability of repositories/archives: GitHub, EBI repositories, NCBI repositories, institutional and national data archives
• Analysis, annotation and advertisement of the resource
• Data publication

Page 5: Data cycle microbes

Analysis workflow metagenomics*

UniRule

InterPro2GO

Page 6: Data cycle microbes

Resource list

• Submission: GEO, SRA, ArrayExpress, ENA/GenBank/DDBJ; PRIDE, MetaboLights

• Annotation: UniProt, GO, InterPro, Reactome, PDB, Interactome …

• Visualisation: Ensembl, Networks, Structures …

• External annotations
• Comparative genomics and metagenomics

Page 7: Data cycle microbes

Importance of metadata

• Data valuation by addition of metadata
• Incorrect/inadequate metadata affects analysis and rediscovery
• No metadata makes a data set impossible to find, and of no value. Tagging helps.
• Student exercise – Soil/Coral – before they start a submission
• If you use resources, enrich them for use by yourself or others through submissions and annotation
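The point about missing or inadequate metadata can be illustrated with a small pre-submission check. This is a minimal sketch only: the field names in `REQUIRED_FIELDS` are hypothetical and are not taken from any repository's checklist; a real submission would follow the checklist published by the target archive.

```python
# Hypothetical minimal checklist. Real archives publish their own required fields.
REQUIRED_FIELDS = (
    "sample_name",
    "collection_date",
    "geographic_location",
    "environment_biome",
    "environment_material",
)


def missing_metadata(record):
    """Return the required fields that are absent or blank in a sample record."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


# Illustrative records only; the values are made up for this sketch.
complete_sample = {
    "sample_name": "soil-plot-01",
    "collection_date": "2016-10-26",
    "geographic_location": "Australia",
    "environment_biome": "terrestrial biome",
    "environment_material": "soil",
}
incomplete_sample = {"sample_name": "coral-01", "collection_date": " "}

print(missing_metadata(complete_sample))
print(missing_metadata(incomplete_sample))
```

Running a check like this before a submission makes gaps visible while the person who collected the sample can still fill them in.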

Page 8: Data cycle microbes

Ontologies and standards

Interoperability, searching and reasoning
• Gene Ontology – are more terms needed?
• EnvO – Environment Ontology: Biome, Condition and Material
• EFO – experimental metadata, more terms may be needed
• FOAM – Functional Ontology Assignments for Metagenomes
• OBIB ontology – biobanking

Page 9: Data cycle microbes

Resource catalogue

BioSamples – deposit and reference study details for omics experiments

Omics experiments – access using the Omics Discovery Index (http://www.omicsdi.org)

Page 10: Data cycle microbes
Page 11: Data cycle microbes

E.g. EnvO

Page 12: Data cycle microbes

Resource

• Metagenomics Rapid Annotations using Subsystems Technology (MG-RAST)
• Community Cyberinfrastructure for Advanced Microbial Ecology Research and Analysis (CAMERA)
• Integrated Microbial Genomes and Metagenomes (IMG/M)

Page 13: Data cycle microbes

Annotation transfer

Limited biochemical resources, limited number of manual curators to transfer data into databases (UniProtKB/Swiss-Prot, GO)

Annotation transfer – Gene Ontology
• InterPro2GO
• EC2GO
• UniProt-keywords2GO
• Ensembl Compara
• UniProt-subcellular locations2GO
• HAMAP2GO
• UniPathway2GO

Annotation transfer – TrEMBL annotation
• UniProt UniRules

All based on InterPro family/domain matches
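The idea of transferring annotation through family/domain matches can be sketched in a few lines. This is a hedged illustration, not the InterPro2GO pipeline itself: the line layout in `MAPPING_LINES` is an assumption modelled on "2GO"-style mapping files, and every identifier in it is a placeholder invented for this example.

```python
import re

# Placeholder mapping lines. The layout is an assumption and the identifiers
# (IPR999001, GO:9990001, ...) are invented, not real InterPro or GO entries.
MAPPING_LINES = [
    "! comment lines start with an exclamation mark",
    "InterPro:IPR999001 Hypothetical family A > GO:hypothetical term one ; GO:9990001",
    "InterPro:IPR999001 Hypothetical family A > GO:hypothetical term two ; GO:9990002",
    "InterPro:IPR999002 Hypothetical domain B > GO:hypothetical term three ; GO:9990003",
]

LINE_PATTERN = re.compile(r"^InterPro:(IPR\d+)\s.*;\s*(GO:\d+)\s*$")


def load_mapping(lines):
    """Build {family/domain accession: set of GO ids} from mapping lines."""
    mapping = {}
    for line in lines:
        match = LINE_PATTERN.match(line)
        if match:  # comment or malformed lines are skipped
            mapping.setdefault(match.group(1), set()).add(match.group(2))
    return mapping


def transfer_annotations(protein_matches, mapping):
    """Give each protein the GO ids mapped to its family/domain matches."""
    transferred = {}
    for protein, accessions in protein_matches.items():
        go_ids = set()
        for accession in accessions:
            go_ids |= mapping.get(accession, set())
        transferred[protein] = sorted(go_ids)
    return transferred


mapping = load_mapping(MAPPING_LINES)
# Hypothetical proteins with their family/domain matches.
matches = {"protA": ["IPR999001", "IPR999002"], "protB": ["IPR000000"]}
print(transfer_annotations(matches, mapping))
```

A protein with no mapped match, such as `protB` here, simply receives no terms, which is why coverage depends on how well the families and domains themselves are curated.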

Page 14: Data cycle microbes

Annotation transfer - InterPro2GO

InterPro

Page 15: Data cycle microbes

Annotation transfer - UniProt

Page 16: Data cycle microbes

Proprietary data

• Any data generation funded by a commercial entity may have data restrictions associated with it.

• Any data generation involving proprietary organisms/environs may have data restrictions on them.

• Data withdrawal – obsolete vs destroy

Page 17: Data cycle microbes

Data life-cycle

Sequence/assembly/annotation/RNA-seq

Public domain data deposition

Update annotation

In-house data resource

Page 18: Data cycle microbes

Data sharing

• E-notebooks, scripts – use GitHub? Curation and removal of dead ends are a real issue

• Will they work for complex multi-step processes?

• Shell scripts enough? Genome-specific.
• Best practice recording examples would be appreciated
• Consider Natural History scratchpads
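One possible way to record a multi-step process so that it can be shared alongside the scripts is sketched below. It is only an illustration under stated assumptions: the helper names `run_and_record` and `save_log`, the step name, and the log layout are hypothetical, and the example command is a stand-in for a real analysis step.

```python
import json
import subprocess
import sys
from datetime import datetime, timezone


def run_and_record(step_name, command, log):
    """Run one step and append what was run, when, and how it ended to the log."""
    started = datetime.now(timezone.utc).isoformat()
    completed = subprocess.run(command, capture_output=True, text=True)
    entry = {
        "step": step_name,
        "command": command,
        "started_utc": started,
        "returncode": completed.returncode,
        "stdout": completed.stdout,
        "stderr": completed.stderr,
    }
    log.append(entry)
    return entry


def save_log(log, path):
    """Write the recorded steps to a JSON file that can be versioned with the scripts."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(log, handle, indent=2)


log = []
# Stand-in command: a real workflow would call its assembly or annotation tools here.
entry = run_and_record("example-step", [sys.executable, "-c", "print('ok')"], log)
print(entry["step"], entry["returncode"])
```

Keeping the log as plain JSON means it can be committed next to the scripts, and failed or abandoned steps stay visible instead of being silently dropped.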

Page 19: Data cycle microbes

Summary

• Identify potential issues early on in the project life-cycle – spend time identifying issues and planning how to address them

• Prepare to share data as early as possible – what information would you like to see if you were the data user?

• Think beyond the lifetime of the grant: what are your long-term plans for the sustainability of the data?

• If there are issues with access, do feed back. If it is not the primary submission, this should be sorted out.