data cycle microbes

Post on 14-Apr-2017

DATA LIFE CYCLE MICROBES: CONSTRAINTS, IMMEDIATE AND LONG TERM

26TH OCTOBER
EMBL-ABR

Challenges for data

• Management
• Annotation
• Analysis
• Storage
• Sharing

Long term planning and maintenance

• Project Funding and continuity of data availability

• Funder/institutional requirements: data from public resources must be in the public domain.

• Availability of storage facilities
• Availability of analysis facilities

DMP (Data Management Plan)

Data sharing: going beyond what is required
• Granting authorities
• Journal requirements

Facilitation by:
• Institutional sharing
• Availability of repositories/archives: GitHub, EBI repositories, NCBI repositories, institutional and national data archives
• Analysis, annotation and advertisement of the resource
• Data publication

Analysis workflow metagenomics*

UniRule

InterPro2GO

Resource list

• Submission: GEO, SRA, ArrayExpress, ENA/GenBank/DDBJ; PRIDE, MetaboLights

• Annotation: UniProt, GO, InterPro, Reactome, PDB, Interactome …

• Visualisation: Ensembl, Networks, Structures …

• External annotations
• Comparative genomics and metagenomics

Importance of metadata
• Data valuation by addition of metadata
• Incorrect/inadequate metadata affects analysis and rediscovery
• No metadata makes a data set impossible to find, and of no value. Tagging helps.
• Student exercise – Soil/Coral – before they start a submission
• If you use resources, enrich them for use by yourself or others through submissions and annotation
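The metadata points above can be made concrete with a small pre-submission check. This is a minimal sketch, not a repository's actual validator: the required field names and the list of "empty" placeholder values are assumptions chosen for illustration, and each archive defines its own checklist.

```python
# Hypothetical required fields; a real archive publishes its own checklist.
REQUIRED_FIELDS = (
    "sample_name",
    "collection_date",
    "geo_location",
    "environment_biome",
    "environment_material",
)

# Placeholder values that carry no information (an assumption for this sketch).
EMPTY_VALUES = {"", "na", "n/a", "unknown", "none"}


def missing_metadata(record):
    """Return the required fields that are absent or uninformative in `record`."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = str(record.get(field, "")).strip().lower()
        if value in EMPTY_VALUES:
            missing.append(field)
    return missing


# Made-up example record for a soil sample.
soil_sample = {
    "sample_name": "soil_plot_A",
    "collection_date": "2016-10-26",
    "geo_location": "unknown",
    "environment_biome": "grassland biome",
}

print(missing_metadata(soil_sample))
```

Running a check like this before starting a submission flags the fields that would otherwise make the data set hard to find or reuse.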

Ontologies and standards

Interoperability, searching and reasoning
• Gene Ontology – are more terms needed?
• EnvO – Environment Ontology: biome, condition and material
• EFO – experimental metadata, more terms may be needed
• FOAM – Functional Ontology Assignments for Metagenomes
• OBIB – Ontology for Biobanking

Resource catalogue

BioSamples – deposit and reference study details for 'omics experiments

'Omics experiments – access using the Omics Discovery Index (http://www.omicsdi.org)

E.g. EnvO

Resource
• Metagenomics Rapid Annotations using Subsystems Technology (MG-RAST)
• Community Cyberinfrastructure for Advanced Microbial Ecology Research and Analysis (CAMERA)
• Integrated Microbial Genomes and Metagenomes (IMG/M)

Annotation transfer

Limited biochemical resources, limited number of manual curators to transfer data into databases (UniProtKB/Swiss-Prot, GO)

Annotation transfer – Gene Ontology
• InterPro2GO
• EC2GO
• UniProt-keywords2GO
• Ensembl Compara
• UniProt-subcellular locations2GO
• HAMAP2GO
• UniPathway2GO

Annotation transfer – TrEMBL annotation
• UniProt UniRules

All based on InterPro family/domain matches

Annotation transfer - InterPro2GO

InterPro
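The idea behind InterPro2GO-style transfer can be sketched in a few lines: a protein that matches an InterPro entry inherits the GO terms mapped to that entry. The sketch below assumes mapping lines of the form `InterPro:<entry id> <entry name> > GO:<term name> ; <GO id>`, with `!` marking comment lines. The example lines, the protein name `protein_1` and the entry `IPR999999` are illustrative only and should not be read as authoritative mapping content.

```python
from collections import defaultdict

# Illustrative lines in the assumed mapping layout:
#   InterPro:<entry id> <entry name> > GO:<term name> ; <GO id>
# Lines starting with "!" are treated as comments.
EXAMPLE_MAPPING = """\
!example header line
InterPro:IPR000003 Retinoid X receptor/HNF4 > GO:DNA binding ; GO:0003677
InterPro:IPR000003 Retinoid X receptor/HNF4 > GO:nucleus ; GO:0005634
"""


def parse_interpro2go(text):
    """Build a dict mapping each InterPro entry id to a set of GO ids."""
    mapping = defaultdict(set)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("!"):
            continue
        left, _, right = line.partition(" > ")
        # The GO id follows the last " ; " on the line.
        go_id = right.rpartition(" ; ")[2].strip()
        if not go_id.startswith("GO:"):
            continue  # skip lines that do not fit the assumed layout
        interpro_id = left.split()[0].split(":", 1)[1]
        mapping[interpro_id].add(go_id)
    return mapping


def transfer_annotations(protein_matches, mapping):
    """Give each protein the GO ids of all InterPro entries it matches."""
    result = {}
    for protein, entries in protein_matches.items():
        go_ids = set()
        for entry in entries:
            go_ids |= mapping.get(entry, set())
        result[protein] = sorted(go_ids)
    return result


mapping = parse_interpro2go(EXAMPLE_MAPPING)
# Hypothetical protein with one mapped and one unmapped InterPro match.
matches = {"protein_1": ["IPR000003", "IPR999999"]}
print(transfer_annotations(matches, mapping))
```

Entries without a mapping simply contribute no terms, which mirrors why coverage of this kind of transfer depends on how many InterPro entries have curated GO mappings.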

Annotation transfer - UniProt

Proprietary data

• Any data generation funded by a commercial entity may have data restrictions associated with it.

• Any data generation involving proprietary organisms/environs may have data restrictions on them.

• Data withdrawal – obsolete vs destroy

Data life-cycle

Sequence/assembly/Annotation/RNA seq

Public domain data deposition

Update annotation

In-house data resource

Data sharing

• E-notebooks, scripts – use GitHub? Curation and removal of dead ends are a real issue
• Will they work for complex multi-step processes?
• Shell scripts enough? Genome-specific.
• Best practice recording examples would be appreciated
• Consider Natural History scratchpads

Summary

• Identify potential issues early in the project life-cycle – spend time identifying issues and planning how to address them

• Prepare to share data as early as possible – what information would you like to see if you were the data user?

• Think beyond the lifetime of the grant: what are your long-term plans for the sustainability of the data?

• If there are issues with access, do feed back. If it is not a primary submission, this should be sorted out.
