Structuring what we know and use that to better understand your data @Chris_Evelo: Department of Bioinformatics BiGCaT, WikiPathways team, ELIXIR Interoperability team, Open PHACTS


Post on 01-Oct-2020


Page 1: Structuring what we know and use that to better understand ...projects.bigcat.unimaas.nl/birmingham2017/wp... · Integrative network-based analysis of mRNA and microRNA expression

Structuring what we know and use that to better understand your data

@Chris_Evelo: Department of Bioinformatics – BiGCaT,

WikiPathways team, ELIXIR Interoperability team, Open PHACTS

Page 2

So many…

ELIXIR, EXCELERATE, CORBEL, GA4GH, EGA, dbNP, ENPADASI, DISH, Open PHACTS, BBMRI, DRE, EuroCAT, DTL, EATRIS, DiXa, UniProt, PDB, ChEBI, ChEMBL, HMDB, ISA, FAIR, RDF, VoID, Nanopubs, eNanoMapper, KEGG, Reactome, Entrez, Parelsnoer, ArrayExpress, GEO, ENCODE, Recon2, SBML, SBGN, MIM

And that is just what I discussed yesterday…

Page 3

The typical question we get about using big data

Page 4

We can do things like this (diabetic liver)

Pihlajamäki et al. dataset is from Gene Expression Omnibus (accession number GSE15653)

Pihlajamäki et al. J Clin Endocrinol Metab. 2009, 94 (9): 3521-3529. DOI: 10.1210/jc.2009-0212.

Martina Kutmon et al. BMC Genomics 2014, 15:971. DOI: 10.1186/1471-2164-15-971

Page 5

Data predators

Page 6

Data: Wang et al. 2011, in Gene Expression Omnibus (GEO, http://ncbi.nlm.nih.gov/geo/), accession number GSE17461.

Published paper: Effects of 1alpha,25 dihydroxyvitamin D3 and testosterone on miRNA and mRNA expression in LNCaP cells. WL Wang et al. Mol Cancer 2011, 10:58. doi:10.1186/1476-4598-10-58

Or: Vitamin D effects on prostate cancer cells

Page 7

Integrative network-based analysis of mRNA and microRNA expression in vitamin D3-treated cancer cells

Page 8
Page 9
Page 10

Integrative Systems Biology (workflow diagram; labels recovered from the slide):

• Internal & external data repositories, e.g. dbNP, Sage, Atlas

• Knowledge resources & (semantic web) integration, e.g. Open PHACTS, WikiPathways

• Study capturing: ISA

• Models

• Study data processing, statistics, storage, e.g. arrayanalysis.org

• Ontologies

• Modeling & data integration, network biology (extension), supervised statistics

• Curation, simulation, annotation & provenance

• Research applications

• Mapping: BridgeDb

• Extraction, SPARQLing, conversion

Page 11
Page 12
Page 13

http://www.wikipathways.org/instance/WP430

Page 14

http://www.wikipathways.org/index.php/Pathway:WP430

Page 15

WikiPathways

• Public resource for biological pathways

• Anyone can contribute and curate

• More up-to-date representation of biological knowledge

WikiPathways: capturing the full diversity of pathway knowledge. M Kutmon et al. Nucleic Acids Res 2015: first published online: Oct 19.

Big data: Wikiomics. Mitch Waldrop. Nature 2008: 455, 22-25

We the curators. Allison Doerr. Nature Methods 2008: 5, 754–755

No rest for the bio-wikis. Ewen Callaway. Nature 2010: 468, 359-360

Page 16
Page 17

How to do interoperable data visualization?

Page 18

Connect to Genome Databases

Page 19

Backpages link to multiple databases

Page 20

You could do this for gene lists

Page 21

Don’t be afraid to reinvent wheels!

Page 22

BridgeDb: Abstraction Layer

• interface IDMapper

• class IDMapperRdb: relational database

• class IDMapperFile: tab-delimited text

• class IDMapperBiomart: web service

The BridgeDb Framework: Standardized Access to Gene, Protein and Metabolite Identifier Mapping Services. Martijn P van Iersel, Alexander R Pico, Thomas Kelder, Jianjiong Gao, Isaac Ho, Kristina Hanspers, Bruce R Conklin, Chris T Evelo. BMC Bioinformatics 2010, 11: 5.
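The abstraction-layer idea on this slide (one mapper interface, several interchangeable backends) can be sketched in a few lines. This is not the BridgeDb API, which is a Java library; the names `IDMapper`, `FileIDMapper`, `map_id` and `map_gene_list`, the three-column file layout, and all identifiers below are assumptions made purely for illustration.

```python
from __future__ import annotations

import csv
import io
from typing import Protocol


class IDMapper(Protocol):
    """Hypothetical Python analogue of the mapper interface on the slide."""

    def map_id(self, source_id: str, target_system: str) -> set[str]:
        ...


class FileIDMapper:
    """Backend reading mappings from tab-delimited text.

    Assumed column layout (not taken from the slides):
    source_id <TAB> target_system <TAB> target_id
    """

    def __init__(self, text: str) -> None:
        self._table: dict[tuple[str, str], set[str]] = {}
        for row in csv.reader(io.StringIO(text), delimiter="\t"):
            if len(row) != 3:
                continue  # skip blank or malformed rows
            source_id, target_system, target_id = row
            self._table.setdefault((source_id, target_system), set()).add(target_id)

    def map_id(self, source_id: str, target_system: str) -> set[str]:
        # Return a copy so callers cannot mutate the internal table.
        return set(self._table.get((source_id, target_system), set()))


def map_gene_list(mapper: IDMapper, ids: list[str], target_system: str) -> dict[str, set[str]]:
    """Client code depends only on the interface, never on a specific backend."""
    return {i: mapper.map_id(i, target_system) for i in ids}


# Placeholder identifiers, for illustration only.
MAPPINGS = "geneA\tuniprot\tprotA1\ngeneA\tuniprot\tprotA2\ngeneB\tuniprot\tprotB1\n"
file_mapper = FileIDMapper(MAPPINGS)
result = map_gene_list(file_mapper, ["geneA", "geneB", "geneC"], "uniprot")
```

A relational-database or web-service backend would only need to provide the same `map_id` method; `map_gene_list` would not change.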

Page 23

Combine: WikiPathways tissue analyzer

Work done by Jonathan Melius

Page 24

WikiPathways, a house of webs?

Page 25

Combine: adding miRNAs clutters

Page 26

Combine: regulator Interaction in MiPaSt PathVisio plugin

Work done by Christian Oertlin.

Page 27

Pathways in Cytoscape

Page 28

Figure 2. The Cardiac Hypertrophic Response pathway loaded as a network.

Kutmon M, Lotia S, Evelo CT and Pico AR 2014 [v1; ref status: indexed, http://f1000r.es/3ij] F1000Research 2014, 3:152 (doi: 10.12688/f1000research.4254.1)

Page 29

PPS1, Liver, All pathways

Pathways with high z-score grouped together. Explains why there are relatively few significant genes, but many pathways with high z-score.

Cytoscape visualization used to group
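The slide does not say how the pathway z-score is computed. As a rough sketch, assuming the hypergeometric-based z-score commonly used in pathway over-representation analysis, it compares the number of changed genes observed in a pathway with the number expected from the dataset as a whole. The function name and parameter names are hypothetical.

```python
import math


def pathway_z_score(N: int, R: int, n: int, r: int) -> float:
    """Over-representation z-score for one pathway (assumed formula).

    N: genes measured in the whole dataset
    R: measured genes meeting the significance criterion
    n: measured genes that are in the pathway
    r: pathway genes meeting the criterion
    """
    if N < 2 or n < 1:
        raise ValueError("need at least two measured genes and a non-empty pathway")
    p = R / N                      # overall fraction of changed genes
    expected = n * p               # changed genes expected in the pathway
    # Variance with finite-population correction (sampling without replacement).
    variance = n * p * (1 - p) * (1 - (n - 1) / (N - 1))
    if variance <= 0:
        raise ValueError("z-score undefined: no variation in the dataset")
    return (r - expected) / math.sqrt(variance)


# Made-up counts: 500 of 10000 genes changed; a 100-gene pathway holds 15 of them.
z = pathway_z_score(N=10000, R=500, n=100, r=15)
```

Because overlapping pathways share genes, a small set of changed genes can raise the score of many pathways at once, which is consistent with the grouping remark on the slide.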

Page 30

Pathway interactions and what causes them

Thomas Kelder, Lars Eijssen, Robert Kleemann, Marjan van Erk, Teake Kooistra, Chris Evelo (2011) Exploring pathway interactions in insulin resistant mouse liver. BMC Systems Biology 5: 127 Aug. http://dx.doi.org/10.1186/1752-0509-5-127

Page 31

Pathway interactions and detailed network visualization for the interactions with three apoptosis related pathways for the comparison between HF and LF diet at t = 0. A: Subgraph of the pathway interaction network, based on incoming interactions to three stress response and apoptosis pathways with the highest in-degree. Pathway nodes with a thick border are significantly enriched (p < 0.05) with differentially expressed genes. B: The protein interactions that compose the interactions between the three apoptosis related pathways and their neighbors in the subgraph as shown in box A (see inset, included interactions are colored orange). Protein nodes have a thick border when their encoding genes are significantly differentially expressed (q < 0.05).

Page 32
Page 33

Regulation resources

Page 34

human ErbB signaling pathway extended with validated microRNA regulation

Page 35

If we don’t do the magic

Page 36
Page 37

How do R&D companies use public data?

Literature, PubChem, Genbank, Patents, Databases, Downloads

Data Analysis, Data Integration, Firewalled Databases

Page 38

How do pharma companies use public data?

Pfizer

AZ

Roche

n

Page 39

@gray_alasdair Big Data Integration 39

Page 40

Semantic web grammar

Page 41
Page 42

Platform architecture (diagram; labels recovered from the slide, top to bottom):

• Apps

• Linked Data API (RDF/XML, TTL, JSON)

• Domain Specific Services

• Core Platform: Semantic Workflow Engine, Identity Resolution Service, Identifier Management Service, Chemistry Registration Normalisation & Q/C, Indexing

• Data Cache (Virtuoso Triple Store)

• Data sources, each described with VoID: Db and Nanopub sources covering Public Content, Commercial, Public Ontologies, User Annotations

• Example identifiers being resolved: P12374, EC2.43.4, CS4532, “Adenosine receptor 2a”

Page 43


Page 44

Choose a standard

Page 45

Link one resource to another

Page 46

Or use both and map

Page 47
Page 48

Mapping tools are core tools: need funding and sustainability

Page 49

Database identifier mapping tools we have:

• A software framework (BridgeDb)

– Application in WikiPathways, PathVisio, Cytoscape, R/Bioconductor

– An installable webservice

– Open source

– Community based

– Database based (small)

• A semantic web implementation (Open PHACTS IMS)

– With installable Docker image

– Linkset based (fast)

– Transitivity (and limits for that), for example:

– gene -> protein -> has enzyme code

– protein -> has enzyme code -> other proteins

• Identifiers.org for ID schemas and resolution
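The point about transitivity and its limits can be made concrete with a toy linkset graph. The sketch below is not the Open PHACTS IMS implementation; the link-type names, the `blocked` rule format, the hop limit and every identifier are assumptions chosen for illustration. It allows the chain gene -> protein -> enzyme code but can refuse the step from an enzyme code back out to other proteins.

```python
from collections import deque

# Hypothetical linksets as (source, link type, target); placeholder identifiers.
LINKS = [
    ("gene:G1", "encodes", "protein:P1"),
    ("protein:P1", "has_enzyme_code", "ec:E1"),
    ("ec:E1", "enzyme_code_of", "protein:P1"),
    ("ec:E1", "enzyme_code_of", "protein:P2"),
]


def transitive_map(start, links, blocked=frozenset(), max_hops=3):
    """Collect everything reachable from `start` by chaining links.

    `blocked` holds pairs of consecutive link types that must not be chained,
    which is one simple way to put a limit on transitivity.
    """
    outgoing = {}
    for src, kind, dst in links:
        outgoing.setdefault(src, []).append((kind, dst))

    reached = set()
    seen = {(start, None)}               # (node, link type used to arrive)
    queue = deque([(start, None, 0)])
    while queue:
        node, last_kind, hops = queue.popleft()
        if hops == max_hops:
            continue
        for kind, dst in outgoing.get(node, []):
            if (last_kind, kind) in blocked:
                continue                 # this chain is not allowed
            if dst != start:
                reached.add(dst)
            state = (dst, kind)
            if state not in seen:
                seen.add(state)
                queue.append((dst, kind, hops + 1))
    return reached


unlimited = transitive_map("gene:G1", LINKS)
limited = transitive_map(
    "gene:G1", LINKS, blocked={("has_enzyme_code", "enzyme_code_of")}
)
```

Without the rule, the gene ends up "mapped" to `protein:P2`, an unrelated protein that merely shares the enzyme code; with the rule, the mapping stops at the enzyme code.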

Page 50

This is not just Open PHACTS

Federated SPARQL queries:

e.g. find all genes related to disease, then all pathways with these genes…

Used as hackathon (SWAT4LS) examples

Only works sometimes, by chance

Needs integrated ID mapping!
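Why such a federated query "only works sometimes, by chance" can be shown without any endpoint at all. The sketch below stands in for the join a federated query performs, using plain Python sets; the two result sets, their identifier schemes and the mapping table are all made-up placeholders, and `pathways_for` is a hypothetical helper.

```python
# Results as they might come back from two independent endpoints.
# All identifiers are placeholders; the naming schemes are assumptions.
disease_genes = {"ensembl:E1", "ensembl:E2"}          # from a disease resource
pathway_genes = {                                     # from a pathway resource
    "pathway:W1": {"entrez:101", "entrez:102"},
    "pathway:W2": {"entrez:103"},
}
id_mapping = {
    "ensembl:E1": {"entrez:101"},
    "ensembl:E2": {"entrez:103"},
}


def pathways_for(genes, pathway_genes, mapping=None):
    """Return pathways sharing at least one gene identifier with `genes`.

    With `mapping`, each gene is first expanded to its equivalent identifiers.
    """
    expanded = set(genes)
    if mapping is not None:
        for gene in genes:
            expanded |= mapping.get(gene, set())
    return {p for p, members in pathway_genes.items() if members & expanded}


naive = pathways_for(disease_genes, pathway_genes)
mapped = pathways_for(disease_genes, pathway_genes, id_mapping)
```

The naive join finds nothing because the two resources happen to use different identifier schemes; it succeeds only when they coincide. With the mapping step built in, both pathways are found.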

Page 51

Ontology mapping

• Many available, even as services

• Often integrated in data resources

– Make my own, slim, combine, map, extend

– Needs feedback to original!

Page 52

Metabolite mapping needs

• More mappings! (plant products, drugs, xenobiotics)

• Ontology based mapping (ChEBI)

• Because:

– Palmitic acid is a fatty acid

– R,R,R-tocopherol is a form of Vitamin E

• And these should (sometimes) map
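A minimal sketch of what ontology based mapping could look like, using only the two relations named on the slide. The `IS_A` table is a hand-written stand-in, not ChEBI content, and the `allow_ancestors` switch is a hypothetical way to express that such mappings should only sometimes apply.

```python
# Hand-written hierarchy fragment using the labels from the slide
# (child -> broader terms); a real ontology would supply these relations.
IS_A = {
    "palmitic acid": {"fatty acid"},
    "R,R,R-tocopherol": {"Vitamin E"},
}


def ancestors(term, is_a):
    """Return all broader terms reachable from `term`."""
    found = set()
    stack = [term]
    while stack:
        for parent in is_a.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def maps_to(measured, target, is_a, allow_ancestors=True):
    """True if `measured` is `target`, or a more specific form of it when allowed."""
    if measured == target:
        return True
    return allow_ancestors and target in ancestors(measured, is_a)


loose = maps_to("palmitic acid", "fatty acid", IS_A)
strict = maps_to("palmitic acid", "fatty acid", IS_A, allow_ancestors=False)
reverse = maps_to("fatty acid", "palmitic acid", IS_A)
```

The mapping is deliberately one-directional: a measured palmitic acid can count as a fatty acid, but a node labelled "fatty acid" does not count as palmitic acid.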

Page 53

Also applies to biology: scientific lenses

Page 54

Chemistry mapping

• Structure not ID based

• Allow substructure searches

• Open PHACTS open source ???

• We need it, may have to redo

Page 55
Page 56
Page 57

From reproducibility to reusability

Page 58

Reuse problems

The age distribution in the experimental groups was not significantly different…

Can we reuse that data to find out age effects?

Yes, if that is actually captured

Needs:

• Ontologies (BioPortal)

• Principles/standards (FAIR, ISA)

• Capture tools (dbNP, MOLGENIS, OpenClinica, eNotebooks)

• Study repositories (BioSamples, BioStudies)

• Data repositories (EGA, GEO, ArrayExpress, MetaboLights, PRIDE)