giga2 structuring phenotype data

Post on 17-Jul-2015


GIGA2, Munich, March 2015

STRUCTURING PHENOTYPE DATA: Lessons from vertebrate genomes

Chris Mungall, LBNL, Berkeley

Gene Ontology

Web Apollo: http://genomearchitect.org

Desvignes, T., Pontarotti, P., & Bobe, J. (2010). Nme gene family evolutionary history reveals pre-metazoan origins and high conservation between humans and the sea anemone, Nematostella vectensis. PLoS ONE, 5(11). doi:10.1371/journal.pone.0015506

Genome structures are highly amenable to comparison

Can we compute over the architecture of phenomes as we do for genome architecture?

o What genes affect distal appendage length or shape?

o What are the genes expressed in the mouth during development?

o What structures develop using the same gene regulatory networks as in bilaterian mouths?

Current methods

o Text-based search of the literature, with results gathered manually

Time consuming

Hard to automate

COMPUTING OVER PHENOTYPES

Gene

Every phenotype ever to have existed

expressed

in mouth

Affects appendage length

regulates EMT …

PHENOTYPES: ENDLESS FORMS

[Figure: example organisms] Peytoia nathorsti; Amphipholis squamata; Petromyzon marinus; Bugula; Homo sapiens (with cleft palate); Mysticeti; Aplysina aerophoba; Gastrula (Metazoan)

[Figure labels] mouth, anus, osculum, blastopore, cleft lip and palate

Gene

“expressed in mouth”

“affects appendage length”

“long tentacles”

“elongated arms”

FREE TEXT != STRUCTURED

“expressed around oral opening”

“expressed in anterior end of gut tube”

ONTOLOGIES: STRUCTURING A DIVERSITY

OF PHENOTYPES

[Diagram: cephalopod anatomy terms linked by typed relations]

Terms: tentacle, tentacular bud, circumoral appendage, tentacular club, sucker, arm, arm IV, mouth

Relations: develops into, is a subtype of, is part of, homologous, surrounds

https://github.com/obophenotype/cephalopod-ontology

ONTOLOGIES FOR MOLECULAR

PHENOTYPES

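[Diagram: cephalopod anatomy terms linked by typed relations, with genes attached]

Terms: tentacle, tentacular bud, circumoral appendage, tentacular club, sucker, arm, arm IV, mouth

Genes: Scr, Lox5, Antp

Relations: develops into, is a subtype of, is part of, homologous, surrounds, expressed in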

GRAPH KNOWLEDGE QUERIES

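[Diagram: cephalopod anatomy terms linked by typed relations, with genes attached]

Terms: tentacle, tentacular bud, circumoral appendage, tentacular club, sucker, arm, arm IV, mouth

Genes: Scr, Lox5, Antp

Relations: develops into, is a subtype of, is part of, homologous, surrounds, expressed in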

“What genes are expressed in structures that develop from a tentacle bud, or homologs?”
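
A query like the one above can be sketched as a traversal over a small labeled graph. The Python below is only an illustration, not the cephalopod ontology: the specific edges and the gene-to-structure assignments are hypothetical, chosen so that the query returns something. Only the term, relation and gene names come from the slide.

```python
from collections import defaultdict

# Hypothetical edges (subject, relation, object); not from any ontology release.
EDGES = [
    ("tentacular bud", "develops into", "tentacle"),
    ("tentacular club", "is part of", "tentacle"),
    ("tentacle", "homologous", "arm"),
    ("arm IV", "is a subtype of", "arm"),
]

# Hypothetical expression annotations: gene -> structures it is expressed in.
EXPRESSED_IN = {
    "Scr": {"tentacle"},
    "Lox5": {"arm IV"},
    "Antp": {"mouth"},
}


def structures_from(bud):
    """Structures that develop from `bud`, plus their homologs and subtypes."""
    develops = defaultdict(set)
    homolog = defaultdict(set)
    subtypes = defaultdict(set)
    for subj, rel, obj in EDGES:
        if rel == "develops into":
            develops[subj].add(obj)
        elif rel == "homologous":
            # Homology is treated as symmetric.
            homolog[subj].add(obj)
            homolog[obj].add(subj)
        elif rel == "is a subtype of":
            subtypes[obj].add(subj)
    found, stack = set(), list(develops[bud])
    while stack:
        node = stack.pop()
        if node in found:
            continue
        found.add(node)
        stack.extend(develops[node] | homolog[node] | subtypes[node])
    return found


def genes_expressed_in(structures):
    """Genes with at least one expression annotation in `structures`."""
    return sorted(g for g, where in EXPRESSED_IN.items() if where & structures)


hits = structures_from("tentacular bud")
print(sorted(hits))
print(genes_expressed_in(hits))
```

The same question asked of free text would need every author to have used the same words; here the relations carry the meaning, so "arm IV" is found even though the query only mentions a tentacle bud.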

ONTOLOGIES FOR TRAITS

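[Diagram: cephalopod anatomy terms linked by typed relations, with traits attached]

Terms: tentacle, tentacular bud, circumoral appendage, tentacular club, sucker, arm, arm IV, mouth

Relations: develops into, is a subtype of, is part of, homologous, surrounds

Traits: shape + tentacular club = shape of tentacular club; length + arm IV = length of arm IV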

Wild-type phenotypic function:

o The Gene Ontology

Anatomy:

o Uberon anatomy ontology

APPLICATIONS OF ONTOLOGIES

For curating the ‘wild type functional phenotypes’

Genes for over 0.5 million species have associations to GO terms

>40,000 terms

o Molecular function

o Cellular component

o Biological Process

Core and taxon-specific

Uses include

o Gene set selection

o Term enrichment

THE GENE ONTOLOGY

Gene Ontology: tool for the unification of biology: Ashburner et al. Nature Genetics 25, 25 - 29 (2000)

http://geneontology.org
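
Term enrichment, one of the uses listed above, is commonly computed as a hypergeometric (one-sided Fisher) test. The sketch below shows the arithmetic using only the Python standard library. The function name and the example numbers are illustrative assumptions, not taken from any GO tool, and a real analysis would also correct for testing many terms.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """Upper-tail hypergeometric probability P(X >= hits).

    population: genes in the background set
    annotated:  background genes annotated to the GO term
    sample:     genes in the list being tested
    hits:       genes in the list annotated to the term
    """
    total = comb(population, sample)
    upper = min(annotated, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Illustrative numbers only: 20,000 background genes, 200 annotated to a term,
# and a 100-gene list that contains 8 of the annotated genes.
p = enrichment_p_value(20000, 200, 100, 8)
print(f"p = {p:.3g}")
```

Exact integer arithmetic via `math.comb` avoids floating-point underflow for large gene sets; the single division at the end is the only lossy step.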

Experimental

o Curated from literature

Automated methods:

o Based on sequence similarity

E.g. blast2go

o Based on protein features

Interpro2GO

o Based on phylogenetic evidence

Ensembl COMPARA

Panther Families and PAINT

Typically only applied for conserved cellular biology

ASSIGNING GENE FUNCTION

Gaudet, P., et al. (2011). Phylogenetic-based propagation of functional annotations within the Gene Ontology consortium.

Briefings in Bioinformatics, 12(5), 449–62. doi:10.1093/bib/bbr042

PAINT

EXTRACTING GENE LISTS AND

INTERPRETING TRANSCRIPTOMIC DATA

Wang, Z., Pascual-Anaya, J., Zadissa, A., Li, W., Niimura, Y., Huang,

Z., … Irie, N. (2013). The draft genomes of soft-shell turtle and

green sea turtle yield insights into the development and evolution

of the turtle-specific body plan. Nature Genetics, 45(6), 701–6.

doi:10.1038/ng.2615

BEYOND THE GO

Functional Genomics (gene function): Gene Ontology. Links genes to what they do.

Transcriptomics (gene expression): Anatomy and Stage Ontology. Links genes to where they are expressed.

Phenomics (effects of gene mutations): Phenotype and Trait Ontology. Links genes to what happens when they are disrupted.

Core: 14,000 terms

o Bias towards vertebrate systems

Composite-Metazoan edition: 42,000 terms

o Integrates cell types, developmental stages, species-specific ontologies

Uses

o Standard reference for animal anatomy

o Linking model organism databases

o Evolutionary systematics (Phenoscape)

o Comparative transcriptomics (Bgee)

o Standardized vocabulary for mammalian sequencing consortia

o Cross-species phenotype matching (Monarch)

THE UBERON MULTI-SPECIES

COMPARATIVE ANATOMY ONTOLOGY

http://uberon.org

Mungall, C. J., Torniai, C., Gkoutos, G. V, Lewis, S. E., & Haendel, M. A. (2012). Uberon, an integrative multi-species

anatomy ontology. Genome Biology, 13(1), R5. doi:10.1186/gb-2012-13-1-r5

PHENOSCAPE: LINKING EVOLUTION TO

GENOMICS USING PHENOTYPE ONTOLOGIES

Phenotypic knowledgebase

o Linking phenotypes to extant and extinct vertebrate taxa

o Integrate with model organism databases

Extending Uberon to cover diversity of vertebrates

Haendel, MA, Balhoff JP, ..., Sereno, PC., Mungall, C.J (2014).

Unification of multi-species vertebrate anatomy ontologies for

comparative biology in Uberon. Journal of Biomedical Semantics,

5(1), 21. doi:10.1186/2041-1480-5-21

UBERON FOR COMPARATIVE GENE

EXPRESSION

EXAMPLE OF EXPRESSION DATA

Ensembl ID | Gene | Stage ID | Stage | Anatomy ID | Anatomy | Evidence
ENSMUSG00000071424 | Grid2 | UBERON:0000112 | sexually immature | UBERON:0002979 | Purkinje cell layer of cerebellar cortex | high quality
ENSMUSG00000071424 | Grid2 | UBERON:0018241 | prime adult | UBERON:0004720 | cerebellar vermis | high quality

Mus_musculus (‘simple’ expression file)

http://bgee.org/?page=download

EXAMPLE OF INFERRED EXPRESSION

DATA

Ensembl ID | Gene | Stage ID | Stage | Anatomy ID | Anatomy | Evidence
ENSMUSG00000071424 | Grid2 | UBERON:0000112 | sexually immature | UBERON:0002979 | Purkinje cell layer of cerebellar cortex | high quality
ENSMUSG00000071424 | Grid2 | UBERON:0000112 | sexually immature | UBERON:0002129 | cerebellar cortex | high quality
ENSMUSG00000071424 | Grid2 | UBERON:0000112 | sexually immature | UBERON:0002979 | cerebellum | high quality
ENSMUSG00000071424 | Grid2 | UBERON:0000112 | sexually immature | UBERON:0002028 | hindbrain | high quality
… | …
ENSMUSG00000071424 | Grid2 | UBERON:0018241 | prime adult | UBERON:0004720 | cerebellar vermis | high quality
ENSMUSG00000071424 | Grid2 | UBERON:0018241 | prime adult | UBERON:0004720 | cerebellum | high quality
… | …

Mus_musculus (‘complete’ expression file)

http://bgee.org/?page=download
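
The step from the 'simple' file to the 'complete' file can be pictured as propagating each expression call up the anatomy hierarchy. The Python below is a minimal sketch of that idea and not Bgee's actual pipeline. The `PART_OF` chain is a simplified assumption built from the anatomy names in the table; the real Uberon graph has more intermediate terms and relation types.

```python
# Simplified, assumed part-of chain (child -> enclosing structure).
PART_OF = {
    "Purkinje cell layer of cerebellar cortex": "cerebellar cortex",
    "cerebellar cortex": "cerebellum",
    "cerebellar vermis": "cerebellum",
    "cerebellum": "hindbrain",
}


def propagate(direct_calls):
    """Expand (gene, stage, anatomy) calls to every enclosing structure."""
    inferred = set()
    for gene, stage, anatomy in direct_calls:
        node = anatomy
        while node is not None:
            inferred.add((gene, stage, node))
            node = PART_OF.get(node)  # None once the top of the chain is reached
    return inferred


direct_calls = [
    ("Grid2", "sexually immature", "Purkinje cell layer of cerebellar cortex"),
    ("Grid2", "prime adult", "cerebellar vermis"),
]

for row in sorted(propagate(direct_calls)):
    print(row)
```

Two curated rows become seven, which is why a query for "genes expressed in hindbrain" can return Grid2 even though no experiment was annotated to hindbrain directly.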

CURATING A DATABASE OF HOMOLOGY

HYPOTHESES

https://github.com/BgeeDB/anatomical-similarity-annotations

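[Diagram: homology hypotheses between Cnidaria and Porifera]

Cnidaria: gastrodermis, mouth

Porifera: choanoderm, osculum

gastrodermis homologous to choanoderm; mouth homologous to osculum

Evidence: developmental gene expression (Leininger S, Adamski M, … Adamska M, doi:10.1038/ncomms4905)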

ONTOLOGIES FOR DATA

STANDARDIZATION IN SEQUENCING

CONSORTIA

Malladi, V. S., Erickson, D. T., Podduturi, N. R., Rowe, L. D., Chan, E. T., Davidson, J. M., … Hong, E. L. (2015). Ontology application and use at the

ENCODE DCC. Database : The Journal of Biological Databases and Curation, 2015, bav010–. doi:10.1093/database/bav010

Washington, N.L., Stinson, E.O., Perry, M.D. et al. (2011) The modENCODE Data Coordination Center: lessons in harvesting comprehensive

experimental details. Database, 2011, bar023

https://www.encodeproject.org/search/?type=biosample

Monarch Initiative

o Large knowledgebase connecting genes, genotypes and diseases to phenotypes

o Find novel linkages between human diseases and model systems

o http://monarchinitiative.org

Driving use case

o Given a patient with a rare or unique spectrum of abnormal phenotypes, determine the causative genomic variant(s)

DISEASES AND ABNORMAL PHENOTYPES

Standard Clinical Exome Testing Pipeline

Predicts causative variant based on information in genome of patient and background genomic data

https://www.sanger.ac.uk/resources/databases/exomiser/query/exomiser2

Robinson, P., et al . (2013). Improved exome prioritization of

disease genes through cross species phenotype comparison.

Genome Research. doi:10.1101/gr.160325.113

http://monarchinitiative.org/analyze/phenotypes/

EXOMISER USES ONTOLOGY-BASED

PHENOTYPE MATCHING

cleft palate = cleft (attribute) + palate (structure)

SOLVING UNDIAGNOSED

DISEASES

Patient phenotypes | Shared term | Sh3kbp1 tm1Ivdi -/-
Behavioural/Psychiatric Abnormality | Behavioral abnormality | increased exploration in new environment
Thyroid stimulating hormone excess | Abnormality of the endocrine system | increased dopamine level
Gait apraxia | abnormal locomotor behavior | hyperactivity
Spasticity | Abnormal voluntary movement | hyperactivity

NIH Undiagnosed Disease Program, patient 2731
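
Matching a patient phenotype to a model-organism phenotype through a shared, more general term can be sketched as a search for the most specific common ancestor. This is a toy illustration and not the Exomiser algorithm. The `PARENT` links are assumptions made for the example; real matching uses the full human and mouse phenotype ontologies and their logical definitions.

```python
# Toy is-a links (child -> parent), assumed for illustration only.
PARENT = {
    "Gait apraxia": "abnormal locomotor behavior",
    "hyperactivity": "abnormal locomotor behavior",
    "abnormal locomotor behavior": "Behavioral abnormality",
    "increased exploration in new environment": "Behavioral abnormality",
    "Thyroid stimulating hormone excess": "Abnormality of the endocrine system",
}


def ancestors(term):
    """Return the term itself followed by its ancestors, nearest first."""
    chain = [term]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def best_match(patient_term, model_terms):
    """Pick the model phenotype whose shared ancestor is closest to the patient term.

    Returns (model_term, shared_term), or None when nothing is shared.
    """
    best = None
    for model_term in model_terms:
        model_chain = ancestors(model_term)
        for distance, candidate in enumerate(ancestors(patient_term)):
            if candidate in model_chain:
                if best is None or distance < best[0]:
                    best = (distance, model_term, candidate)
                break  # nearer ancestors were already checked
    return None if best is None else best[1:]


mouse = ["increased exploration in new environment", "hyperactivity"]
print(best_match("Gait apraxia", mouse))
print(best_match("Spasticity", mouse))
```

Distance from the patient term stands in here for specificity; production tools typically weight shared terms by information content instead.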

Think about

o How your data will be re-used by others

o How what you're doing will scale

Provide structured metadata for experimental data

o Free text is not enough

o Use ontologies and standardized vocabularies where possible

Failing to do so will cost you later!

o All major human and model organism omics consortia now enforce this

ENCODE, FANTOM, LINCS

o Also major phenotyping projects

IMPC/KOMP2

LESSONS
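
As a small illustration of "free text is not enough", the sketch below checks that a sample's metadata fields hold ontology identifiers rather than prose. The field names (`anatomy_id`, `stage_id`) and the identifier pattern are hypothetical choices for this example, not a consortium schema.

```python
import re

# Prefix, colon, digits (for example UBERON:0004720). A deliberately loose pattern.
CURIE = re.compile(r"^[A-Za-z][A-Za-z0-9_]*:\d+$")


def check_sample(record, ontology_fields=("anatomy_id", "stage_id")):
    """Return a list of problems found in one sample's metadata record."""
    problems = []
    for field in ontology_fields:
        value = record.get(field)
        if not value:
            problems.append(f"{field}: missing")
        elif not CURIE.match(value):
            problems.append(f"{field}: {value!r} is not an ontology ID")
    return problems


structured = {"anatomy_id": "UBERON:0004720", "stage_id": "UBERON:0018241"}
free_text = {"anatomy_id": "cerebellar vermis", "stage_id": ""}

print(check_sample(structured))
print(check_sample(free_text))
```

A check like this only confirms the shape of an identifier; confirming that the ID exists and means what the submitter intended still requires looking it up in the ontology.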

Providing metadata requires the right ontologies or vocabularies to be in place

Make phenotypic knowledge about your favorite system structured and computable

o This seems daunting, where do I start…?

LESSONS

Got transcriptome data?

o Bgee will curate it for you!

o Caveat: Your genome must be in Ensembl Genomes

o We are also interested in your homology hypotheses

Got classic systematics data?

o Talk to me about using Phenoscape infrastructure

BGEE WILL CURATE YOUR

TRANSCRIPTOME DATA

Uberon Core

GOT ANATOMY EXPERTISE? CLAIM AN

INVERTEBRATE MODULE!

Thacker, R. W., Díaz, M. C., Kerner, A., Vignes-Lebbe, R., Segerdell, E., Haendel, M. A., & Mungall, C. J. (2014). The Porifera Ontology (PORO): enhancing sponge systematics with an anatomy ontology. Journal of Biomedical Semantics, 5(1), 39

Vertebrate

structures

Porifera

Ontology

Ctenophore

Ontology

Cephalopod

Ontology

http://phenotypercn.org

Eric Edsinger, CephSeq

https://github.com/obophenotype/cephalopod-ontology

https://github.com/obophenotype/ctenophore-ontology

https://github.com/obophenotype/porifera-ontology

https://github.com/obophenotype/uberon

Arthropod

Ontology

Noctua

Curation using multiple ontologies with a graph model

o Web-based, collaborative

o Advanced GO curation

o Phenotype curation

Beta available in summer 2015

o http://noctua.berkeleybop.org

CURATE GENE REGULATORY NETWORKS

AND PHENOTYPES

Structured metadata is valuable

o Helps build the knowledge graph of invertebrate genomics

o Capture metadata up-front, not after the fact

o Use ontologies where possible

o Don’t repeat mistakes of projects that ignored this advice

Invertebrate Ontologies at a nascent stage

o This is an opportunity! Get involved!

CONCLUSIONS

Monarch

o Melissa A Haendel

o Nicole Washington

o Sebastian Kohler

o Harry Hochheiser

o Maryann Martone

o Suzanna Lewis

o Damian Smedley

o Peter Robinson

o William Bone

o Jeremy Nguyen-Xuan

ACKNOWLEDGMENTS

Uberon

o Frederic Bastian

o Ann Niknejad

o Marc Robinson-Rechavi

o Todd Vision

o Jim Balhoff

o Paul Sereno

o Nizar Ibrahim

o Alex Dececchi

o Yvonne Bradford

o Terry Hayamizu

o Robert Druzinsky

NSF Phenotype RCN

o Paula Mabee

o Suzanna Lewis

o Eva Huala

o Andy Deans

o Erik Segerdell

o Robert Thacker

o Eric Edsinger

o Matt Yoder

o Istvan Miko

o David Osumi-Sutherland

Toward synthesizing our knowledge of morphology: using ontologies and machine reasoning to extract presence/absence

evolutionary phenotypes across studies. Dececchi TA et al. https://peerj.com/preprints/807/

FORWARD GENOMICS

http://bejerano.stanford.edu/phenotree/public/html/ Hiller et al. 2012 Cell Reports
