giga2 structuring phenotype data
TRANSCRIPT
GIGA2, Munich, March 2015
STRUCTURING PHENOTYPE DATA:
Lessons from vertebrate genomes
Chris Mungall, LBNL, Berkeley
Gene Ontology
Web Apollo: http://genomearchitect.org
Desvignes, T., Pontarotti, P., & Bobe, J. (2010). Nme gene family evolutionary history reveals pre-metazoan origins and high conservation between humans and the sea anemone, Nematostella vectensis. PLoS ONE, 5(11). doi:10.1371/journal.pone.0015506
Genome structures are highly amenable to comparison
Can we compute over the architecture of phenomes as we do for genome architecture?
o What genes affect distal appendage length or shape?
o What genes are expressed in the mouth during development?
o What structures develop using the same gene regulatory networks as bilaterian mouths?
Current methods
o Text-based search of the literature, with results gathered manually
Time-consuming
Hard to automate
COMPUTING OVER PHENOTYPES
[Figure: a gene linked to phenotype descriptions such as “expressed in mouth”, “affects appendage length” and “regulates EMT …”, set against every phenotype ever to have existed]
PHENOTYPES: ENDLESS FORMS
[Figure: example taxa: Peytoia nathorsti, Amphipholis squamata, Petromyzon marinus, Bugula, Homo sapiens (with cleft palate), Mysticeti, Aplysina aerophoba, gastrula (Metazoan); labelled structures: mouth, anus, osculum, blastopore, cleft lip and palate]
[Figure: a gene annotated with free-text descriptions: “expressed in mouth”, “expressed around oral opening”, “expressed in anterior end of gut tube”, “affects appendage length”, “long tentacles”, “elongated arms”]
FREE TEXT != STRUCTURED
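A minimal Python sketch of the difference, assuming a hand-built synonym table: the three descriptions of expression at the oral end share no text, so exact string matching treats them as unrelated, while mapping each to one structured (relation, term) pair lets them be grouped. Following the slide, all three are treated as naming the same site. Gene names such as `geneA` and term IDs such as `ANAT:mouth` are hypothetical placeholders, not identifiers from a real ontology.

```python
from collections import defaultdict

# Hypothetical synonym table mapping free text to a structured
# (relation, anatomy term) pair. "ANAT:..." IDs are placeholders,
# not real ontology identifiers.
SYNONYMS = {
    "expressed in mouth": ("expressed_in", "ANAT:mouth"),
    "expressed around oral opening": ("expressed_in", "ANAT:mouth"),
    "expressed in anterior end of gut tube": ("expressed_in", "ANAT:mouth"),
    "long tentacles": ("increased_length", "ANAT:tentacle"),
    "elongated arms": ("increased_length", "ANAT:arm"),
}


def group_by_term(annotations):
    """Group genes by the structured term their free-text annotation maps to.

    Text with no entry in SYNONYMS is collected under the key None.
    """
    groups = defaultdict(list)
    for gene, text in annotations:
        key = SYNONYMS.get(text.strip().lower())
        groups[key].append(gene)
    return dict(groups)


annotations = [
    ("geneA", "expressed in mouth"),
    ("geneB", "Expressed around oral opening"),
    ("geneC", "expressed in anterior end of gut tube"),
]

# Exact string matching sees three unrelated descriptions.
print(len({text for _, text in annotations}))  # 3

# After mapping to terms, all three genes share one annotation.
print(group_by_term(annotations))
# {('expressed_in', 'ANAT:mouth'): ['geneA', 'geneB', 'geneC']}
```

The mapping step is where the effort goes; doing it once, at curation time, means every later query can work on the structured key instead of re-interpreting text.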
ONTOLOGIES: STRUCTURING A DIVERSITY OF PHENOTYPES
[Figure: ontology graph of cephalopod anatomy. Terms: tentacle, tentacular bud, circumoral appendage, tentacular club, sucker, arm, arm IV, mouth. Relations: develops into, is a subtype of, is part of, homologous, surrounds]
https://github.com/obophenotype/cephalopod-ontology
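A graph like this can be held as (subject, relation, object) triples and traversed. The minimal Python sketch below assumes one plausible reading of the figure; the specific edges are illustrative, not the contents of the cephalopod ontology. Only `is_a` and `part_of` are followed by default when collecting ancestors, since an annotation to a subtype or a part also says something about the supertype or the whole.

```python
from collections import deque

# One illustrative reading of the figure as (subject, relation, object)
# triples; these edges are assumptions for the example, not the
# actual contents of the cephalopod ontology.
EDGES = [
    ("tentacle", "is_a", "circumoral appendage"),
    ("arm", "is_a", "circumoral appendage"),
    ("arm IV", "is_a", "arm"),
    ("tentacular club", "part_of", "tentacle"),
    ("sucker", "part_of", "tentacular club"),
    ("tentacular bud", "develops_into", "tentacle"),
    ("circumoral appendage", "surrounds", "mouth"),
    ("tentacle", "homologous_to", "arm IV"),
]


def ancestors(term, relations=("is_a", "part_of")):
    """Return every term reachable from `term` via the given relations,
    in breadth-first order."""
    found = []
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for subj, rel, obj in EDGES:
            if subj == current and rel in relations and obj not in found:
                found.append(obj)
                queue.append(obj)
    return found


print(ancestors("sucker"))
# ['tentacular club', 'tentacle', 'circumoral appendage']
print(ancestors("arm IV"))
# ['arm', 'circumoral appendage']
```

Keeping `homologous_to` and `develops_into` out of the default traversal separates "kind of / part of" reasoning from evolutionary and developmental links; a query can opt into those by passing other relation names.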
ONTOLOGIES FOR MOLECULAR PHENOTYPES
[Figure: the same cephalopod anatomy graph, with the genes Scr, Lox5 and Antp linked to anatomy terms by “expressed in” edges]
GRAPH KNOWLEDGE QUERIES
[Figure: the same cephalopod anatomy graph, with “expressed in” edges for Scr, Lox5 and Antp]
Example query: “What genes are expressed in structures that develop from a tentacle bud, or homologs?”
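The example query can be answered by graph traversal. A minimal Python sketch, assuming the graph is stored as triples: follow `develops_into` from the bud, add homologs (treating homology as symmetric), add the parts of those structures, then collect genes with an `expressed_in` edge into that set. The figure does not say which gene is expressed where, so the Scr, Lox5 and Antp edges are hypothetical, as are the helper function names. A real knowledge graph would more likely be queried with a graph query language; the loops here just make each step explicit.

```python
# Triples for a toy version of the figure. The anatomy edges are one
# illustrative reading of the graph; which structure each gene is
# expressed in is invented for the example, not taken from data.
EDGES = [
    ("tentacular bud", "develops_into", "tentacle"),
    ("tentacle", "homologous_to", "arm IV"),
    ("tentacular club", "part_of", "tentacle"),
    ("sucker", "part_of", "tentacular club"),
    ("Scr", "expressed_in", "tentacular club"),  # hypothetical
    ("Lox5", "expressed_in", "arm IV"),  # hypothetical
    ("Antp", "expressed_in", "mouth"),  # hypothetical
]


def objects(subject, relation):
    return {o for s, r, o in EDGES if s == subject and r == relation}


def subjects(relation, obj):
    return {s for s, r, o in EDGES if r == relation and o == obj}


def homologs(structure):
    """Treat homology as symmetric."""
    return objects(structure, "homologous_to") | subjects("homologous_to", structure)


def with_parts(structures):
    """Add all parts (transitively, via part_of) of the given structures."""
    result = set(structures)
    frontier = list(structures)
    while frontier:
        whole = frontier.pop()
        for part in subjects("part_of", whole):
            if part not in result:
                result.add(part)
                frontier.append(part)
    return result


def genes_expressed_in_derivatives(bud):
    """Genes expressed in structures that develop from `bud`, their
    homologs, or any of their parts."""
    targets = set()
    for structure in objects(bud, "develops_into"):
        targets.add(structure)
        targets |= homologs(structure)
    targets = with_parts(targets)
    return sorted({g for g, r, a in EDGES if r == "expressed_in" and a in targets})


print(genes_expressed_in_derivatives("tentacular bud"))  # ['Lox5', 'Scr']
```

Antp is excluded because, in this toy graph, the mouth is neither derived from the tentacle bud nor a homolog or part of anything that is.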
ONTOLOGIES FOR TRAITS
[Figure: the same cephalopod anatomy graph, with qualities (“shape”, “length++”) combined with anatomy terms to form traits: “shape of tentacular club” and “length of arm IV”]
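Trait terms like these can be built compositionally, pairing a quality with the anatomical entity that bears it (the entity-quality, or EQ, pattern also used by Phenoscape). A minimal Python sketch, assuming a toy hierarchy: because each trait points at an anatomy term, a query for traits of circumoral appendages can reuse the anatomy hierarchy instead of matching text. Reading “length++” as “increased length” is an interpretation of the slide, and the class and function names are hypothetical.

```python
from dataclasses import dataclass

# Simplified anatomy hierarchy (is_a and part_of merged for brevity);
# an illustrative reading of the figure, not the ontology itself.
PARENTS = {
    "arm IV": ["arm"],
    "arm": ["circumoral appendage"],
    "tentacle": ["circumoral appendage"],
    "tentacular club": ["tentacle"],
}


@dataclass(frozen=True)
class Trait:
    quality: str  # e.g. "shape", "length"
    entity: str  # anatomy term that bears the quality

    def label(self) -> str:
        return f"{self.quality} of {self.entity}"


@dataclass(frozen=True)
class Phenotype:
    trait: Trait
    change: str  # e.g. "increased", one reading of "length++"

    def label(self) -> str:
        return f"{self.change} {self.trait.label()}"


def is_under(entity: str, ancestor: str) -> bool:
    """True if `entity` is `ancestor` or lies below it in PARENTS."""
    if entity == ancestor:
        return True
    return any(is_under(p, ancestor) for p in PARENTS.get(entity, []))


traits = [Trait("shape", "tentacular club"), Trait("length", "arm IV")]

# Traits of any circumoral appendage, found through the anatomy hierarchy.
print([t.label() for t in traits if is_under(t.entity, "circumoral appendage")])
# ['shape of tentacular club', 'length of arm IV']

# Length traits of arms only.
print([t.label() for t in traits if t.quality == "length" and is_under(t.entity, "arm")])
# ['length of arm IV']

print(Phenotype(Trait("length", "arm IV"), "increased").label())
# increased length of arm IV
```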
Wild-type phenotypic function:
o The Gene Ontology
Anatomy:
o Uberon anatomy ontology
APPLICATIONS OF ONTOLOGIES
For curating ‘wild-type functional phenotypes’
Genes from over 0.5 million species have associations to GO terms
>40,000 terms
o Molecular function
o Cellular component
o Biological Process
Core and taxon-specific
Uses include
o Gene set selection
o Term enrichment
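Term enrichment asks whether a GO term is over-represented in a gene set (for example, differentially expressed genes) relative to a background. One common formulation, sketched here in Python, is a one-sided hypergeometric test: with N background genes, K of them annotated to the term, and a study set of n genes of which k are annotated, the p-value is P(X ≥ k). The numbers below are made up for illustration, and a real analysis would also correct for testing many terms at once.

```python
from math import comb


def enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the background (e.g. all annotated genes in the genome)
    K: background genes annotated to the term (or its descendants)
    n: genes in the study set
    k: study-set genes annotated to the term
    """
    total = comb(N, n)
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# Toy numbers: 3 of 4 study genes carry a term that only 5 of 20
# background genes carry.
p = enrichment_p(k=3, n=4, K=5, N=20)
print(round(p, 4))  # 0.032
```

This tail probability is the same quantity as a one-sided Fisher's exact test on the 2×2 table of study/background versus annotated/not annotated, which many GO enrichment tools report.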
THE GENE ONTOLOGY
Ashburner et al. (2000). Gene Ontology: tool for the unification of biology. Nature Genetics, 25, 25–29.
http://geneontology.org
Experimental:
o Curated from literature
Automated methods:
o Based on sequence similarity, e.g. Blast2GO
o Based on protein features, e.g. InterPro2GO
o Based on phylogenetic evidence, e.g. Ensembl Compara, PANTHER families and PAINT
Typically applied only to conserved cellular biology
ASSIGNING GENE FUNCTION
Gaudet, P., et al. (2011). Phylogenetic-based propagation of functional annotations within the Gene Ontology consortium.
Briefings in Bioinformatics, 12(5), 449–62. doi:10.1093/bib/bbr042
PAINT
EXTRACTING GENE LISTS AND INTERPRETING TRANSCRIPTOMIC DATA
Wang, Z., Pascual-Anaya, J., Zadissa, A., Li, W., Niimura, Y., Huang,
Z., … Irie, N. (2013). The draft genomes of soft-shell turtle and
green sea turtle yield insights into the development and evolution
of the turtle-specific body plan. Nature Genetics, 45(6), 701–6.
doi:10.1038/ng.2615
BEYOND THE GO
o Functional genomics (gene function): Gene Ontology; links genes to what they do
o Transcriptomics (gene expression): Anatomy and Stage Ontology; links genes to where they are expressed
o Phenomics (effects of gene mutations): Phenotype and Trait Ontology; links genes to what happens when they are disrupted
Core: 14,000 terms
o Biased towards vertebrate systems
Composite-Metazoan edition: 42,000 terms
o Integrates cell types, developmental stages and species-specific ontologies
Uses
o Standard reference for animal anatomy
o Linking model organism databases
o Evolutionary systematics (Phenoscape)
o Comparative transcriptomics (Bgee)
o Standardized vocabulary for mammalian sequencing consortia
o Cross-species phenotype matching (Monarch)
THE UBERON MULTI-SPECIES COMPARATIVE ANATOMY ONTOLOGY
http://uberon.org
Mungall, C. J., Torniai, C., Gkoutos, G. V., Lewis, S. E., & Haendel, M. A. (2012). Uberon, an integrative multi-species anatomy ontology. Genome Biology, 13(1), R5. doi:10.1186/gb-2012-13-1-r5
PHENOSCAPE: LINKING EVOLUTION TO GENOMICS USING PHENOTYPE ONTOLOGIES
Phenotypic knowledgebase
o Links phenotypes to extant and extinct vertebrate taxa
o Integrates with model organism databases
Extending Uberon to cover the diversity of vertebrates
Haendel, M. A., Balhoff, J. P., ..., Sereno, P. C., & Mungall, C. J. (2014). Unification of multi-species vertebrate anatomy ontologies for comparative biology in Uberon. Journal of Biomedical Semantics, 5(1), 21. doi:10.1186/2041-1480-5-21
UBERON FOR COMPARATIVE GENE EXPRESSION
EXAMPLE OF EXPRESSION DATA
Ensembl ID | Gene | Stage ID | Stage | Anatomy ID | Anatomy | Evidence
ENSMUSG00000071424 | Grid2 | UBERON:0000112 | sexually immature | UBERON:0002979 | Purkinje cell layer of cerebellar cortex | high quality
ENSMUSG00000071424 | Grid2 | UBERON:0018241 | prime adult | UBERON:0004720 | cerebellar vermis | high quality
Mus_musculus (‘simple’ expression file)
http://bgee.org/?page=download
EXAMPLE OF INFERRED EXPRESSION DATA
Ensembl ID | Gene | Stage ID | Stage | Anatomy ID | Anatomy | Evidence
ENSMUSG00000071424 | Grid2 | UBERON:0000112 | sexually immature | UBERON:0002979 | Purkinje cell layer of cerebellar cortex | high quality
ENSMUSG00000071424 | Grid2 | UBERON:0000112 | sexually immature | UBERON:0002129 | cerebellar cortex | high quality
ENSMUSG00000071424 | Grid2 | UBERON:0000112 | sexually immature | UBERON:0002037 | cerebellum | high quality
ENSMUSG00000071424 | Grid2 | UBERON:0000112 | sexually immature | UBERON:0002028 | hindbrain | high quality
… | …
ENSMUSG00000071424 | Grid2 | UBERON:0018241 | prime adult | UBERON:0004720 | cerebellar vermis | high quality
ENSMUSG00000071424 | Grid2 | UBERON:0018241 | prime adult | UBERON:0002037 | cerebellum | high quality
… | …
Mus_musculus (‘complete’ expression file)
http://bgee.org/?page=download
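The ‘complete’ file adds rows that follow from the ‘simple’ ones: expression observed in a structure also counts as expression in every structure it is part of. A minimal Python sketch of that propagation, assuming a hand-coded, simplified part_of hierarchy built from the anatomy terms in these tables (the real Uberon graph has intermediate terms and also uses is_a). Bgee's actual pipeline has its own rules, for example for developmental stages, that are not reproduced here; the function names are hypothetical.

```python
from collections import deque

# Simplified, hypothetical part_of hierarchy using the anatomy terms from
# the tables above; Uberon itself has more intermediate terms.
PART_OF = {
    "Purkinje cell layer of cerebellar cortex": ["cerebellar cortex"],
    "cerebellar cortex": ["cerebellum"],
    "cerebellar vermis": ["cerebellum"],
    "cerebellum": ["hindbrain"],
}


def containing_structures(structure):
    """Return `structure` followed by everything it is transitively part of."""
    result = [structure]
    queue = deque([structure])
    while queue:
        current = queue.popleft()
        for whole in PART_OF.get(current, []):
            if whole not in result:
                result.append(whole)
                queue.append(whole)
    return result


def propagate(calls):
    """Expand (gene, stage, anatomy) expression calls up the part_of hierarchy."""
    inferred = []
    for gene, stage, anatomy in calls:
        for structure in containing_structures(anatomy):
            row = (gene, stage, structure)
            if row not in inferred:
                inferred.append(row)
    return inferred


simple = [
    ("Grid2", "sexually immature", "Purkinje cell layer of cerebellar cortex"),
    ("Grid2", "prime adult", "cerebellar vermis"),
]
for row in propagate(simple):
    print(row)
```

Run on the two ‘simple’ rows, this yields seven rows, covering every gene/stage/anatomy combination listed in the ‘complete’ table plus prime adult hindbrain, which that table elides.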
CURATING A DATABASE OF HOMOLOGY HYPOTHESES
https://github.com/BgeeDB/anatomical-similarity-annotations
[Figure: homology hypotheses between Cnidaria and Porifera: gastrodermis homologous to choanoderm, and mouth homologous to osculum; evidence: developmental gene expression (Leininger S, Adamski M, … Adamska M, doi:10.1038/ncomms4905)]
ONTOLOGIES FOR DATA STANDARDIZATION IN SEQUENCING CONSORTIA
Malladi, V. S., Erickson, D. T., Podduturi, N. R., Rowe, L. D., Chan, E. T., Davidson, J. M., … Hong, E. L. (2015). Ontology application and use at the
ENCODE DCC. Database : The Journal of Biological Databases and Curation, 2015, bav010–. doi:10.1093/database/bav010
Washington, N.L., Stinson, E.O., Perry, M.D. et al. (2011) The modENCODE Data Coordination Center: lessons in harvesting comprehensive
experimental details. Database, 2011, bar023
https://www.encodeproject.org/search/?type=biosample
Monarch Initiative
o Large knowledgebase connecting genes, genotypes and diseases to phenotypes
o Finds novel links between human diseases and model systems
o http://monarchinitiative.org
Driving use case
o Given a patient with a rare or unique spectrum of abnormal
phenotypes, determine the causative genomic variant(s)
DISEASES AND ABNORMAL PHENOTYPES
Standard clinical exome testing pipeline
Predicts the causative variant based on information in the patient's genome and background genomic data
https://www.sanger.ac.uk/resources/databases/exomiser/query/exomiser2
Robinson, P., et al . (2013). Improved exome prioritization of
disease genes through cross species phenotype comparison.
Genome Research. doi:10.1101/gr.160325.113
http://monarchinitiative.org/analyze/phenotypes/
EXOMISER USES ONTOLOGY-BASED PHENOTYPE MATCHING
cleft palate = cleft (attribute) + palate (structure)
SOLVING UNDIAGNOSED DISEASES
[Figure: matching the phenotypes of NIH Undiagnosed Disease Program patient 2731 to those of Sh3kbp1<tm1Ivdi> -/- mice. Patient phenotypes: behavioural/psychiatric abnormality, thyroid stimulating hormone excess, gait apraxia, spasticity, hyperactivity. Mouse phenotypes: increased exploration in new environment, increased dopamine level, hyperactivity. Shared higher-level terms: behavioral abnormality, abnormality of the endocrine system, abnormal locomotor behavior, abnormal voluntary movement]
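Patient and mouse phenotypes rarely share exact terms, but they can share ancestors in a cross-species phenotype hierarchy. A minimal Python sketch of that idea, assuming a toy hierarchy: each phenotype is expanded to its ancestor set, pairs are scored by the Jaccard similarity of those sets, and each patient phenotype takes its best-scoring mouse match. The parent links are invented to echo the figure, not taken from HPO or MP, and this simple best-match score is a stand-in, not the scoring that Exomiser or Monarch actually use.

```python
# Toy cross-species phenotype hierarchy (child -> parents). The links are
# illustrative only, loosely echoing the figure; they are not HPO or MP.
PARENTS = {
    "hyperactivity": ["abnormal locomotor behavior"],
    "increased exploration in new environment": ["abnormal locomotor behavior"],
    "abnormal locomotor behavior": ["behavioral abnormality"],
    "gait apraxia": ["abnormal voluntary movement"],
    "spasticity": ["abnormal voluntary movement"],
    "abnormal voluntary movement": ["behavioral abnormality"],
    "thyroid stimulating hormone excess": ["abnormality of the endocrine system"],
    "increased dopamine level": ["abnormality of the endocrine system"],
    "behavioral abnormality": ["phenotypic abnormality"],
    "abnormality of the endocrine system": ["phenotypic abnormality"],
}


def ancestors(term):
    """The term itself plus all of its ancestors."""
    result = {term}
    for parent in PARENTS.get(term, []):
        result |= ancestors(parent)
    return result


def jaccard(a, b):
    """Jaccard similarity of the two terms' ancestor sets."""
    x, y = ancestors(a), ancestors(b)
    return len(x & y) / len(x | y)


def best_match_score(query, target):
    """Average, over query phenotypes, of the best Jaccard match in target."""
    return sum(max(jaccard(q, t) for t in target) for q in query) / len(query)


patient = [
    "hyperactivity",
    "gait apraxia",
    "spasticity",
    "thyroid stimulating hormone excess",
]
mouse = [
    "hyperactivity",
    "increased exploration in new environment",
    "increased dopamine level",
]
print(round(best_match_score(patient, mouse), 3))  # 0.542
```

As written the score is asymmetric (patient against mouse); averaging both directions is one way to make it symmetric.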
Think about
o How your data will be re-used by others
o How what you're doing will scale
Provide structured metadata for experimental data
o Free text is not enough
o Use ontologies and standardized vocabularies where possible
Failing to do so will cost you later!
o All major human and model organism omics consortia now enforce this, e.g. ENCODE, FANTOM, LINCS
o So do major phenotyping projects, e.g. IMPC/KOMP2
LESSONS
Providing metadata requires the right ontologies or vocabularies to be in place
Make phenotypic knowledge about your favorite system structured and computable
o This seems daunting; where do I start?
LESSONS
Got transcriptome data?
o Bgee will curate it for you!
o Caveat: your genome must be in Ensembl Genomes
o We are also interested in your homology hypotheses
Got classic systematics data?
o Talk to me about using Phenoscape infrastructure
BGEE WILL CURATE YOUR TRANSCRIPTOME DATA
[Figure: Uberon core (vertebrate structures) extended with invertebrate modules: Porifera Ontology, Ctenophore Ontology, Cephalopod Ontology, Arthropod Ontology]
GOT ANATOMY EXPERTISE? CLAIM AN INVERTEBRATE MODULE!
Thacker, R. W., Díaz, M. C., Kerner, A., Vignes-Lebbe, R., Segerdell, E., Haendel, M. A., & Mungall, C. J. (2014). The Porifera Ontology (PORO): enhancing sponge systematics with an anatomy ontology. Journal of Biomedical Semantics, 5(1), 39.
http://phenotypercn.org
Eric Edsinger, CephSeq
https://github.com/obophenotype/cephalopod-ontology
https://github.com/obophenotype/ctenophore-ontology
https://github.com/obophenotype/porifera-ontology
https://github.com/obophenotype/uberon
Noctua
Curation using multiple ontologies with a graph model
o Web-based, collaborative
o Advanced GO curation
o Phenotype curation
Beta available in summer 2015
o http://noctua.berkeleybop.org
CURATE GENE REGULATORY NETWORKS AND PHENOTYPES
Structured metadata is valuable
o Helps build the knowledge graph of invertebrate genomics
o Capture metadata up-front, not after the fact
o Use ontologies where possible
o Don’t repeat mistakes of projects that ignored this advice
Invertebrate ontologies are at a nascent stage
o This is an opportunity! Get involved!
CONCLUSIONS
Monarch
o Melissa A Haendel
o Nicole Washington
o Sebastian Kohler
o Harry Hochheiser
o Maryann Martone
o Suzanna Lewis
o Damian Smedley
o Peter Robinson
o William Bone
o Jeremy Nguyen-Xuan
ACKNOWLEDGMENTS
Uberon
o Frederic Bastian
o Ann Niknejad
o Marc Robinson-Rechavi
o Todd Vision
o Jim Balhoff
o Paul Sereno
o Nizar Ibrahim
o Alex Dececchi
o Yvonne Bradford
o Terry Hayamizu
o Robert Druzinsky
NSF Phenotype RCN
o Paula Mabee
o Suzanna Lewis
o Eva Huala
o Andy Deans
o Erik Segerdell
o Robert Thacker
o Eric Edsinger
o Matt Yoder
o Istvan Miko
o David Osumi-Sutherland
Toward synthesizing our knowledge of morphology: using ontologies and machine reasoning to extract presence/absence
evolutionary phenotypes across studies. Dececchi TA et al. https://peerj.com/preprints/807/
FORWARD GENOMICS
http://bejerano.stanford.edu/phenotree/public/html/ Hiller et al. 2012 Cell Reports