On the frontier of genotype-2-phenotype data integration
Melissa Haendel, PhD
March 22, 2016
AMIA TBI
@monarchinit
Challenge: Each database uses its own phenotype vocabulary/ontology
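[Figure: phenotype vocabularies/ontologies and the databases that use them]
Vocabularies/ontologies: ZFA, MP, DPO, WPO, HP, OMIA, VT, FYPO, APO, SNOMED, …
Databases: WB, PB, FB, OMIA, MGI, RGD, ZFIN, SGD, HPOA, EHR, IMPC, OMIM, QTLdb, …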
monarchinitiative.org
Can we help machines understand phenotype terms?
“Palmoplantar hyperkeratosis” (human phenotype)
“I have absolutely no idea what that means ???”
Genotype-phenotype integration
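[Pie chart: share of associations by number of sources integrated (legend: one source, two sources, three or more). One source: 9%; two or more sources: 91%.]
91% of our 2.2 million G2P associations required integrating two or more data sources. This number does not even include orthology (Panther) or any ontologies!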
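As a rough illustration of how such a breakdown could be computed, the sketch below buckets associations by how many sources had to be combined to assemble each one. The data, the source names, and the `source_breakdown` helper are all hypothetical; nothing here reflects the actual Monarch pipeline or its numbers.

```python
from collections import Counter

# Hypothetical toy data: each association maps to the set of sources
# that had to be combined to assemble it. All names are illustrative.
associations = {
    "assoc-1": {"source_a"},
    "assoc-2": {"source_a", "source_b"},
    "assoc-3": {"source_a", "source_b", "source_c"},
    "assoc-4": {"source_b", "source_c"},
}

LABELS = ("one source", "two sources", "three or more")


def bucket(n_sources):
    """Map a source count onto the three legend categories."""
    if n_sources == 1:
        return LABELS[0]
    if n_sources == 2:
        return LABELS[1]
    return LABELS[2]


def source_breakdown(assocs):
    """Return the fraction of associations falling in each category."""
    counts = Counter(bucket(len(sources)) for sources in assocs.values())
    total = sum(counts.values())
    return {label: counts[label] / total for label in LABELS}


breakdown = source_breakdown(associations)
# Fraction of associations that needed two or more sources.
multi_source = breakdown["two sources"] + breakdown["three or more"]
```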
What’s in a Phenopacket?
Ontology-based phenotypic descriptions for:
- Human patients, model organisms, or any organism
- Groupings of human patients or organisms
What does it include?
- Age of patient or organism
- Sex of patient or organism
- Disease (if named)
- Age of onset of disease
- Positive and negative phenotype associations
- Reference to genes, variants, or collections of variants
- Reference to environmental factors
Multiple formats: TSV, JSON, YAML
Validation tools
Uses standardized publication citation mechanism for data sharing
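To make the list of contents concrete, here is a minimal sketch of a phenopacket-like record serialized as JSON, with a toy validation step. The field names, the `EXAMPLE:` identifiers, and the `validate` helper are assumptions made for illustration; they are not the official Phenopacket schema or its validation tools.

```python
import json

# Hypothetical record; field names and identifiers are placeholders,
# not the official Phenopacket schema.
packet = {
    "subject": {"id": "patient-1", "age": "P35Y", "sex": "female"},
    "disease": {"label": "example disease", "age_of_onset": "P10Y"},
    "phenotypes": [
        # A positive association: the phenotype was observed.
        {"term": {"id": "EXAMPLE:0001", "label": "Palmoplantar hyperkeratosis"},
         "negated": False},
        # A negative association: the phenotype was looked for and absent.
        {"term": {"id": "EXAMPLE:0002", "label": "example absent phenotype"},
         "negated": True},
    ],
    "variants": [{"gene": "EXAMPLE-GENE", "description": "example variant"}],
    "environment": [{"label": "example environmental factor"}],
}

REQUIRED_SUBJECT_FIELDS = ("id", "age", "sex")


def validate(record):
    """Return a list of problems; an empty list means the toy checks passed."""
    problems = []
    subject = record.get("subject", {})
    for name in REQUIRED_SUBJECT_FIELDS:
        if name not in subject:
            problems.append(f"subject is missing '{name}'")
    for i, phenotype in enumerate(record.get("phenotypes", [])):
        if "id" not in phenotype.get("term", {}):
            problems.append(f"phenotype {i} has no ontology term id")
        if not isinstance(phenotype.get("negated"), bool):
            problems.append(f"phenotype {i} must state 'negated' as true/false")
    return problems


problems = validate(packet)
as_json = json.dumps(packet, indent=2)
```

Keeping an explicit `negated` flag on every phenotype is one way to record both positive and negative associations without a separate list for each.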
brca-website.cloudapp.net
13501 variants from ENIGMA, ClinVar, LOVD, exLOVD, BIC
Merged by genomic coordinate and alternate allele string
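A merge keyed on genomic coordinate and alternate allele string might look like the sketch below. The records, coordinates, and pathogenicity calls are invented for illustration (only the source names come from the slide), and `merge_variants` and `conflicting` are hypothetical helpers, not the BRCA Exchange code.

```python
from collections import defaultdict

# Invented toy records; coordinates and calls are NOT real data.
records = [
    {"source": "ClinVar", "chrom": "17", "pos": 100, "alt": "T", "call": "pathogenic"},
    {"source": "ENIGMA", "chrom": "17", "pos": 100, "alt": "T", "call": "benign"},
    {"source": "LOVD", "chrom": "13", "pos": 200, "alt": "G", "call": "benign"},
]


def merge_variants(recs):
    """Group records that share chromosome, position, and alternate allele."""
    merged = defaultdict(lambda: {"sources": [], "calls": {}})
    for rec in recs:
        key = (rec["chrom"], rec["pos"], rec["alt"])
        merged[key]["sources"].append(rec["source"])
        merged[key]["calls"][rec["source"]] = rec["call"]
    return dict(merged)


def conflicting(merged):
    """Return keys whose sources disagree on the pathogenicity call."""
    return sorted(
        key for key, entry in merged.items()
        if len(set(entry["calls"].values())) > 1
    )


merged = merge_variants(records)
conflicts = conflicting(merged)
```

Keeping the per-source call alongside the merged key means disagreements between sources stay visible instead of being overwritten during the merge.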
Problems with evidence and provenance of G2P Associations
PROBLEMS: Variants have different pathogenicity calls due to annotation inconsistency AND different experimental evidence
Incomplete, not computable, and frequently conflated
Annotations are to different aspects of the genotype: allele, variant, gene, transcript, etc.
A computable model would enable:
- context to evaluate credibility/confidence
- support filtering and analysis of data
- detailed history for attribution
Building a computable model for ACMG guidelines
http://brcaexchange.org/
Provenance:
- Materials & methods
- Agent(s) of evidence
- Agent(s) of claim
- Time and place
Evidence:
- Data (e.g., images, sequences)
- Evidence codes
- Publications
- Confidence (p-val, z-score)
- Summary figures
- Conclusions from previous studies
- Domain expert’s knowledge
Claim:
- Causal relationships, hypothesized relationships, correlations, etc.
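One way to picture a computable provenance, evidence, and claim model is as three linked record types, sketched below. The class and field names are assumptions chosen to mirror the slide; they are not the SEPIO ontology's actual terms. The `filter_claims` helper is likewise hypothetical and shows the kind of confidence-based filtering such a model could support.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Provenance:
    methods: str                 # materials & methods
    evidence_agents: List[str]   # who produced the evidence
    claim_agents: List[str]      # who asserted the claim
    when_where: str              # time and place


@dataclass
class Evidence:
    evidence_code: str
    publications: List[str] = field(default_factory=list)
    p_value: Optional[float] = None  # one possible confidence measure


@dataclass
class Claim:
    subject: str    # e.g. a variant
    relation: str   # causal, hypothesized, correlated, ...
    obj: str        # e.g. a phenotype or disease
    evidence: List[Evidence] = field(default_factory=list)
    provenance: Optional[Provenance] = None


def filter_claims(claims, max_p):
    """Keep claims backed by at least one evidence item with p_value <= max_p."""
    return [
        claim for claim in claims
        if any(e.p_value is not None and e.p_value <= max_p for e in claim.evidence)
    ]


# Invented example claims; identifiers are placeholders.
strong = Claim(
    "variant-1", "causes", "phenotype-1",
    evidence=[Evidence("experimental", ["pub-1"], p_value=0.001)],
    provenance=Provenance("assay", ["lab-a"], ["curator-a"], "2016, site-a"),
)
weak = Claim(
    "variant-2", "correlated_with", "phenotype-2",
    evidence=[Evidence("expert_opinion")],
)
kept = filter_claims([strong, weak], max_p=0.05)
```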
https://github.com/monarch-initiative/SEPIO-ontology
Summary
Ontologies can be used to perform deep phenotyping integration across species
An exchange standard is needed to facilitate distributed phenotype data sharing
A computable G2P evidence model can aid variant interpretation
Acknowledgements
Lawrence Berkeley: Chris Mungall, Nicole Washington, Suzanna Lewis, Jeremy Nguyen, Seth Carbon
Charité: Peter Robinson, Sebastian Kohler
U of Pittsburgh: Harry Hochheiser, Mike Davis, Joe Zhou
OHSU: Nicole Vasilesky, Matt Brush, Kent Shefchek, Julie McMurry, Tom Conlin
Genomics England: Damian Smedley, Jules Jacobson
UCSC: David Haussler, Benedict Paten, Mark Diekhans, Melissa Cline
Garvan: Tudor Groza, Craig McNamara, Edwin Zhang
FUNDING: NIH Office of Director: 1R24OD011883; NIH-UDP: HHSN268201300036C, HHSN268201400093P; NCINCI/Leidos #15X143, BD2K U54HG007990-S2 (Haussler) & BD2K PA-15-144-U01 (Kesselman)