On the frontier of genotype-2-phenotype data integration

Melissa Haendel, PhD

March 22, 2016

AMIA TBI

@monarchinit @ontowonka

Filling the G2P knowledge gap from other organisms

Other = rat, fly, worm, mouse, zebrafish

monarchinitiative.org

Ulcerated paws

Palmoplantar hyperkeratosis

Thick hand skin

Challenge: Each database uses its own vocabulary/ontology

[Figure: MP, HP, MGI, HPOA]

Challenge: Each database uses its own phenotype vocabulary/ontology

[Figure, vocabularies/ontologies: ZFA, MP, DPO, WPO, HP, OMIA, VT, FYPO, APO, SNOMED, ………]

[Figure, data sources: WB, PB, FB, OMIA, MGI, RGD, ZFIN, SGD, HPOA, EHR, IMPC, OMIM, …, QTLdb]

Can we help machines understand phenotype terms?

“Palmoplantar hyperkeratosis”

Human phenotype

“I have absolutely no idea what that means ???”

The Human Phenotype Ontology

Genotype-phenotype integration

[Pie chart: one source (9%); two sources or three or more (91%)]

91% of our 2.2 million G2P associations required integrating two or more data sources (this number does not even include orthology (Panther) or any ontologies!)

Diagnosing an undiagnosed disease

www.owlsim.org
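
The idea of matching a patient's phenotypes against records described in another vocabulary can be illustrated with a small sketch. It compares two phenotype profiles by the overlap of their ancestor terms (a Jaccard score). This is an assumption-laden toy: the hierarchy, the parent terms, and the function names are hypothetical, and it is not a description of how OWLSim itself scores matches.

```python
# Toy is-a hierarchy: child -> parents. The hierarchy is hypothetical
# and only loosely inspired by the example terms on the earlier slide.
PARENTS = {
    "palmoplantar hyperkeratosis": ["hyperkeratosis"],
    "thick hand skin": ["hyperkeratosis"],
    "ulcerated paws": ["skin ulcer"],
    "hyperkeratosis": ["abnormal skin"],
    "skin ulcer": ["abnormal skin"],
    "abnormal skin": [],
    "short stature": ["abnormal growth"],
    "abnormal growth": [],
}


def ancestors(term):
    """Return the term itself plus every ancestor reachable via is-a."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen


def closure(profile):
    """Union of the ancestor sets of all terms in a phenotype profile."""
    result = set()
    for term in profile:
        result |= ancestors(term)
    return result


def jaccard(profile_a, profile_b):
    """Shared ancestor terms divided by all ancestor terms of both profiles."""
    a, b = closure(profile_a), closure(profile_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


patient = ["palmoplantar hyperkeratosis"]
other_record = ["thick hand skin", "ulcerated paws"]
unrelated = ["short stature"]

score_related = jaccard(patient, other_record)
score_unrelated = jaccard(patient, unrelated)
print(score_related, score_unrelated)
```

Because the comparison runs over ancestor terms, two records that never use the same leaf term can still score as similar when their terms share a parent.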

Phenotype Exchange Standard

www.phenopackets.org

What’s in a Phenopacket?

Ontology-based phenotypic descriptions for:
- Human patients, model organisms, or any organism
- Groupings of human patients or organisms

What does it include?
- Age of patient or organism
- Sex of patient or organism
- Disease (if named)
- Age of onset of disease
- Positive and negative phenotype associations
- Reference to genes, variants, or collections of variants
- Reference to environmental factors

Multiple formats: TSV, JSON, YAML

Validation tools

Uses standardized publication citation mechanism for data sharing
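
As a rough illustration of the contents listed above, the sketch below assembles one record and serializes it to JSON. The field names, the `make_packet` helper, and the example values are hypothetical and chosen for readability; they are not the official Phenopacket schema.

```python
import json


def make_packet(entity_id, sex, age, disease, age_of_onset, present, absent, genes):
    """Build a phenopacket-like dictionary (illustrative field names only)."""
    return {
        "entity": {"id": entity_id, "sex": sex, "age": age},
        # The disease is optional: it is only recorded "if named".
        "disease": {"label": disease, "age_of_onset": age_of_onset},
        # Observed phenotypes and explicitly excluded phenotypes are both kept.
        "phenotypes": (
            [{"term": t, "negated": False} for t in present]
            + [{"term": t, "negated": True} for t in absent]
        ),
        "genes": list(genes),
    }


packet = make_packet(
    entity_id="patient-1",            # hypothetical identifier
    sex="female",
    age="P35Y",                       # ISO 8601 duration, an assumed convention
    disease=None,                     # undiagnosed, so no named disease
    age_of_onset=None,
    present=["Palmoplantar hyperkeratosis"],
    absent=["Short stature"],
    genes=["GENE1"],                  # placeholder gene symbol
)

text = json.dumps(packet, indent=2, sort_keys=True)
roundtrip = json.loads(text)
print(text)
```

Keeping negated phenotypes in the same list as observed ones, with an explicit flag, is one simple way to preserve the "positive and negative" distinction through serialization.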

brca-website.cloudapp.net

13501 variants from ENIGMA, ClinVar, LOVD, exLOVD, BIC

Merged by genomic coordinate and alternate allele string
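
Merging by genomic coordinate and alternate allele string can be sketched as grouping records under a shared key. The records below are invented for illustration (positions, alleles, and calls are not real data), and `merge_variants` is a hypothetical helper, not the BRCA Exchange pipeline.

```python
from collections import defaultdict

# Invented example records; none of these values come from the real sources.
records = [
    {"source": "ClinVar", "chrom": "17", "pos": 1000, "alt": "T", "call": "pathogenic"},
    {"source": "LOVD", "chrom": "17", "pos": 1000, "alt": "T", "call": "uncertain"},
    {"source": "BIC", "chrom": "13", "pos": 2000, "alt": "G", "call": "benign"},
]


def merge_variants(rows):
    """Group per-source calls under a (chromosome, position, alt allele) key."""
    merged = defaultdict(dict)
    for row in rows:
        key = (row["chrom"], row["pos"], row["alt"])
        merged[key][row["source"]] = row["call"]
    return dict(merged)


merged = merge_variants(records)

# Variants whose sources disagree on the pathogenicity call.
conflicts = [key for key, calls in merged.items() if len(set(calls.values())) > 1]
print(merged)
print(conflicts)
```

Once records share a key, disagreements between sources become visible as a single merged entry holding more than one distinct call.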

Problems with evidence and provenance of G2P Associations

Problems:

- Variants have different pathogenicity calls due to annotation inconsistency and different experimental evidence
- Evidence and provenance are incomplete, not computable, and frequently conflated
- Annotations are made to different aspects of the genotype: allele, variant, gene, transcript, etc.

A computable model would enable:

- Context to evaluate credibility/confidence
- Support for filtering and analysis of data
- Detailed history for attribution

Building a computable model for ACMG guidelines

http://brcaexchange.org/

Provenance

- Materials & methods
- Agent(s) of evidence
- Agent(s) of claim
- Time and place

Evidence

- Data (e.g., images, sequences)
- Evidence codes
- Publications
- Confidence (p-val, z-score)
- Summary figures
- Conclusions from previous studies
- Domain expert’s knowledge

Claim

- Causal relationships, hypothesized relationships, correlations, etc.

https://github.com/monarch-initiative/SEPIO-ontology
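
One way to picture keeping claim, evidence, and provenance separate but linked is the sketch below. The class and field names are hypothetical simplifications for illustration and do not reproduce the SEPIO ontology's actual classes or properties; the example values are invented.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Provenance:
    methods: str   # materials & methods
    agent: str     # who produced the evidence
    date: str      # when it was produced


@dataclass
class Evidence:
    evidence_code: str
    publication: Optional[str]
    p_value: Optional[float]
    provenance: Provenance


@dataclass
class Claim:
    subject: str       # e.g. a variant
    relation: str      # e.g. "causes", "is correlated with"
    obj: str           # e.g. a phenotype or disease
    asserted_by: str   # agent of the claim, distinct from agents of evidence
    evidence: List[Evidence] = field(default_factory=list)


def supporting(claim, max_p=0.05):
    """Return evidence items with a reported p-value at or below max_p."""
    return [e for e in claim.evidence if e.p_value is not None and e.p_value <= max_p]


claim = Claim(
    subject="variant-1",   # placeholder identifiers throughout
    relation="causes",
    obj="phenotype-1",
    asserted_by="curator-A",
    evidence=[
        Evidence("experimental", "pub-1", 0.01, Provenance("assay", "lab-X", "2015-06-01")),
        Evidence("author statement", None, None, Provenance("review", "lab-Y", "2014-02-10")),
    ],
)

strong = supporting(claim)
print([e.evidence_code for e in strong])
```

With the three layers held apart, filtering by confidence or tracing who produced which evidence becomes a simple query rather than a reading exercise.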

Summary

Ontologies can be used to perform deep phenotyping integration across species

An exchange standard is needed to facilitate distributed phenotype data sharing

A computable G2P evidence model can aid variant interpretation

Acknowledgements

Lawrence Berkeley: Chris Mungall, Nicole Washington, Suzanna Lewis, Jeremy Nguyen, Seth Carbon

Charité: Peter Robinson, Sebastian Kohler

U of Pittsburgh: Harry Hochheiser, Mike Davis, Joe Zhou

OHSU: Nicole Vasilesky, Matt Brush, Kent Shefchek, Julie McMurry, Tom Conlin

Genomics England: Damian Smedley, Jules Jacobson

UCSC: David Haussler, Benedict Paten, Mark Diekhans, Melissa Cline

Garvan: Tudor Groza, Craig McNamara, Edwin Zhang

FUNDING: NIH Office of Director: 1R24OD011883; NIH-UDP: HHSN268201300036C, HHSN268201400093P; NCI/Leidos #15X143, BD2K U54HG007990-S2 (Haussler) & BD2K PA-15-144-U01 (Kesselman)