The Monarch Initiative: From Model Organism to Precision Medicine


Upload: mhaendel

Post on 13-Apr-2017


TRANSCRIPT

Page 1: The Monarch Initiative: From Model Organism to Precision Medicine


Monarch is generously supported by NIH Office of the Director grant #5R24OD011883, as well as by NCI/Leidos #15X143, BD2K U54HG007990-S2 (Haussler), and BD2K PA-15-144-U01 (Kesselman).

[email protected] @monarchinit

The Problem: Human genome is poorly annotated

A better understanding of human gene function and disease mechanisms is critical for diagnosis, precision medicine, and targeted therapies

The Approach: Monarch cross-species G2P Integration Pipeline

- Ontologies
- Data Standards
- Curation and Data Modeling
- Algorithms
- Tools

The Solution: Leverage all the species data

Solve the cross-species language divide

www.monarchinitiative.org/sources

Acknowledgements and Contact Info

Palmoplantar hyperkeratosis

Thick hand skin

Ulcerated paws

Legend: Monarch team maintains; Monarch team contributes; data source; ontology; bridging ontology

Figure groupings: PHENOTYPES, DISEASES, ANATOMY; MODEL ORGANISM, HUMAN

Community Ontology Term Phenotype

Figure labels: ClinVar, Coriell, CTD, Elem of Morph, Gene Reviews, GWAS, HPOA, OMIMdb, Orphanet, KEGG, AnimalQTLDB, FlyBase, IMPC, MGI, MPD, OMIA, RGD, WormBase, ZFIN, MeSH, MedGen, OMIM, HP, EFO, ORDO, VT, FBcv, ZP, WP, MP, MONDO, UPheno, MA, ZFA, UBERON, FBbt, WA, CL, EMAPA

PROBLEM

Phenotypic language differs by organism and also by community, thus impeding integration

SOLUTION

Monarch integrates the data sources through bridging ontologies

The phenotypes are associated with very different aspects of the genotype in each data source.

The Challenge: Fragmented, heterogeneous G2P data

Figure labels, genera: Mus, Homo, Canis, Macaca, Panthera, Equus, Ovis, Danio, Gallus, Sula, Vulpes, Anas, Coturnix, Peromyscus, Tragelaphus, Bos, Sus, other (>100 species)

Figure labels, sources: mgd, mmrrc, mgi, animalqtldb, cgd, clinvar, gwascatalog, hpoa, kegg, omim, orphanet, coriell, omia, zfin, monarch (monarch-curated)

[Bar chart, axis 0% to 100%: Human only vs. Human + other]

Phenotypic consequences of mutation are known for <20% of the human coding genome; including orthologs from other species boosts this number to over 80%

We learn about different phenotypes from different species, and want to use all this data

Improve data quality and interoperability

Evidence and provenance for G2P associations are incomplete, not computable, and frequently conflated. This hampers integration and pathogenicity determination.

Disentangle these concepts, and model data to make it computable.

PROBLEMS SOLUTIONS

https://mme.monarchinitiative.org

github.com/ga4gh/schemas

Diagnosing rare diseases requires identifying similar patients and models.

Data models for biological database sources, especially G2P sources, are highly heterogeneous.

Data are insufficiently described to understand what they are or how they were produced.

Monarch integrated cross-species data are available on the patient matchmaker exchange

Monarch is contributing GA4GH Schemas to bridge the heterogeneous G2P sources

HCLS provides a guide that indicates which metadata are essential and how to express them. Monarch was a key contributor to this community effort and is testing the model for all sources in its corpus

Compute over diseases, phenotypes, models to diagnose diseases

PhenoGrid

http://www.sanger.ac.uk/science/tools/exomiser

http://patientarchive.org/

Exomiser

https://www.npmjs.com/package/phenogrid

Whole exome

Remove off-target and common variants

Variant score from allele freq and pathogenicity

Phenotype score from phenotypic similarity

PHIVE score to give final candidates

Mendelian filters

Combine genotype and phenotype data for variant prioritization
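
The filtering and scoring steps listed above can be sketched roughly as follows. This is a toy illustration under stated assumptions, not Exomiser's implementation: the `Variant` fields, the `MAX_ALLELE_FREQ` cutoff, and the plain averaging used for the combined score are all invented for the example, and the Mendelian (inheritance) filters are left out.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    on_target: bool          # falls inside the captured exome regions
    allele_freq: float       # population allele frequency, 0..1
    pathogenicity: float     # predicted pathogenicity, 0..1
    phenotype_score: float   # phenotypic similarity of the gene to the patient, 0..1


MAX_ALLELE_FREQ = 0.01  # hypothetical cutoff for "common" variants


def variant_score(v: Variant) -> float:
    # Assumed formula: rarer and more pathogenic variants score higher.
    rarity = 1.0 - min(v.allele_freq / MAX_ALLELE_FREQ, 1.0)
    return rarity * v.pathogenicity


def combined_score(v: Variant) -> float:
    # Assumed combination: plain average of variant and phenotype scores.
    return (variant_score(v) + v.phenotype_score) / 2.0


def prioritize(variants):
    # Step 1: remove off-target and common variants.
    kept = [v for v in variants if v.on_target and v.allele_freq < MAX_ALLELE_FREQ]
    # Step 2: rank the remaining candidates by the combined score.
    return sorted(kept, key=combined_score, reverse=True)


if __name__ == "__main__":
    candidates = [
        Variant("GENE_A", True, 0.0, 0.9, 0.8),
        Variant("GENE_B", True, 0.2, 0.9, 0.9),    # common, filtered out
        Variant("GENE_C", False, 0.0, 1.0, 1.0),   # off-target, filtered out
        Variant("GENE_D", True, 0.005, 0.5, 0.2),
    ]
    for v in prioritize(candidates):
        print(v.gene, round(combined_score(v), 3))
```

The point of the sketch is only that a variant must pass the filters and then do well on both the genotype-side and the phenotype-side score to rank highly.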

Visualize phenotype profile comparisons

Between patients and...
- Other patients
- Known diseases
- Models

Embeddable 3rd party widget for data resources

PhenoTua / Noctua

Uniquely identify a model or disease

Check organism/genotype nomenclature

Choose terms from any phenotype ontology

Provide evidence

Edit collaboratively, group sharing

View in two modalities:
- Ontology smart spreadsheet
- Graphical Causal Networks

HPO Pubmed Browser

Curate causal networks between genes, genotypes, phenotypes, and diseases, using organism-agnostic standardized OWL models

http://create.monarchinitiative.org/

Check Annotation Sufficiency

- Automated extraction of Human Phenotype Ontology concepts from free text clinical summaries.
- Intuitive visualization of patient phenotype profiles and diagnoses.
- Immediate visual feedback on phenotype profiles using the Monarch annotation sufficiency score.
- Fine-grained patient sharing access control.
- Encrypted patient sensitive data, yet with the possibility of searching over this data.

Visualize and Browse Relationships

Finding literature relevant to a set of phenotypes should be easy.

http://pubmed-browser.human-phenotype-ontology.org/

Zemojtel, T. et al. Effective diagnosis of genetic disease by computational phenotype analysis of the disease-associated genome. Science Translational Medicine Vol. 6, Issue 252, pp. 252ra123 (11 diagnosed families)

Pippucci, T. et al. A novel null homozygous mutation confirms CACNA2D2 as a gene mutated in epileptic encephalopathy. PLoS One 8, e82154 (2013). (1 diagnosed family)

Requena, T. et al. Identification of two novel mutations in FAM136A and DTNA genes in autosomal-dominant familial Meniereʼs disease. Human Molecular Genetics. 24, 1119–26 (2015). (2 diagnosed families)

Bone, W. et al. Computational evaluation of exome sequence data using human and model organism phenotypes improves diagnostic efficiency. Genetics in Medicine. In press (2015). doi:10.1038/gim.2015.137 (4 diagnosed families)

18 Published Diagnoses

www.monarchinitiative.org

www.owlsim.org

Patient X

Disease Y Model Z

Make causal relationships computable: Improve modeling of evidence and provenance

owlsim

http://brcaexchange.org/

Provenance Evidence Claim

- Data (e.g., images, sequences)
- Evidence codes
- Publications
- Statistical confidence (p-val, z-score)
- Summary figures
- Conclusions from previous studies
- Tacit knowledge of a domain expert

- types of assay/technique/study or instances thereof
- agent(s) who produced evidence
- agent(s) who asserted the claim
- time and place
- materials (e.g., model systems, reagents, instruments)

Process history; Key participants in process; Outputs of process

http://tinyurl.com/brca-g2p

http://tinyurl.com/acmg-guidelines

- Causal relationships, hypothesized relationships, correlations, etc.
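
One way to picture keeping claim, evidence, and provenance as separate, computable records is the minimal sketch below. The class names, fields, and the `is_supported` check are hypothetical and chosen only to mirror the bullet lists above; they are not the Monarch or GA4GH schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Evidence:
    kind: str                            # e.g. "publication", "data", "statistical confidence"
    description: str
    evidence_code: Optional[str] = None  # optional evidence code, if one applies


@dataclass
class Provenance:
    technique: str                       # type of assay/technique/study
    produced_by: List[str]               # agent(s) who produced the evidence
    asserted_by: List[str]               # agent(s) who asserted the claim
    when_where: Optional[str] = None     # time and place
    materials: List[str] = field(default_factory=list)


@dataclass
class Claim:
    subject: str
    relation: str                        # e.g. "causes", "correlated_with"
    obj: str
    evidence: List[Evidence] = field(default_factory=list)
    provenance: Optional[Provenance] = None

    def is_supported(self) -> bool:
        # Hypothetical rule: a claim is usable downstream only when it
        # carries at least one evidence item and a provenance record.
        return bool(self.evidence) and self.provenance is not None


if __name__ == "__main__":
    claim = Claim("variant X", "causes", "disease Y")
    print(claim.is_supported())
    claim.evidence.append(Evidence("publication", "case report describing two families"))
    claim.provenance = Provenance("exome sequencing", ["lab A"], ["curator B"])
    print(claim.is_supported())
```

Because the three concepts live in separate records, a consumer can, for example, filter claims by who asserted them without touching the evidence items.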

Fuzzy matching between patients, phenotypes, and diseases

Problem: It is difficult to prioritize candidate genes for diagnosis, or to identify the model that best recapitulates a disease

Compute similarity of phenotypic profiles

Graph-based semantic similarity
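
As a rough illustration of graph-based similarity between two phenotype profiles, the sketch below computes a Jaccard index over ancestor closures. It is a minimal example under stated assumptions, not the owlsim algorithm: the `PARENTS` graph is hand-written for the example (it borrows a few term labels from this poster but is not taken from any real ontology release), and the function names are hypothetical.

```python
# Toy is-a graph: term -> direct parents (hand-written for illustration only).
PARENTS = {
    "cleft palate": ["abnormal palate morphology"],
    "high-arched palate": ["abnormal palate morphology"],
    "duplex kidney": ["abnormal kidney morphology"],
    "renal hypoplasia": ["abnormal kidney morphology"],
    "abnormal palate morphology": ["phenotypic abnormality"],
    "abnormal kidney morphology": ["phenotypic abnormality"],
    "phenotypic abnormality": [],
}


def ancestors(term):
    """Return the term together with all of its ancestors in the toy graph."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen


def closure(profile):
    """Union of the ancestor sets of every term in a phenotype profile."""
    result = set()
    for term in profile:
        result |= ancestors(term)
    return result


def profile_similarity(profile_a, profile_b):
    """Jaccard index over the ancestor closures of two profiles."""
    a, b = closure(profile_a), closure(profile_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    patient = ["high-arched palate", "renal hypoplasia"]
    model = ["cleft palate", "duplex kidney"]
    print(profile_similarity(patient, model))
```

The two profiles share no exact term, so a plain set comparison would score them at zero; walking up the graph lets them match on the shared parent classes, which is what makes the matching "fuzzy".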

PROBLEM SOLUTION

Researchers donʼt know when their phenotyping is sufficient to be useful beyond their specialized community

Clinicians donʼt know when their phenotyping is sufficient for diagnosis

Compare patient or organism phenotypic profile against all known diseases and genotypes. Get feedback in real time.

http://tinyurl.com/phenotypesufficiency

https://monarchinitiative.org/page/services

patient archive

PROBLEMS SOLUTIONS

Problems with identifier design and provision result in link rot and content drift, therefore compromising the flow and integrity of information.

Identifiers must resolve, and when referenced in the same context must not collide. Prefixes play a critical role in these two goals; however, due to confusion and inconsistency about prefixes, a single identifier can be referenced in multiple different ways: 12345, MGI:12345, MGI:MGI:12345, MGI:MGI_12345, thus complicating determinations of equivalence and data integration.

Moreover, prefixes used in the same context can conflict (e.g., GEO).
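
To make the equivalence problem concrete, here is a small sketch that collapses the four spellings listed above into a single form. The function name `normalize_curie` and the choice of `PREFIX:LOCAL_ID` as the canonical form are assumptions made for this example; which form a given database treats as canonical is a policy decision that the poster does not specify, and the sketch does nothing about conflicting prefixes such as GEO.

```python
import re


def normalize_curie(raw: str, default_prefix: str) -> str:
    """Collapse variant spellings of one identifier into PREFIX:LOCAL_ID.

    Assumes the caller already knows which source the identifier came from.
    """
    local = raw.strip()
    # Strip any number of leading "PREFIX:" or "PREFIX_" repetitions.
    pattern = re.compile(rf"^{re.escape(default_prefix)}[:_]", re.IGNORECASE)
    while pattern.match(local):
        local = pattern.sub("", local, count=1)
    if not local:
        raise ValueError(f"no local identifier in {raw!r}")
    return f"{default_prefix}:{local}"


if __name__ == "__main__":
    spellings = ["12345", "MGI:12345", "MGI:MGI:12345", "MGI:MGI_12345"]
    print({normalize_curie(s, "MGI") for s in spellings})
```

Once every reference is rewritten to one spelling, equivalence becomes a plain string comparison, which is the property that data integration depends on.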

Monarch is a key contributor to identifier standards for big data integration

10 Simple Rules for Design and Provision of Life Science Database Identifiers for the Web

Monarch is leading a community effort to coordinate prefixes between the eight active prefix registries

JDDCP

prefix commons

zenodo.org/record/31765

github.com/prefixcommons

health care & life sciences

w3.org/TR/hcls-dataset/

3,462 MENDELIAN DISEASES (OMIM) with no known genetic basis

47,964 VARIANTS (CLINVAR) with no known diseases

1 Oregon Health & Science University; Portland, OR • 2 Lawrence Berkeley National Lab, Berkeley, CA • 3 University of Pittsburgh, Pittsburgh, PA • 4 University of California San Diego, San Diego, CA • 5 Garvan Institute, Sydney, Australia • 6 Sanger Center, Hinxton, UK • 7 Charite

From Model Mechanism to Precision Medicine: an Open Science Integrated Genotype-Phenotype Platform

Nicole Vasilevsky1, Nicole Washington2, Chuck Borromeo3, Matthew Brush1, Seth Carbon2, Michael Davis3, Nathan Dunn2, Mark Englestad1, Jeremy Espino3, Shahim Essaid1, Jeffrey Grethe4, Tudor Groza5, Harry Hochheiser3, Sebastian Köhler6, Suzanna Lewis2, Julie McMurry1, Craig McNamara5, Chris Mungall2, Jeremy Nguyen Xuan2, Peter Robinson7, Kent Shefchek1, Damian Smedley6, Zhou Yuan3, Edwin Zhang5, Melissa Haendel1,

Human Disease: HADZISELIMOVIC SYNDROME
- Ventricular hypertrophy HP:0001714
- High-arched palate HP:0000156
- Failure to thrive HP:0001508
- Pulmonary artery atresia HP:0004935
- Renal hypoplasia HP:0000089

mouse model: b2b1035Clo (aka Blue Meanie)
- tricuspid valve atresia MP:0006123
- prenatal growth retardation MP:0010865
- persistent truncus arteriosus MP:0002633
- cleft palate MP:0000111
- duplex kidney MP:0004017

common (UPheno)
- abnormal kidney morphology
- abnormal palate morphology
- growth deficiency
- Malformation of the heart and great vessels
- abnormal heart and great artery attachment