Making Phenotypic Data FAIR++ for Disease Diagnosis and Discovery · Nov 29, 2016
TRANSCRIPT
-
Making Phenotypic data FAIR++ for
Disease Diagnosis and Discovery
• Findable
• Accessible outside paywalls and private data sources
• Attributable
• Interoperable and Computable
• Reusable, exchangeable across contexts and disciplines
Melissa Haendel, PhD @ontowonka
-
Biology central dogma
Genes + Environment = Phenotypes
Standards for encoding and exchanging data must be up to these challenges
-
Computable encodings are essential
Genes + Environment = Phenotypes
• Genes: base pairs; variant notation (e.g. HGVS)
• Environment: medical procedure coding; Environment Ontology
• Phenotypes: Human Phenotype Ontology; Mammalian Phenotype Ontology
-
Standard exchange formats exist for genes … but for phenotypes? Environment?
Genes Environment Phenotypes
VCF, GFF, BED (genes) · PXF (phenotypes)
-
Ontologies provide pre-packaged phenotype descriptions
Köhler, S., Doelken, S. C., Mungall, C. J., … Robinson, P. N. (2013). The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data. Nucleic Acids Res. doi:10.1093/nar/gkt1026
Smith, C. L., & Eppig, J. T. (2015). Expanding the mammalian phenotype ontology to support automated exchange of high throughput mouse phenotyping data generated by large-scale mouse knockout screens. J Biomed Semantics, 6, 11. doi:10.1186/s13326-015-0009-1
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4378007/
-
A simple data model
Entities
– Organism
  • Patient
  • Non-human animal
  • Population
– Genetic/genomic element
– Condition
  • Disease
  • Phenotype
Associations
– E.g. between disease and phenotype
– Each association has
  • Evidence
  • Provenance
[Diagram: an Entity is linked to a Condition (Disease or Phenotype) by an association that carries Evidence]
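The entity/condition/association model on this slide can be sketched as a handful of Python dataclasses. This is only an illustration of the shape of the model, not the schema of any phenopackets library: the class names, the `kind` field, and the `sources` list are assumptions made for the sketch, and the source string is a placeholder.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entity:
    """Anything a condition can be attached to (patient, animal, population, gene...)."""
    id: str
    label: Optional[str] = None


@dataclass
class Condition:
    """A disease or phenotype, identified by an ontology term."""
    id: str                  # e.g. "HP:0100024"
    label: str
    kind: str = "phenotype"  # hypothetical field: "phenotype" or "disease"


@dataclass
class Evidence:
    """Why the association is believed; `sources` stands in for provenance."""
    type_id: str             # e.g. an Evidence Ontology (ECO) term
    sources: List[str] = field(default_factory=list)


@dataclass
class Association:
    """Links one entity to one condition, with supporting evidence."""
    entity: Entity
    condition: Condition
    evidence: List[Evidence] = field(default_factory=list)


mickey = Entity(id="#1", label="Mickey Mouse")
happy = Condition(id="HP:0100024", label="conspicuously happy disposition")
assoc = Association(mickey, happy, [Evidence("ECO:0000033", ["hypothetical-source-1"])])
```

Keeping evidence on the association, rather than on the entity or the condition, mirrors the slide: the same phenotype can be asserted for the same patient by different sources.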
-
Phenopackets for clinical labs
Patient medical history
• Patient and family history
• Diagnostic tests, clinical phenotypes
• Genomic information
• Physical exam
Clinical testing lab
Clinical labs often get no phenotypes or one-line descriptions. What if we could make the phenotype data PHI-free and simultaneously more descriptive?
-
Phenopackets for journals
Each article can be associated with a phenopacket
Each phenopacket can be shared via DOI in any repository outside paywall (e.g. Figshare, Zenodo, etc.) and cited as a data citation
Robinson, P. N., Mungall, C. J., & Haendel, M. (2015). Capturing phenotypes for precision medicine. Molecular Case Studies, 1(1), a000372. doi:10.1101/mcs.a000372
-
Phenopackets for biomedical databases
OMIA
Databases could share genotype-to-phenotype (G2P) data in a standardized format, retaining domain or species specificity
-
Phenopackets for laypersons
• Dry eyes
• Developmental delay
• Elevated liver function

phenotype_profile:
  - entity: "patient16"
    phenotype:
      types:
        - id: "HP:0000522"
          label: "Alacrima"
      onset:
        description: "at birth"
        types:
          - id: "HP:0003577"
            label: "Congenital onset"
    evidence:
      - types:
          - id: "ECO:0000033"
            label: "Traceable Author Statement"
        source:
          - id: "https://twitter.com/examplepatient/status/123456789"

• Disease registries
• Patient communities
• Social media
Image credits: ngly1.org
-
PhenoPacket formats
Export phenopacket to: CSV, JSON, RDF, OWL
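A minimal sketch of the "one packet, several serializations" idea, using only the Python standard library. The function names `to_json` and `to_csv` and the CSV column layout are assumptions made for illustration, not the behaviour of pxftools or any phenopackets API. RDF and OWL export are omitted because they need an ontology-aware toolchain.

```python
import csv
import io
import json

# A tiny phenotype profile held as plain Python data.
profile = {
    "phenotype_profile": [
        {"entity": "#1", "phenotype": {"types": [
            {"id": "HP:0100024", "label": "conspicuously happy disposition"}]}},
        {"entity": "#2", "phenotype": {"types": [
            {"id": "HP:0100024", "label": "conspicuously happy disposition"}]}},
    ]
}


def to_json(packet):
    """Serialize the whole packet; nesting is preserved."""
    return json.dumps(packet, indent=2, sort_keys=True)


def to_csv(packet):
    """Flatten to one row per (entity, term); nested detail is dropped."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["entity", "term_id", "term_label"])
    for assoc in packet["phenotype_profile"]:
        for term in assoc["phenotype"]["types"]:
            writer.writerow([assoc["entity"], term["id"], term["label"]])
    return buf.getvalue()


print(to_json(profile))
print(to_csv(profile))
```

The CSV form is lossy by design: a flat table can hold the bag of terms but not nested refinements such as onset or evidence.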
-
Simple Example (patient profile)
patients.pxf (slide callouts: header, entities, assocs)

# entities
persons:
  - id: "#1"
    label: Mickey Mouse
    date_of_birth: 1928-01-01
    sex: M
  - id: "#2"
    label: Goofy
    sex: M

# assocs
phenotype_profile:
  - entity: "#1"
    phenotype:
      types:
        - id: HP:0100024
          label: conspicuously happy disposition
  - entity: "#1"
    phenotype:
      types:
        - id: MP:0001284
          label: absent vibrissae
  - entity: "#2"
    phenotype:
      types:
        - id: HP:0100024
          label: conspicuously happy disposition
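To show how the associations refer back to the entities by id, here is a small sketch that loads the same example as plain Python data and collects a bag of term ids per person. The helper name `terms_by_entity` is hypothetical, and the data is transcribed by hand rather than parsed from a real patients.pxf file.

```python
from collections import defaultdict

# Entities and associations transcribed from the patients.pxf example.
persons = [
    {"id": "#1", "label": "Mickey Mouse", "date_of_birth": "1928-01-01", "sex": "M"},
    {"id": "#2", "label": "Goofy", "sex": "M"},
]
phenotype_profile = [
    {"entity": "#1", "phenotype": {"types": [
        {"id": "HP:0100024", "label": "conspicuously happy disposition"}]}},
    {"entity": "#1", "phenotype": {"types": [
        {"id": "MP:0001284", "label": "absent vibrissae"}]}},
    {"entity": "#2", "phenotype": {"types": [
        {"id": "HP:0100024", "label": "conspicuously happy disposition"}]}},
]


def terms_by_entity(profile):
    """Group ontology term ids by the entity id each association points to."""
    bags = defaultdict(set)
    for assoc in profile:
        for term in assoc["phenotype"]["types"]:
            bags[assoc["entity"]].add(term["id"])
    return dict(bags)


bags = terms_by_entity(phenotype_profile)
labels = {p["id"]: p["label"] for p in persons}
for entity_id, terms in sorted(bags.items()):
    print(labels[entity_id], sorted(terms))
```

Note that entity "#1" mixes a Human Phenotype Ontology term with a Mammalian Phenotype Ontology term, which the format allows because associations only carry term identifiers.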
-
Nesting allows refinement
patients.pxf (slide callouts: header, entities, assocs)

# entities
persons:
  - id: "#1"
    label: Mickey Mouse
    date_of_birth: 1928-01-01
    sex: M
  - id: "#2"
    label: Goofy
    sex: M

# assocs
phenotype_profile:
  - entity: "#1"
    phenotype:
      types:
        - id: HP:0100024
          label: conspicuously happy disposition
      onset:
        types:
          - id: HP:0011463
            label: Childhood onset
      description: "welcomes strangers with open arms"
  - entity: "#1"
    phenotype:
      types:
        - id: MP:0001284
          label: absent vibrissae
  - entity: "#2"
    phenotype:
      types:
        - id: HP:0100024
          label: conspicuously happy disposition
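Because refinements such as onset are nested and optional, a consumer can read them when present and fall back gracefully when absent. The sketch below assumes the key layout shown in the example; the function name `describe` and the fallback wording are made up for illustration.

```python
# One refined association and one plain association, as Python data.
refined = {
    "entity": "#1",
    "phenotype": {
        "types": [{"id": "HP:0100024", "label": "conspicuously happy disposition"}],
        "onset": {"types": [{"id": "HP:0011463", "label": "Childhood onset"}]},
    },
}
plain = {
    "entity": "#2",
    "phenotype": {
        "types": [{"id": "HP:0100024", "label": "conspicuously happy disposition"}],
    },
}


def describe(assoc):
    """Summarize an association, using the nested onset only if it exists."""
    phenotype = assoc["phenotype"]
    term = phenotype["types"][0]["label"]
    onset_types = phenotype.get("onset", {}).get("types", [])
    onset = onset_types[0]["label"] if onset_types else "onset not recorded"
    return f"{assoc['entity']}: {term} ({onset})"


print(describe(refined))
print(describe(plain))
```

A tool that only understands the bag-of-terms level can ignore the `onset` key entirely and still process both records.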
-
Simple Example (variants)
variants.pxf (slide callouts: header, entities, assocs)

# entities
variants:
  - id: "var#1"
    label: "c.2441+7A>G"
    descrHGVS: "c.2441+7A>G"
    startPosition: 0
    endPosition: 0
    …
Insert GA4GH module here

# assocs
phenotype_profile:
  - entity: "var#1"
    phenotype:
      types:
        - id: HP:0001595
          label: Abnormality of the hair
      onset:
        types:
          - id: HP:0011463
            label: Childhood onset
      description: "missing whiskers"
  …

https://monarchinitiative.org/variant/ClinVarVariant:195890
-
Semantics with JSON-LD
{
  "@context": {
    "id": "@id",
    "label": "rdfs:label",
    "types": {
      "@id": "rdf:type",
      "@type": "@id"
    },
    "negated_types": {
      "@id": "owl:complementOf",
      "@type": "@id"
    },
    "title": "dc:title",
    "dc": "http://purl.org/dc/terms/",
    "MP": "http://purl.obolibrary.org/obo/MP_",
    …
• Provides a direct mapping to RDF
• Allows reasoning to be performed on phenopackets
• Provides a prefix map to unambiguously interpret CURIE-style identifiers (e.g. as recorded in PrefixCommons)
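The prefix-map idea can be sketched in a few lines of Python: a CURIE such as `MP:0001284` is split at the first colon and the prefix is replaced by its IRI base. The `dc` and `MP` entries come from the context on the slide; the `HP` entry is added here by analogy and is not shown on the slide. The function name `expand_curie` is hypothetical, and a real JSON-LD processor does considerably more than this.

```python
# Prefix map: "dc" and "MP" are from the slide's @context; "HP" is an addition.
PREFIXES = {
    "dc": "http://purl.org/dc/terms/",
    "MP": "http://purl.obolibrary.org/obo/MP_",
    "HP": "http://purl.obolibrary.org/obo/HP_",
}


def expand_curie(curie, prefixes=PREFIXES):
    """Expand 'PREFIX:local' to a full IRI using the prefix map."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"cannot expand {curie!r}: unknown or missing prefix")
    return prefixes[prefix] + local


print(expand_curie("MP:0001284"))
print(expand_curie("dc:title"))
```

Failing loudly on an unknown prefix is a deliberate choice in this sketch: silently passing an unexpanded identifier through would defeat the goal of unambiguous interpretation.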
-
Summary: Phenotype Exchange Format
• One model, derive alternate concrete forms
  – YAML, JSON, RDF, TSV (subset)
• Species-agnostic
  – From microbes through plants through humans
  – Clinical and basic research
• Applicable to a variety of entities
  – Patients/individual organisms, cohorts, populations
  – Diseases
  – Papers
  – Genes, genotypes, alleles, variants
• Simple for simple cases…
  – Bag of terms model
• …Incremental expressivity
  – Temporality and causality
  – Quantitative as well as qualitative
  – Negation, severity, frequency, penetrance, expressivity
• Ontology-smart
  – Rational composition (post-coordination)
  – Explicit semantics
http://phenopackets.org
-
Acknowledgments
• Chris Mungall (schema/architecture)
• Jules Jacobsen (Java API)
• James Balhoff (pxftools)
• Jeremy Nguyen-Xuan (pxftools)
• Seth Carbon (web phenote)
• Kent Shefchek (Python API)
• Matt Brush (modeling)
• Dan Keith (web phenote)
• Satwik Bhattamishra (GSoC student, PhenoPacketScraper)
• Julie McMurry
• Peter Robinson
• Pier Buttigieg
• Ramona Walls
• Damian Smedley
• Sebastian Köhler
• Tudor Groza
• Harry Hochheiser
• Mark Diekhans
• Melanie Courtot
• Michael Baudis
• Helen Parkinson
• Suzanna Lewis
Monarch Initiative NIH R24 OD011883
-
Phenopacket Tool ecosystem
• Non-JVM language bindings
  – Python (beta)
    • https://github.com/phenopackets/phenopacket-python/
  – JavaScript (alpha)
    • https://github.com/phenopackets/phenopacket-js/
• Pxftools
  – Command line library, Scala utilities
  – https://github.com/phenopackets/pxftools
• PhenoPacketScraper
  – GSoC project to make phenopackets from case study articles
  – https://github.com/monarch-initiative/phenopacket-scraper-core
• OwlSim
  – Like BLAST, for phenotypes
  – https://github.com/monarch-initiative/owlsim-v3
• WebPhenote
  – Noctua extension for phenopacket creation
  – http://create.monarchinitiative.org