Making Phenotypic data FAIR++ for Disease Diagnosis and Discovery (Nov 29, 2016)



  • Making Phenotypic data FAIR++ for Disease Diagnosis and Discovery

    – Findable
    – Accessible outside paywalls and private data sources
    – Attributable
    – Interoperable and Computable
    – Reusable, exchangeable across contexts and disciplines

    Melissa Haendel, PhD @ontowonka

  • Biology central dogma

    Genes + Environment = Phenotypes

    Standards for encoding and exchanging data must be up to these challenges


  • Computable encodings are essential

    Genes + Environment = Phenotypes

    – Genes: base pairs, variant notation (e.g. HGVS)
    – Environment: medical procedure coding, Environment Ontology
    – Phenotypes: Human Phenotype Ontology, Mammalian Phenotype Ontology


  • Standard exchange formats exist for genes … but for phenotypes? Environment?

    – Genes: VCF, GFF, BED
    – Environment: (none listed)
    – Phenotypes: PXF


  • Ontologies provide pre-packaged phenotype descriptions

    Köhler, S., Doelken, S. C., Mungall, C. J., … Robinson, P. N. (2013). The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data. Nucleic Acids Res. doi:10.1093/nar/gkt1026

    Smith, C. L., & Eppig, J. T. (2015). Expanding the mammalian phenotype ontology to support automated exchange of high throughput mouse phenotyping data generated by large-scale mouse knockout screens. J Biomed Semantics, 6, 11. doi:10.1186/s13326-015-0009-1

    https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4378007/

  • A simple data model

    Entities
    – Organism
      • Patient
      • Non-human animal
      • Population
    – Genetic/genomic element
    – Condition
      • Disease
      • Phenotype

    Associations
    – E.g. between disease and phenotype
    – Each association has
      • Evidence
      • Provenance

    [Diagram labels: Entity, Evidence, Condition, association, Disease, Phenotype]
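The entity/association model above can be sketched in a few lines of Python. This is only an illustration, not the Phenopackets schema: the class names, the `kind` discriminator, and the `source` field are assumptions made for the example, and the sample values are borrowed from the layperson example elsewhere in this deck.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entity:
    """Anything a condition can be attached to (organism, genetic element, ...)."""
    id: str
    label: str = ""


@dataclass
class Condition(Entity):
    """A disease or a phenotype; `kind` is a hypothetical discriminator."""
    kind: str = "phenotype"  # "disease" or "phenotype"


@dataclass
class Evidence:
    """Why the association is believed, plus where it came from (provenance)."""
    type_id: str        # e.g. an evidence-code term
    source: str = ""    # hypothetical provenance field: publication, URL, record


@dataclass
class Association:
    """Links an entity to a condition and carries its own evidence."""
    entity: Entity
    condition: Condition
    evidence: List[Evidence] = field(default_factory=list)


patient = Entity(id="patient16", label="example patient")
alacrima = Condition(id="HP:0000522", label="Alacrima", kind="phenotype")
assoc = Association(
    entity=patient,
    condition=alacrima,
    evidence=[Evidence(type_id="ECO:0000033",
                       source="https://twitter.com/examplepatient/status/123456789")],
)
print(assoc.condition.id, "supported by", assoc.evidence[0].type_id)
```

Keeping evidence on the association rather than on the entity or the condition means the same phenotype can be asserted for the same patient by different sources without the assertions overwriting each other.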

  • Phenopackets for clinical labs

    [Diagram labels: Patient medical history; Patient and family history; Diagnostic tests, clinical phenotypes; Genomic information; Physical exam; Clinical testing lab]

    Clinical labs often get no phenotypes or one-line descriptions. What if we could make the phenotype data PHI-free and simultaneously more descriptive?

  • Phenopackets for journals

    Each article can be associated with a phenopacket.

    Each phenopacket can be shared via DOI in any repository outside a paywall (e.g. Figshare, Zenodo, etc.) and cited as a data citation.

    Robinson, P. N., Mungall, C. J., & Haendel, M. (2015). Capturing phenotypes for precision medicine. Molecular Case Studies, 1(1), a000372. doi:10.1101/mcs.a000372

  • Phenopackets for biomedical databases

    OMIA

    Databases could share genotype-to-phenotype (G2P) data in a standardized format, retaining domain or species specificity

  • Phenopackets for laypersons

    • Dry eyes
    • Developmental delay
    • Elevated liver function

    • Disease registries
    • Patient communities
    • Social media

    phenotype_profile:
      - entity: "patient16"
        phenotype:
          types:
            - id: "HP:0000522"
              label: "Alacrima"
          onset:
            description: "at birth"
            types:
              - id: "HP:0003577"
                label: "Congenital onset"
        evidence:
          - types:
              - id: "ECO:0000033"
                label: "Traceable Author Statement"
            source:
              - id: "https://twitter.com/examplepatient/status/123456789"

    Image credits: ngly1.org

  • PhenoPacket formats

    Export phenopacket to: CSV, JSON, RDF, OWL
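As a rough illustration of exporting one set of records to two of the formats named above, the sketch below writes the same phenotype rows as JSON and as CSV using only the Python standard library. The flat row layout and the function names `to_json` and `to_csv` are assumptions made for the example; they are not the serializations produced by the Phenopackets tools. The term IDs and labels are taken from the patient example in this deck.

```python
import csv
import io
import json

# Flattened rows: one phenotype assertion per row (hypothetical layout).
profile = [
    {"entity": "#1", "id": "HP:0100024", "label": "conspicuously happy disposition"},
    {"entity": "#1", "id": "MP:0001284", "label": "absent vibrissae"},
    {"entity": "#2", "id": "HP:0100024", "label": "conspicuously happy disposition"},
]


def to_json(rows):
    """Serialize the rows as a JSON document with a single top-level key."""
    return json.dumps({"phenotype_profile": rows}, indent=2)


def to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["entity", "id", "label"],
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


print(to_csv(profile))
print(to_json(profile))
```

A tabular export like this can only carry the flat "bag of terms" view; nested details such as onset or evidence would need extra columns or a structured format.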

  • Simple Example (patient profile)

    File: patients.pxf (parts labeled on the slide: header, entities, assocs)

    persons:
      - id: "#1"
        label: Mickey Mouse
        date_of_birth: 1928-01-01
        sex: M
      - id: "#2"
        label: Goofy
        sex: M

    phenotype_profile:
      - entity: "#1"
        phenotype:
          types:
            - id: HP:0100024
              label: conspicuously happy disposition
      - entity: "#1"
        phenotype:
          types:
            - id: MP:0001284
              label: absent vibrissae
      - entity: "#2"
        phenotype:
          types:
            - id: HP:0100024
              label: conspicuously happy disposition
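To show how the entities and associations in this example fit together, here is a minimal sketch that holds the same content as plain Python dictionaries and groups phenotype term IDs by person. The dictionary mirrors the slide's field names, but loading from an actual `.pxf` file is left out, and the helper name `phenotypes_by_person` is hypothetical.

```python
from collections import defaultdict

packet = {
    "persons": [
        {"id": "#1", "label": "Mickey Mouse", "date_of_birth": "1928-01-01", "sex": "M"},
        {"id": "#2", "label": "Goofy", "sex": "M"},
    ],
    "phenotype_profile": [
        {"entity": "#1", "phenotype": {"types": [
            {"id": "HP:0100024", "label": "conspicuously happy disposition"}]}},
        {"entity": "#1", "phenotype": {"types": [
            {"id": "MP:0001284", "label": "absent vibrissae"}]}},
        {"entity": "#2", "phenotype": {"types": [
            {"id": "HP:0100024", "label": "conspicuously happy disposition"}]}},
    ],
}


def phenotypes_by_person(packet):
    """Resolve each association's entity reference and collect its term IDs."""
    names = {person["id"]: person["label"] for person in packet["persons"]}
    grouped = defaultdict(list)
    for assoc in packet["phenotype_profile"]:
        for term in assoc["phenotype"]["types"]:
            grouped[names[assoc["entity"]]].append(term["id"])
    return dict(grouped)


print(phenotypes_by_person(packet))
```

Because associations refer to entities by ID, one person can carry terms from different ontologies (here both HP and MP) without any change to the person record.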

  • Nesting allows refinement

    File: patients.pxf (parts labeled on the slide: header, entities, assocs)

    persons:
      - id: "#1"
        label: Mickey Mouse
        date_of_birth: 1928-01-01
        sex: M
      - id: "#2"
        label: Goofy
        sex: M

    phenotype_profile:
      - entity: "#1"
        phenotype:
          types:
            - id: HP:0100024
              label: conspicuously happy disposition
          onset:
            types:
              - id: HP:0011463
                label: Childhood onset
          description: "welcomes strangers with open arms"
      - entity: "#1"
        phenotype:
          types:
            - id: MP:0001284
              label: absent vibrissae
      - entity: "#2"
        phenotype:
          types:
            - id: HP:0100024
              label: conspicuously happy disposition
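One consequence of nesting is that a reader written for the simple case keeps working when refinements are added. The sketch below assumes that `onset` and `description` are optional keys sitting directly under `phenotype`; that placement is an interpretation of the slide, and the function name `describe` is hypothetical.

```python
def describe(assoc):
    """Render one association as text, using nested refinements when present."""
    phenotype = assoc["phenotype"]
    text = ", ".join(term["label"] for term in phenotype["types"])
    # Optional refinement: onset terms nested under the phenotype.
    onset = [term["label"] for term in phenotype.get("onset", {}).get("types", [])]
    if onset:
        text += " (onset: " + ", ".join(onset) + ")"
    # Optional refinement: free-text description.
    if "description" in phenotype:
        text += ": " + phenotype["description"]
    return text


plain = {"entity": "#1", "phenotype": {
    "types": [{"id": "MP:0001284", "label": "absent vibrissae"}]}}

refined = {"entity": "#1", "phenotype": {
    "types": [{"id": "HP:0100024", "label": "conspicuously happy disposition"}],
    "onset": {"types": [{"id": "HP:0011463", "label": "Childhood onset"}]},
    "description": "welcomes strangers with open arms"}}

print(describe(plain))
print(describe(refined))
```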

  • Simple Example (variants)

    File: variants.pxf (parts labeled on the slide: header, entities, assocs)

    variants:
      - id: "var#1"
        label: "c.2441+7A>G"
        descrHGVS: "c.2441+7A>G"
        startPosition: 0
        endPosition: 0
        …

    phenotype_profile:
      - entity: "var#1"
        phenotype:
          types:
            - id: HP:0001595
              label: Abnormality of the hair
          onset:
            types:
              - id: HP:0011463
                label: Childhood onset
          description: "missing whiskers"
      …

    Insert GA4GH module here

    https://monarchinitiative.org/variant/ClinVarVariant:195890

  • Semantics with JSON-LD

    {
      "@context": {
        "id": "@id",
        "label": "rdfs:label",
        "types": {
          "@id": "rdf:type",
          "@type": "@id"
        },
        "negated_types": {
          "@id": "owl:complementOf",
          "@type": "@id"
        },
        "title": "dc:title",
        "dc": "http://purl.org/dc/terms/",
        "MP": "http://purl.obolibrary.org/obo/MP_",
        …

    – Provides a direct mapping to RDF
    – Allows reasoning to be performed on phenopackets
    – Provides a prefix map to unambiguously interpret CURIE-style identifiers (e.g. as recorded in PrefixCommons)
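The prefix-map idea can be illustrated without a JSON-LD processor. The sketch below expands a CURIE-style identifier into a full IRI using only the two namespace entries visible in the context on this slide; the function name `expand_curie` is hypothetical, and a real JSON-LD library does considerably more than this (term definitions, type coercion, nested contexts).

```python
# Namespace entries copied from the slide's "@context"; other prefixes
# (for example HP) are not shown there and are deliberately left out.
PREFIXES = {
    "dc": "http://purl.org/dc/terms/",
    "MP": "http://purl.obolibrary.org/obo/MP_",
}


def expand_curie(curie, prefixes=PREFIXES):
    """Replace the prefix of `prefix:local` with its registered namespace."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"no namespace registered for {curie!r}")
    return prefixes[prefix] + local


print(expand_curie("MP:0001284"))
print(expand_curie("dc:title"))
```

Failing loudly on an unregistered prefix, rather than passing the string through, is a design choice for the example: it keeps an identifier from being silently treated as unambiguous when no mapping for it exists.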

  • Summary: Phenotype Exchange Format

    • One model, derive alternate concrete forms
      – YAML, JSON, RDF, TSV (subset)
    • Species-agnostic
      – From microbes through plants through humans
      – Clinical and basic research
    • Applicable to a variety of entities
      – Patients/individual organisms, cohorts, populations
      – Diseases
      – Papers
      – Genes, genotypes, alleles, variants
    • Simple for simple cases…
      – Bag of terms model
    • …Incremental expressivity
      – Temporality and causality
      – Quantitative as well as qualitative
      – Negation, severity, frequency, penetrance, expressivity
    • Ontology-smart
      – Rational Composition (post-coordination)
      – Explicit semantics

    http://phenopackets.org

  • Acknowledgments

    • Chris Mungall (schema/architecture)
    • Jules Jacobsen (java API)
    • James Balhoff (pxftools)
    • Jeremy Nguyen-Xuan (pxftools)
    • Seth Carbon (web phenote)
    • Kent Shefcheck (python API)
    • Matt Brush (modeling)
    • Dan Keith (web phenote)
    • Satwik Bhattamishra (GSOC student, PhenoPacketScraper)

    • Julie McMurry
    • Peter Robinson
    • Pier Buttigieg
    • Ramona Walls
    • Damian Smedley
    • Sebastian Kohler
    • Tudor Groza
    • Harry Hochheiser
    • Mark Diekhans
    • Melanie Courtot
    • Michael Baudis
    • Helen Parkinson
    • Suzanna Lewis

    Monarch Initiative NIH R24 OD011883

  • Phenopacket Tool ecosystem

    • Non JVM language bindings
      – Python (beta)
        • https://github.com/phenopackets/phenopacket-python/
      – Javascript (alpha)
        • https://github.com/phenopackets/phenopacket-js/
    • Pxftools
      – command line library, Scala utilities
      – https://github.com/phenopackets/pxftools
    • PhenoPacketScraper
      – GSOC project to make phenopackets from case study articles
      – https://github.com/monarch-initiative/phenopacket-scraper-core
    • OwlSim
      – Like blast, for phenotypes
      – https://github.com/monarch-initiative/owlsim-v3
    • WebPhenote
      – Noctua extension for phenopacket creation
      – http://create.monarchinitiative.org
