what's in a genotype?: an ontological characterization for the integration of genetic variation...

What's in a Genotype? An Ontological Characterization for the Integration of Genetic Variation Data. Matthew H. Brush, Chris Mungall, Nicole Washington, and Melissa Haendel. Oregon Health and Science University, Lawrence Berkeley Lab. International Conference in Biomedical Ontology, July 8, 2013.


DESCRIPTION

ICBO 2013 Presentation

TRANSCRIPT

Page 1: What's In a Genotype?: An Ontological Characterization for the Integration of Genetic Variation Data

WHAT’S IN A GENOTYPE?

An Ontological Characterization for the Integration of Genetic Variation Data

Matthew H. Brush, Chris Mungall, Nicole Washington, and Melissa Haendel

Oregon Health and Science University, Lawrence Berkeley Labs

International Conference in Biomedical Ontology

July 8, 2013

Page 2: What's In a Genotype?: An Ontological Characterization for the Integration of Genetic Variation Data

Genotype-to-Phenotype Research

GENOTYPE: B6.Cg-Alms1foz/fox/J

PHENOTYPE: increased weight, adipose tissue volume, glucose homeostasis altered

GENOTYPE: ALMS1(NM_015120.4)[c.10775delC] + [-]

PHENOTYPE: obesity, diabetes mellitus, insulin resistance

GENOTYPE: kcnj11c14/c14; insrt143/+(AB)

PHENOTYPE: increased food intake, hyperglycemia, insulin resistance

G2P research seeks a mechanistic understanding of how genetic variation is linked to organismal biology and disease

Page 3: What's In a Genotype?: An Ontological Characterization for the Integration of Genetic Variation Data

Integrating G2P Data

Page 4: What's In a Genotype?: An Ontological Characterization for the Integration of Genetic Variation Data

Integrating G2P Data

The Monarch Initiative

The Monarch Initiative aims to bring G2P and related data together under a common semantic framework to support integrated exploration and analysis.

Page 5: What's In a Genotype?: An Ontological Characterization for the Integration of Genetic Variation Data

Integration Challenges

Technical Challenges

Terminological, syntactic, organizational variation in data is common

Knowledge-Based Challenges

Reflect inherent complexity in the way G2P data is generated and what it represents

I. Reconciling G2P data annotated to different ‘levels’ of a genotype

II. Integrating ‘non-genomic’ forms of variation

III. Creating semantic links to biological data

Page 6: What's In a Genotype?: An Ontological Characterization for the Integration of Genetic Variation Data

Decomposition of a Genotype

Genotype: an information entity that specifies an entire genome sequence in terms of its variation from some reference genome

genotype = genomic variation complement + genomic background

[Figure: partonomy of a genotype, drawn against aligned reference and variant sequences with the altered positions marked; each level is linked to the next by has_part]

genotype: apchu745/+; fgf8ati282/ti282(AB)

genomic background: (AB)

genomic variation complement: apchu745/+; fgf8ati282/ti282

variant single locus complement: apchu745/+

variant locus (allele): apchu745

sequence alteration: hu745
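The decomposition on this slide can be pictured as a small tree. The following is a minimal sketch, not the GENO model itself: the `Node` class and its field names are hypothetical, and only the apc branch of the example is filled in.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Node:
    """One level of the genotype partonomy (hypothetical helper class)."""
    kind: str                                           # level name from the slide
    label: str                                          # label as written on the slide
    parts: List["Node"] = field(default_factory=list)   # has_part targets

    def walk(self, depth: int = 0) -> Iterator[Tuple[int, "Node"]]:
        """Yield (depth, node) for this node and everything below it."""
        yield depth, self
        for part in self.parts:
            yield from part.walk(depth + 1)


# The apc branch of the slide's example, built from the bottom up.
seq_alt = Node("sequence alteration", "hu745")
allele = Node("variant locus (allele)", "apchu745", [seq_alt])
vslc = Node("variant single locus complement", "apchu745/+", [allele])
gvc = Node("genomic variation complement", "apchu745/+; fgf8ati282/ti282", [vslc])
background = Node("genomic background", "(AB)")
genotype = Node("genotype", "apchu745/+; fgf8ati282/ti282(AB)", [gvc, background])

# Print the partonomy as an indented outline.
for depth, node in genotype.walk():
    print("  " * depth + f"{node.kind}: {node.label}")
```

The fgf8a branch would be added the same way, as a second variant single locus complement under the genomic variation complement.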

Page 7: What's In a Genotype?: An Ontological Characterization for the Integration of Genetic Variation Data

I. Reconciling Levels of G2P Association

[Figure: two G2P associations annotated at different levels of a genotype, drawn against aligned reference and variant sequences]

Genome level: apchu745/+; fgf8ati282/ti282(AB)

Phenotype: increased cell proliferation, disrupted digestive tract development, gut deformation

Allele level: APC (NM_000038.5) c.937_938delGA

Phenotype: intestinal polyps, abnormal retinal pigmentation, sebaceous cysts

Page 8: What's In a Genotype?: An Ontological Characterization for the Integration of Genetic Variation Data

I. Reconciling Levels of G2P Association

[Figure: the same two G2P associations, with the allele and gene of the genome-level annotation inferred by phenotype propagation]

Genome level: apchu745/+; fgf8ati282/ti282(AB)

Phenotype: increased cell proliferation, disrupted digestive tract development, gut deformation

inferred (PHENOTYPE PROPAGATION): allele: apchu745; gene: apc fgf8a

Allele level: APC (NM_000038.5) c.937_938delGA

Phenotype: intestinal polyps, abnormal retinal pigmentation, sebaceous cysts

allele: c.937_938delGA; gene: apc

Page 9: What's In a Genotype?: An Ontological Characterization for the Integration of Genetic Variation Data

Implementation of Phenotype Propagation

Property chains exploit the transitive genotype partonomy to infer phenotype associations

Atomic Relations

[variant] is_variant_part_of genotype

genotype has_phenotype phenotype

Composed Relation

is_variant_part_of o has_phenotype --> is_variant_with_phenotype
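A property chain of this kind can be illustrated without a reasoner. The sketch below is an assumption-laden stand-in: `apply_chain` is a hypothetical helper that joins plain (subject, relation, object) tuples, whereas the slides describe the chain as an ontology axiom evaluated by GENO-supported reasoning.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


def apply_chain(triples: Iterable[Triple], first: str, second: str,
                composed: str) -> Set[Triple]:
    """Infer (a, composed, c) whenever (a, first, b) and (b, second, c) both hold."""
    known = set(triples)
    inferred: Set[Triple] = set()
    for a, r1, b in known:
        if r1 != first:
            continue
        for b2, r2, c in known:
            if r2 == second and b2 == b:
                inferred.add((a, composed, c))
    return inferred


# Asserted facts, using labels from the slides.
asserted = {
    ("apchu745", "is_variant_part_of", "apchu745/+; fgf8ati282/ti282(AB)"),
    ("apchu745/+; fgf8ati282/ti282(AB)", "has_phenotype", "gut deformation"),
}

inferred = apply_chain(asserted, "is_variant_part_of", "has_phenotype",
                       "is_variant_with_phenotype")
print(inferred)
```

The nested loop is quadratic in the number of triples, which is fine for illustration; a real triple store or OWL reasoner would index the join.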

Page 10: What's In a Genotype?: An Ontological Characterization for the Integration of Genetic Variation Data

Example of Phenotype Propagation

1. Monarch ingests phenotypes annotated to a genotype

genotype: apchu745/+;fgf8ati282/ti282(AB)

has_phenotype: cell proliferation, digestive tract development, gut deformation

Page 11: What's In a Genotype?: An Ontological Characterization for the Integration of Genetic Variation Data

Example of Phenotype Propagation

1. Monarch ingests phenotypes annotated to a genotype

2. Genotype is parsed to create instances down partonomy

[Figure: genotype partonomy with has_variant_part links between levels]

genotype: apchu745/+;fgf8ati282/ti282(AB)

has_phenotype: cell proliferation, digestive tract development, gut deformation

GVC: apchu745/+;fgf8ati282/ti282

VSLCs: apchu745/+, fgf8ati282/ti282

Alleles: apchu745, fgf8ati282

Seq. Alts: hu745, ti282

Genes: apc, fgf8a

Page 12: What's In a Genotype?: An Ontological Characterization for the Integration of Genetic Variation Data

Example of Phenotype Propagation

1. Monarch ingests phenotypes annotated to a genotype

2. Genotype is parsed to create instances down partonomy

3. Phenotype propagation infers associations between phenotypes and each level in the partonomy

[Figure: the same partonomy with has_variant_part links between levels, plus inferred is_variant_with_phenotype links to the phenotypes]

genotype: apchu745/+;fgf8ati282/ti282(AB)

has_phenotype: cell proliferation, digestive tract development, gut deformation

GVC: apchu745/+;fgf8ati282/ti282

VSLCs: apchu745/+, fgf8ati282/ti282

Alleles: apchu745, fgf8ati282

Seq. Alts: hu745, ti282

Genes: apc, fgf8a
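The three steps of this example can be traced in a few lines. This is a rough sketch under stated assumptions: the parsing of step 2 is skipped and its result is written out by hand as a dictionary, the names `HAS_VARIANT_PART`, `variant_parts`, and `propagate` are hypothetical, and the Genes level is left out because the transcript does not show which relation attaches it.

```python
from typing import Dict, List

GENOTYPE = "apchu745/+;fgf8ati282/ti282(AB)"

# Step 2 result, written by hand: has_variant_part edges from parent to children.
HAS_VARIANT_PART: Dict[str, List[str]] = {
    GENOTYPE: ["apchu745/+;fgf8ati282/ti282"],                          # genotype -> GVC
    "apchu745/+;fgf8ati282/ti282": ["apchu745/+", "fgf8ati282/ti282"],  # GVC -> VSLCs
    "apchu745/+": ["apchu745"],                                         # VSLC -> allele
    "fgf8ati282/ti282": ["fgf8ati282"],
    "apchu745": ["hu745"],                                              # allele -> seq. alt.
    "fgf8ati282": ["ti282"],
}

# Step 1 result: phenotypes annotated to the genotype.
HAS_PHENOTYPE: Dict[str, List[str]] = {
    GENOTYPE: ["cell proliferation", "digestive tract development", "gut deformation"],
}


def variant_parts(node: str) -> List[str]:
    """All direct and indirect variant parts of node (transitive has_variant_part)."""
    found: List[str] = []
    for child in HAS_VARIANT_PART.get(node, []):
        found.append(child)
        found.extend(variant_parts(child))
    return found


def propagate() -> Dict[str, List[str]]:
    """Step 3: attach each genotype's phenotypes to every level below it."""
    inferred: Dict[str, List[str]] = {}
    for genotype, phenotypes in HAS_PHENOTYPE.items():
        for part in variant_parts(genotype):
            inferred.setdefault(part, []).extend(phenotypes)
    return inferred


associations = propagate()
for variant, phenotypes in associations.items():
    print(variant, "is_variant_with_phenotype", phenotypes)
```

Every level receives the full phenotype list here, including both alleles, which illustrates why specificity and weighting of inferred associations matter.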

Page 13: What's In a Genotype?: An Ontological Characterization for the Integration of Genetic Variation Data

II. Integrating Non-Genomic Variation

‘Extrinsic genotypes’ describe sequences subject to transient variations in expression at the time of an experiment

Representing extrinsic variation data in terms of the targeted genes facilitates integration with ‘intrinsic’ G2P data

Morpholino-mediated gene knockdown

Page 14: What's In a Genotype?: An Ontological Characterization for the Integration of Genetic Variation Data

III. Semantic Links to Related Data

Page 15: What's In a Genotype?: An Ontological Characterization for the Integration of Genetic Variation Data

GENO in the OBO Foundry

• GENO modeled according to OBO Foundry principles, under conceptual frameworks of the BFO, IAO, and SO

• Collaborators in SO refactoring to enhance genetic variation representation, and ensure integration of Monarch data with SO-annotated genomes

Page 16: What's In a Genotype?: An Ontological Characterization for the Integration of Genetic Variation Data

Summary and Future Directions

GENO in the Monarch Data Integration Pipeline

1. Raw data ingested into Monarch RDB

2. Views generated that contain “GENO-enhanced” data (standardized syntax, unpacked genotypes, links to external data)

3. D2RQ maps relational data to GENO and generates RDF

4. GENO-supported reasoning adds inferred G2P associations (e.g. phenotype propagation)
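Steps 1 to 3 of the pipeline can be mimicked on a toy scale. The sketch below is illustrative only: it swaps in Python's built-in `sqlite3` for the Monarch RDB and a hand-written loop for D2RQ, and the table, column, and view names are all hypothetical. It emits plain tuples, not real RDF with GENO IRIs.

```python
import sqlite3

GENOTYPE = "apchu745/+;fgf8ati282/ti282(AB)"

# Step 1: raw data ingested into a relational database (in-memory stand-in).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_annotation (genotype TEXT, phenotype TEXT)")
conn.execute("CREATE TABLE genotype_part (genotype TEXT, variant TEXT)")
conn.executemany("INSERT INTO raw_annotation VALUES (?, ?)",
                 [(GENOTYPE, "gut deformation")])
conn.executemany("INSERT INTO genotype_part VALUES (?, ?)",
                 [(GENOTYPE, "apchu745"), (GENOTYPE, "fgf8ati282")])

# Step 2: a view exposing "unpacked" genotypes next to their annotations.
conn.execute("""
    CREATE VIEW unpacked AS
    SELECT p.variant AS variant, a.genotype AS genotype, a.phenotype AS phenotype
    FROM raw_annotation a
    JOIN genotype_part p ON p.genotype = a.genotype
""")

# Step 3: map each view row to (subject, relation, object) statements.
triples = set()
for variant, genotype, phenotype in conn.execute(
        "SELECT variant, genotype, phenotype FROM unpacked"):
    triples.add((variant, "is_variant_part_of", genotype))
    triples.add((genotype, "has_phenotype", phenotype))

for triple in sorted(triples):
    print(triple)
```

Step 4 would then run over these asserted statements to add the inferred associations.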

Future Directions

1. Modeling of transgenes, human variation, and related data types

2. Develop property chains and algorithms to improve specificity and weighting of inferred G2P associations

3. Separate application features to provide a community model for public release and integration with SO

Page 17: What's In a Genotype?: An Ontological Characterization for the Integration of Genetic Variation Data

Acknowledgements

Monarch Initiative / NIF

OHSU: Melissa Haendel, Carlo Torniai, Shahim Essaid, Nicole Vasilevsky, Scott Hoffman

LBNL: Chris Mungall, Suzi Lewis, Nicole Washington

UCSD/NIF: Maryann Martone, Anita Bandrowski, Jeff Grethe, Amarnath Gupta, Trish Whetzel

University of Pittsburgh: Harry Hochheiser, Chuck Borromeo

Sequence Ontology

University of Utah: Karen Eilbeck

University of Colorado: Mike Bada

Funding: NIH # 1R24OD011883-01

We are under construction

OHSU Ontology Development Group: www.ohsu.edu/library/ontology

GENO ontology: purl.obolibrary.org/obo/geno.owl