phenotype rcn so-geno_workshop(shared)
Toward the Integration of Genetic Variation Modeling
SO-Monarch Workshop
Sept 16-18, 2013
Oregon Health and Science University, Portland, OR
Funded by the Phenotype RCN
Matthew Brush, Melissa Haendel, Chris Mungall, Mike Bada, Karen Eilbeck, Bret Heale
MONARCH INITIATIVE
A rough transcript of audio from this presentation can be found here: https://docs.google.com/document/d/13oifUZeWxK5hXPlMW6pl3B-Xoxr6xTossUT4_Fl2cgg/edit
The Sequence Ontology
An OBO Library ontology developed to standardize the vocabulary and semantics of biological sequence annotation
Use has expanded from genome annotation into new applications - variation description, text mining, experimental data annotation
Undergoing significant refactoring to meet new needs:
- align with BFO
- enhance variation representation
- develop parallel representation of physical sequence entities
- SO vs MSO (molecular sequence ontology)
- improve explicit representations across central dogma
The Monarch Initiative
The Monarch Initiative aims to bring G2P (genotype-to-phenotype) and related data together under a common semantic framework, and to develop tools and services for user-guided exploration and analysis.
Data integration and application functionality are driven by a suite of ontologies that includes many community resources as well as new ontologies (GENO).
GENO: A Genotype Ontology
The core use case of GENO for Monarch is to support aggregation and semantic integration of genotype data and its link to phenotypes across these diverse sources.
GENO is an ontology of 'genotype' sequence information that describes types and scales of genetic variations associated with phenotypes, and places these variations in a broader biological context.
Genotype information in GENO is viewed broadly to include any variation in gene expression that is tied to an observed phenotypic effect. We distinguish two types of variation:
[Figure: SEQUENCE-VARIATION. Sequence AGACTACTACGTAGGTCCTCC; peptide Arg-Leu-Leu-Arg-Stop; environment; PHENOTYPE ‘short fin’]
[Figure: EXPRESSION-VARIATION. Sequence AGACTACTACGTACGTCCTCC; peptide Arg-Leu-Leu-Arg-Thr-Ser-Ser; marked X, morpholinos; PHENOTYPE ‘short fin’]
SEQUENCE-VARIATION: ‘Intrinsic’ Genotype, e.g. apchu745/+; fgf8ti282/ti282(AB)
EXPRESSION-VARIATION: ‘Extrinsic’ Genotype, e.g. shhbMO1-shhb(2ng); ihhbMO2-ihhb(1ng)
Together, these artifacts can capture the complement of all genetic variation in an organism in terms of the loci that are altered in their sequence or their expression level.
Decomposition of an ‘Intrinsic’ Genotype
Intrinsic Genotype: specifies a sequence variation across an entire genome in terms of its differences from some reference genome.
genotype = genomic variation complement + genomic background
- genotype: apchu745/+; fgf8ti282/ti282(AB)
- genomic variation complement: apchu745/+; fgf8ati282/ti282
- genomic background: (AB)
Each level of the genomic variation complement has_part the next:
- genomic variation complement
- variant single locus complement: apchu745/+
- variant locus (aka allele): apchu745
- sequence alteration: hu745
[Figure: paired sequence tracks (e.g. AACGTAGCGACGCTCGCTACGGGCGTATC vs. AACGTACCGACGCTCGCTACGGGCGTATC; CGTAGC vs. CGTACC) with altered positions marked X]
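The has_part decomposition above can be sketched as a small set of nested data types. This is only an illustrative sketch, not GENO's actual implementation: the class names are hypothetical Python stand-ins for the slide's terms, the `^` separator is an assumed plain-text rendering of the superscript in labels such as apchu745, and the example uses the gene symbol fgf8 as written in the first genotype label.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class SequenceAlteration:
    name: str  # e.g. "hu745"


@dataclass(frozen=True)
class VariantLocus:  # "aka allele" on the slide
    gene: str
    alteration: SequenceAlteration  # has_part

    def label(self) -> str:
        # "^" is an assumed stand-in for the superscript notation
        return f"{self.gene}^{self.alteration.name}"


@dataclass(frozen=True)
class VariantSingleLocusComplement:
    gene: str
    # has_part: the two alleles at one locus; None stands for wild type ("+")
    alleles: Tuple[Optional[VariantLocus], Optional[VariantLocus]]

    def label(self) -> str:
        names = [a.alteration.name if a else "+" for a in self.alleles]
        return f"{self.gene}^{names[0]}/{names[1]}"


@dataclass(frozen=True)
class GenomicVariationComplement:
    loci: Tuple[VariantSingleLocusComplement, ...]  # has_part

    def label(self) -> str:
        return "; ".join(locus.label() for locus in self.loci)


@dataclass(frozen=True)
class IntrinsicGenotype:
    variation: GenomicVariationComplement  # has_part
    background: str                        # genomic background, e.g. "AB"

    def label(self) -> str:
        return f"{self.variation.label()}({self.background})"


def sequence_alterations(genotype: IntrinsicGenotype) -> List[str]:
    """Walk has_part down to the distinct sequence alterations."""
    seen: List[str] = []
    for locus in genotype.variation.loci:
        for allele in locus.alleles:
            if allele is not None and allele.alteration.name not in seen:
                seen.append(allele.alteration.name)
    return seen


# The slide's example: heterozygous at apc, homozygous at fgf8, AB background
apc_hu745 = VariantLocus("apc", SequenceAlteration("hu745"))
fgf8_ti282 = VariantLocus("fgf8", SequenceAlteration("ti282"))
genotype = IntrinsicGenotype(
    variation=GenomicVariationComplement((
        VariantSingleLocusComplement("apc", (apc_hu745, None)),
        VariantSingleLocusComplement("fgf8", (fgf8_ti282, fgf8_ti282)),
    )),
    background="AB",
)

print(genotype.label())
print(sequence_alterations(genotype))
```

Keeping each level as its own type mirrors the slide: a phenotype can be attached to the whole genotype, to a single locus complement, or to one allele, depending on how much a data source reports.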
Decomposition of an ‘Extrinsic’ Genotype
‘Extrinsic genotypes’ describe sequences subject to transient variation in expression at the time of an experiment.
An extrinsic genotype comprises the collection of all genes in the organism that are variant in their expression as a result of some experimental intervention.
[Figure: Morpholino-mediated gene knockdown]
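An extrinsic genotype, as described above, can be sketched as a collection of genes whose expression is altered by an intervention. The sketch below is an assumption-laden illustration rather than GENO's model: the class names and the `dose_ng` field are hypothetical, the `^` separator is again an assumed rendering of the superscript, and the reading of shhbMO1-shhb(2ng) as gene, morpholino reagent, and dose is inferred from the label alone.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ExpressionVariantGene:
    gene: str        # gene whose expression is altered, e.g. "shhb"
    reagent: str     # knockdown reagent, e.g. morpholino "MO1-shhb"
    dose_ng: float   # amount applied, in ng (assumed unit from the label)

    def label(self) -> str:
        # "^" is an assumed stand-in for the superscript notation
        return f"{self.gene}^{self.reagent}({self.dose_ng:g}ng)"


@dataclass(frozen=True)
class ExtrinsicGenotype:
    # All genes variant in expression as a result of the intervention
    variant_genes: Tuple[ExpressionVariantGene, ...]

    def label(self) -> str:
        return "; ".join(g.label() for g in self.variant_genes)

    def targeted_genes(self) -> List[str]:
        return sorted({g.gene for g in self.variant_genes})


# The slide's example: two morpholino knockdowns in one experiment
extrinsic = ExtrinsicGenotype((
    ExpressionVariantGene("shhb", "MO1-shhb", 2),
    ExpressionVariantGene("ihhb", "MO2-ihhb", 1),
))

print(extrinsic.label())
print(extrinsic.targeted_genes())
```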
‘Effective’ Genotypes
Workshop Motivation
Both GENO and SO are developed to cover different perspectives on the domain of abstract biological sequence information.
GENO is new and developing. SO is mature but undergoing major refactoring.
The primary workshop goal was to ensure the models are interoperable, to allow integration of data described using SO and GENO.
Our work fell into three categories:
- Ontology
- Community
- Logistics/Planning
Workshop Goals and Outcomes Ontology
1. Ontological Debate: to align high-level ontological modeling of sequence features and intrinsic and extrinsic variation
2. Core terminological standardization: establish clear definitions and usage of core domain terms (gene, allele, variant, etc.) ... in progress
3. SO vs MSO: developed strategy for parallel SO and MSO development and maintenance ... in progress
4. Conceptual Integration of SO and GENO: intrinsic genotype modeling is in scope of SO, but extrinsic modeling is not and will live exclusively in GENO
Workshop Goals and Outcomes Community
1. Gene representation: strategy to provide an ontological representation and identifiers for genes and their variants as a community resource for diverse applications
2. Modeling the central dogma: build from gene representation to describe relations to sequences at RNA and protein levels, and properties that emerge here
3. Phenotype annotation practices: develop a standard for use of phenotype ontologies for GVF file annotation
   http://www.sequenceontology.org/so_wiki/index.php/Using_Phenotype_Ontologies_in_GVF
4. GENO as a community ontology: plan for separating Monarch-specific features from a more generally useful community model
Workshop Goals and Outcomes Planning and Logistics
- Technical Integration Plan: decide how GENO and SO will interact at the technical level (namespaces, imports, mappings, etc.)
- Collaborative Development Plan: establish a framework of tools and practices for parallel development of SO and Monarch ontologies
- Data and Use Case Plan: collect use cases and build test data sets from the community to inform and test our modeling
- Continued Working Group: weekly Tuesday afternoon calls, open to community
Thank You
Thanks to the Phenotype RCN for their support!
Details about workshop outcomes can be found here: https://docs.google.com/document/d/1AUEVX0Sx_iy9mTI6F59Yo7ZCXu4zv5uSk28AHid5zhc/edit#
Questions?