Posted on 22-Dec-2015
1
What an Ontology is For
Barry Smith
University at Buffalo
http://ontology.buffalo.edu/smith
Common Anatomy Reference Ontology Workshop
2
how do we know what data we have?
how do I know what data you have?
how do we know what data we don’t have?
how do we make different sorts of data combinable?
we are accumulating huge amounts of data
3
4
where in the cell?
what kind of process?
we need semantic annotation of data
what kind of biological end?
5
Semantic Web, Moby, wikis, UMLS, etc.
let a million flowers (weeds) bloom
and create integration via post hoc mappings
how do we create broad-coverage semantic annotation systems for biomedicine?
6
for science
develop high quality annotation resources in a collaborative, community effort
create an evolutionary path towards improvement
on the basis of common prospective standards
based on science
an alternative
7
for science
science works out from a validated core, and strives to isolate and resolve inconsistencies as it extends outwards
we need to create a validated core, including ontologies corresponding to the basic biomedical sciences
in this core: low hanging fruit
[Figure: Foundational Model of Anatomy (FMA) fragment showing is_a and part_of links among Anatomical Structure, Anatomical Space, Organ, Organ Part, Organ Subdivision, Organ Component, Organ Cavity, Organ Cavity Subdivision, Tissue, Serous Sac, Serous Sac Cavity, Serous Sac Cavity Subdivision, Pleural Sac, Pleura (Wall of Sac), Visceral Pleura, Parietal Pleura, Mediastinal Pleura, Pleural Cavity, Interlobar recess, Mesothelium of Pleura]
9
for science
where do we find scientifically validated information linking gene products and other entities represented in biochemical databases to semantically meaningful terms pertaining to disease, anatomy, development, and histology in different model organisms?
but we need more
10
what makes GO so wildly successful?
11
science base: trained experts curating peer-reviewed literature
create an evolving set of standardized descriptions used to annotate the entities represented in the major biochemical databases
and thereby to integrate these databases
The methodology of annotations
12
this leads to improvements and extensions of the ontology
which in turn leads to better annotations
which leads to further improvement in the quality and reach of both future annotations and the ontology itself
RESULT: a slowly growing computer-interpretable map of biological reality within which major databases are automatically integrated in semantically searchable form
13
Five bangs for your GO buck
cross-species database integration
cross-granularity database integration
through links to the things which are of biomedical relevance
semantic searchability links people to software
human curated science base creates de facto gold standard (benchmark for comparison)
14
but now
need to create a de jure standard:
improve the quality of the GO
establish common rules governing best practices for creating ontologies and for using these in annotations
apply these rules to create a complete suite of orthogonal interoperable biomedical reference ontologies
15
a shared portal for (so far) 58 ontologies (low regimentation)
http://obo.sourceforge.net
First step (2003)
16
Second step (2004)
reform efforts initiated, e.g. linking GO to other OBO ontologies to ensure orthogonality
id: CL:0000062
name: osteoblast
def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone."
is_a: CL:0000055
relationship: develops_from CL:0000008
relationship: develops_from CL:0000375
GO + Cell type = New Definition
Osteoblast differentiation: Processes whereby an osteoprogenitor cell or a cranial neural crest cell acquires the specialized features of an osteoblast, a bone-forming cell which secretes extracellular matrix.
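The osteoblast term stanza above is written in the OBO flat-file format as `tag: value` lines, with repeatable tags such as `relationship`. A minimal sketch of reading one such stanza into a Python dictionary, assuming exactly one tag per line; it ignores format details such as trailing modifiers, comments and the dbxref list that normally follows a `def` string, and `parse_stanza` is a hypothetical helper, not part of any OBO library.

```python
from collections import defaultdict


def parse_stanza(text: str) -> dict[str, list[str]]:
    """Parse the 'tag: value' lines of one OBO term stanza (simplified sketch).

    Repeated tags such as 'relationship' are collected into lists.
    """
    tags: dict[str, list[str]] = defaultdict(list)
    for line in text.strip().splitlines():
        line = line.strip()
        if not line or line.startswith("["):  # skip blank lines and '[Term]' headers
            continue
        tag, _, value = line.partition(":")   # split at the first colon only
        tags[tag.strip()].append(value.strip())
    return dict(tags)


OSTEOBLAST = '''
id: CL:0000062
name: osteoblast
def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone."
is_a: CL:0000055
relationship: develops_from CL:0000008
relationship: develops_from CL:0000375
'''

term = parse_stanza(OSTEOBLAST)
# Split each 'relationship' value into (relation name, target id)
relations = [tuple(v.split(maxsplit=1)) for v in term.get("relationship", [])]

print(term["id"][0], term["name"][0])  # CL:0000062 osteoblast
print(term["is_a"])                    # ['CL:0000055']
print(relations)                       # [('develops_from', 'CL:0000008'), ('develops_from', 'CL:0000375')]
```

For real ontology files, an established OBO parser is the safer choice; the sketch only shows how the stanza's tags map to data.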
17
The OBO Foundry
http://obofoundry.org/
Third step (2006)
18
19
a family of interoperable gold standard biomedical reference ontologies to serve the annotation of, inter alia:
scientific literature, model organism databases, clinical trial data
20
A prospective standard
designed to guarantee interoperability of ontologies from the very start (in contrast to post hoc mapping)
12 initial candidate OBO ontologies – focused primarily on basic science domains
several being constructed ab initio by influential consortia who have the authority to impose their use on large parts of the relevant communities.
21
undergoing rigorous reform:
GO (Gene Ontology), ChEBI (Chemical Ontology), CL (Cell Ontology), FMA (Foundational Model of Anatomy), PaTO (Phenotype Quality Ontology), SO (Sequence Ontology)
new:
CARO (Common Anatomy Reference Ontology), CTO (Clinical Trial Ontology), FuGO (Functional Genomics Investigation Ontology), PrO (Protein Ontology), RnaO (RNA Ontology), RO (Relation Ontology)
22
new: CARO, CTO, FuGO, PrO, RnaO, RO (as on slide 21)
to be absorbed in new Ontology of Biomedical Investigations (OBI): CTO, FuGO
Ontology | Scope | URL | Custodians
Cell Ontology (CL) | cell types from prokaryotes to mammals | obo.sourceforge.net/cgi-bin/detail.cgi?cell | Jonathan Bard, Michael Ashburner, Oliver Hofman
Chemical Entities of Biological Interest (ChEBI) | molecular entities | ebi.ac.uk/chebi | Paula Dematos, Rafael Alcantara
Common Anatomy Reference Ontology (CARO) | anatomical structures in human and model organisms | (under development) | Melissa Haendel, Terry Hayamizu, Cornelius Rosse, David Sutherland ???
Foundational Model of Anatomy (FMA) | structure of the human body | fma.biostr.washington.edu | JLV Mejino Jr., Cornelius Rosse
Functional Genomics Investigation Ontology (FuGO) | design, protocol, data instrumentation, and analysis | fugo.sf.net | FuGO Working Group
Gene Ontology (GO) | cellular components, molecular functions, biological processes | www.geneontology.org | Gene Ontology Consortium
Phenotypic Quality Ontology (PaTO) | qualities of anatomical structures | obo.sourceforge.net/cgi-bin/detail.cgi?attribute_and_value | Michael Ashburner, Suzanna Lewis, Georgios Gkoutos
Protein Ontology (PrO) | protein types and modifications | (under development) | Protein Ontology Consortium
Relation Ontology (RO) | relations | obo.sf.net/relationship | Barry Smith, Chris Mungall
RNA Ontology (RnaO) | three-dimensional RNA structures | (under development) | RNA Ontology Consortium
Sequence Ontology (SO) | properties and features of nucleic sequences | song.sf.net | Karen Eilbeck
24
GRANULARITY / RELATION TO TIME | CONTINUANT: INDEPENDENT | CONTINUANT: DEPENDENT | OCCURRENT
ORGAN AND ORGANISM | Organism (NCBI Taxonomy?); Anatomical Entity (FMA, CARO) | Organ Function (FMP, CPRO); Phenotypic Quality (PaTO) | Biological Process (GO)
CELL AND CELLULAR COMPONENT | Cell (CL); Cellular Component (FMA, GO) | Cellular Function (GO) |
MOLECULE | Molecule (ChEBI, SO, RnaO, PrO) | Molecular Function (GO) | Molecular Process (GO)
25
all OBO Foundry developers have agreed to a common set of evolving principles reflecting best practice in ontology development designed to ensure
tight connection to the biomedical basic sciences
compatibility
interoperability, common relations
formal robustness
support for logic-based reasoning
26
The ontology is OPEN and available to be used by all.
The ontology is in, or can be instantiated in, a COMMON FORMAL LANGUAGE.
The developers of the ontology agree in advance to COLLABORATE with developers of other OBO Foundry ontologies where domains overlap.
PRINCIPLES
27
PRINCIPLES
UPDATE: The developers of each ontology commit to its maintenance in light of scientific advance, and to soliciting community feedback for its improvement.
ORTHOGONALITY: They commit to working with other Foundry members to ensure that, for any particular domain, there is community convergence on a single controlled vocabulary.
28
for science
if we annotate a database or body of literature with one high-quality biomedical ontology, we should be able to add annotations from a second such ontology without conflicts
science aims for consistency
because science aims for correctness
orthogonality of ontologies implies additivity of annotations
29
IDENTIFIERS: The ontology possesses a unique identifier space within OBO.
VERSIONING: The ontology provider has procedures for identifying distinct successive versions to ensure BACKWARDS COMPATIBILITY with annotation resources already in common use.
The ontology includes TEXTUAL DEFINITIONS and where possible equivalent formal definitions of its terms.
PRINCIPLES
30
CLEARLY BOUNDED: The ontology has a clearly specified and clearly delineated content.
DOCUMENTATION: The ontology is well-documented.
USERS: The ontology has a plurality of independent users.
PRINCIPLES
31
COMMON ARCHITECTURE: The ontology uses relations which are unambiguously defined following the pattern of definitions laid down in the OBO Relation Ontology.*
* Smith et al., Genome Biology 2005, 6:R46
PRINCIPLES
32
Foundational: is_a, part_of
Spatial: located_in, contained_in, adjacent_to
Temporal: transformation_of, derives_from, preceded_by
Participation: has_participant, has_agent
OBO Relation Ontology
33
Further principles will be added over time in light of lessons learned
The Foundry is not seeking to serve as a check on flexibility or creativity
IT WILL GET HARDER
BUT NOT EVERYONE NEEDS TO JOIN
34
CREDIT for high quality ontology development work
KUDOS for early adopters of high quality ontologies / terminologies e.g. in reporting clinical trial results
GOALS
35
to introduce some of the features of SCIENTIFIC PEER REVIEW into biomedical ontology development
to provide a FRAMEWORK OF RULES to counteract the current policy of ad hoc creation
if data-schemas are formulated using a single ontology system in widespread use this supports DATA REUSABILITY
GOALS
36
A dichotomy
universals (types, kinds, classes) vs.
instances (particulars, individuals)
37
A 515287 DC3300 Dust Collector Fan
B 521683 Gilmer Belt
C 521682 Motor Drive Belt
Catalog vs. inventory
38
An ontology is a representation of universals
39
An ontology is a representation of universals
We learn about universals by looking at scientific texts – which describe what is general in reality
[Figure: universals (substance, organism, animal, mammal, cat, siamese, frog; leaf classes marked) contrasted with the instances that instantiate them]
41
rule of single inheritance
no diamonds:
[Diagram: classes A, B and C connected by is_a1 and is_a2]
42
problems with multiple inheritance
[Diagram: A is_a1 B; A is_a2 C]
‘is_a’ no longer univocal
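The rule of single inheritance can be checked mechanically: a class breaks it as soon as it has more than one asserted is_a parent, which is also what produces a diamond. A minimal sketch, assuming is_a links are stored as (child, parent) pairs; the classes A, B, C and the extra common ancestor D are placeholders.

```python
from collections import defaultdict


def multiple_inheritance(is_a_edges):
    """Return {class: sorted parents} for every class with more than one is_a parent."""
    parents = defaultdict(set)
    for child, parent in is_a_edges:
        parents[child].add(parent)
    return {c: sorted(ps) for c, ps in parents.items() if len(ps) > 1}


# A is_a B and A is_a C, with B and C sharing the ancestor D: a diamond
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
print(multiple_inheritance(edges))  # {'A': ['B', 'C']}
```

Each class the check reports is a candidate for the kind of review described on the next slide, since a second parent often signals that is_a is being used in more than one sense.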
43
‘is_a’ is pressed into service to mean a variety of different things
shortfalls from single inheritance are often clues to incorrect entry of terms and relations
the resulting ambiguities make the rules for correct entry difficult to communicate to human curators
44
is_a overloading
serves as obstacle to integration with neighboring ontologies
The success of ontology alignment depends crucially on the degree to which basic ontological relations such as is_a and part_of can be relied on as having the same meanings in the different ontologies to be aligned.
45
What single inheritance costs
In some respects harder to build ontologies
harder to use ontologies to find terms
Solutions: normalization, GUIs
Recommendation: if building from scratch use single inheritance
46
What single inheritance brings
Coherent hierarchies
Modularity
Statistical representativeness
Jointly exhaustive, pairwise disjoint classification
Coherent methodology for definitions
47
Aristotelian definitions
When A is_a B, the definition of ‘A’ has the form:
an A =def. a B which ...
a human being =def. an animal which is rational
Each definition reflects the position in the hierarchy to which a defined term belongs.
48
FMA Examples
Cell =def. an anatomical structure which consists of cytoplasm surrounded by a plasma membrane with or without a cell nucleus
Plasma membrane =def. a cell part that surrounds the cytoplasm
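Because each term has exactly one is_a parent, a definition of the form "an A =def. a B which ..." can be assembled from the parent (the genus) plus a differentia. A minimal sketch; the `genus` and `differentia` tables are hypothetical inputs filled in with the two FMA examples above, and the "a"/"an" choice is a crude first-letter heuristic.

```python
# Hypothetical inputs: each term's single is_a parent (genus) and its differentia
genus = {
    "cell": "anatomical structure",
    "plasma membrane": "cell part",
}
differentia = {
    "cell": "consists of cytoplasm surrounded by a plasma membrane "
            "with or without a cell nucleus",
    "plasma membrane": "surrounds the cytoplasm",
}


def article(noun: str) -> str:
    """Choose 'a' or 'an' from the first letter (rough heuristic)."""
    return "an" if noun[0].lower() in "aeiou" else "a"


def aristotelian_definition(term: str) -> str:
    """Build 'A =def. a B which ...' from the term's genus and differentia."""
    parent = genus[term]  # single inheritance: exactly one genus per term
    return f"{term} =def. {article(parent)} {parent} which {differentia[term]}"


for t in genus:
    print(aristotelian_definition(t))
# cell =def. an anatomical structure which consists of cytoplasm surrounded by a plasma membrane with or without a cell nucleus
# plasma membrane =def. a cell part which surrounds the cytoplasm
```

With multiple inheritance the `genus` table would need several entries per term, and the single "a B which ..." template would no longer apply.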
49
Canonical ontologies
50
The FMA is a canonical representation of types and relations between types deduced from the qualitative observations of the normal human body, which have been refined and sanctioned by successive generations of anatomists and presented in textbooks and atlases of structural anatomy.
51
The GO is a canonical representation
“The Gene Ontology is a computational representation of the ways in which gene products normally function in the biological realm”
Nucl. Acids Res. 2006: 34.
52
The Gene Ontology
is a canonical ontology – it represents only what is normal in the realm of molecular functioning
53
The core of the OBO Foundry consists of canonical ontologies
(pathoanatomy, pathophysiology will come later)
54
Three canonical ontologies
CARO
+ Ontology of Functions
+ Ontology of Developmental Processes (part of GO Biological Process ontology?)
55
A second fundamental dichotomy
• universals vs. instances
• continuants vs. occurrents
56
Continuants (aka endurants)
– have continuous existence in time
– preserve their identity through change
Occurrents (aka processes)
– have temporal parts
– unfold themselves in successive phases
57
You are a continuant
Your life is an occurrent
You are 3-dimensional
Your life is 4-dimensional
58
A third fundamental dichotomy
• types vs. instances
• continuants vs. occurrents
• dependent vs. independent
59
Dependent entities
require independent continuants as their bearers
There is no grin without a cat
There is no quality without a bearer
There is no disease without an organism
60
All occurrents are dependent entities
They are dependent on those independent continuants which are their participants (agents, patients, media ...)
There is no run without a runner
61
Dependent vs. independent continuants
Independent continuants (organisms, cells, molecules, environments)
Dependent continuants (qualities, shapes, roles, propensities, functions)
62
Top-Level Ontology
Continuant: Independent Continuant, Dependent Continuant
Occurrent (always dependent on one or more independent continuants)
63
The GO Top-Level Ontology
Continuant
– Independent Continuant: cell component
– Dependent Continuant: molecular function
Occurrent: biological process
64
Functions vs. Functionings
the function of your heart = to pump blood in your body
this function is realized in processes of pumping blood
not all functions are realized (consider the function of this sperm ...)
not all processes are functionings
65
Continuant: Independent Continuant; Dependent Continuant (Function)
Occurrent: Functioning; Incidental by-product; Stochastic process
66
The OBO Relation Ontology
67
Part_of as a relation between universals
heart part_of human being ?
human heart part_of human being ?
human testis part_of human being ?
human being has_part human testis ?
68
two kinds of parthood
1. between instances:
Mary’s heart part_of Mary
this nucleus part_of this cell
2. between universals
human heart part_of human
cell nucleus part_of cell
69
Definition of part_of as a relation between universals
A part_of B =Def. all instances of A are instance-level parts of some instance of B
human testis part_of adult human being
but not
adult human being has_part human testis
70
Continuants
– have continuous existence in time
– preserve their identity through change
Occurrents (aka processes)
– have temporal parts
– unfold themselves in successive phases
71
part_of (for processes)
A part_of B =def.
For all x, if x instance_of A then there is some y, y instance_of B and x part_of y
where ‘part_of’ is the instance-level part relation
EVERY A IS PART OF SOME B
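The all-some definition can be tested directly against instance data: for every instance x of A, look for some instance y of B such that x part_of y. A minimal sketch for processes, assuming instance_of and instance-level part_of are given as small in-memory sets of pairs; the process instances are toy data. With no instances of A at all, the test is vacuously true.

```python
def class_part_of(a, b, instance_of, part_of):
    """A part_of B iff every instance of A is part_of some instance of B.

    instance_of: set of (instance, universal) pairs
    part_of:     set of (instance, instance) pairs (instance-level parthood)
    """
    instances_a = {x for x, u in instance_of if u == a}
    instances_b = {y for y, u in instance_of if u == b}
    return all(any((x, y) in part_of for y in instances_b) for x in instances_a)


# Toy instance data: two cell cycles, each with one S phase as a part
instance_of = {
    ("cc1", "cell cycle"), ("cc2", "cell cycle"),
    ("s1", "S phase"), ("s2", "S phase"),
}
part_of = {("s1", "cc1"), ("s2", "cc2")}

print(class_part_of("S phase", "cell cycle", instance_of, part_of))  # True
print(class_part_of("cell cycle", "S phase", instance_of, part_of))  # False
```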
72
part_of (for continuants)
A part_of B =def.
For all x, t if x instance_of A at t then there is some y, y instance_of B at t and x part_of y at t
where ‘part_of’ is the instance-level part relation
ALL-SOME STRUCTURE
73
part_of (for continuants)
A part_of B =def.
if an A exists at t then it is part_of some B at t
where ‘part_of’ is the instance-level part relation
74
has_part (for continuants)
A has_part B =def.
if an A exists at t then there is some B which is part of it at t
75
human testis part_of adult human being
but not
adult human being has_part human testis
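For continuants the same all-some test is relativized to a time t. A minimal sketch, assuming instance_of facts as (x, universal, t) triples and instance-level part_of facts as (x, y, t) triples; the individuals are hypothetical. It reproduces the asymmetry above: every human testis is part of some adult human being, but not every adult human being has a human testis as part.

```python
def cont_part_of(a, b, inst, part):
    """A part_of B: if x instance_of A at t, then some y instance_of B at t has x part_of y at t."""
    return all(
        any((y, b, t) in inst and (x, y, t) in part for y, _, _ in inst)
        for x, u, t in inst
        if u == a
    )


def cont_has_part(a, b, inst, part):
    """A has_part B: if x instance_of A at t, then some y instance_of B at t is part_of x at t."""
    return all(
        any((y, b, t) in inst and (y, x, t) in part for y, _, _ in inst)
        for x, u, t in inst
        if u == a
    )


# Hypothetical data at a single time t=1: an adult man and an adult woman
inst = {
    ("john", "adult human being", 1),
    ("mary", "adult human being", 1),
    ("johns_testis", "human testis", 1),
}
part = {("johns_testis", "john", 1)}

print(cont_part_of("human testis", "adult human being", inst, part))   # True
print(cont_has_part("adult human being", "human testis", inst, part))  # False
```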
76
is_a (for processes)
A is_a B =def.
For all x, if x instance_of A then x instance_of B
cell division is_a biological process
77
is_a (for continuants)
A is_a B =def.
For all x, t if x instance_of A at t then x instance_of B at t
abnormal cell is_a cell
adult human is_a human
but not: adult is_a child
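The time index is what blocks "adult is_a child": every adult was once a child, but the definition compares instance_of facts at the same t. A minimal sketch, assuming (x, universal, t) triples for one hypothetical individual; `untimed_is_a` is included only as a contrast, showing what goes wrong when time is ignored.

```python
def cont_is_a(a, b, inst):
    """A is_a B: for all x, t, if x instance_of A at t then x instance_of B at t."""
    return all((x, b, t) in inst for x, u, t in inst if u == a)


def untimed_is_a(a, b, inst):
    """Contrast only (not the RO definition): every x that is ever an A is at some time a B."""
    ever_a = {x for x, u, _ in inst if u == a}
    ever_b = {x for x, u, _ in inst if u == b}
    return ever_a <= ever_b


# Hypothetical history of one individual: a child at t=1, an adult at t=2
inst = {
    ("sam", "child", 1), ("sam", "human", 1),
    ("sam", "adult", 2), ("sam", "human", 2),
}

print(cont_is_a("adult", "human", inst))     # True
print(cont_is_a("adult", "child", inst))     # False: sam is not a child at t=2
print(untimed_is_a("adult", "child", inst))  # True: ignoring time gives the wrong answer
```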
78
A part_of B, B part_of C ...
The all-some structure of the definitions in the OBO-RO allows
cascading of inferences
(i) within ontologies
(ii) between ontologies
(iii) between ontologies and EHR repositories of instance-data
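Since each class-level part_of assertion has the all-some form and instance-level parthood is transitive, chains compose: if every A is part of some B and every B is part of some C, then every A is part of some C. A minimal sketch of cascading such inferences by transitive closure over part_of edges pooled from two ontologies; the classes A to D and their split into two ontologies are placeholders.

```python
def transitive_closure(edges):
    """Return all (x, z) pairs derivable by chaining part_of edges (x, y), (y, z)."""
    closure = set(edges)
    while True:
        new = {(x, z) for x, y in closure for y2, z in closure if y == y2}
        if new <= closure:  # nothing new derivable: fixed point reached
            return closure
        closure |= new


# Hypothetical class-level part_of assertions from two ontologies
ontology_1 = {("A", "B"), ("B", "C")}
ontology_2 = {("C", "D")}

inferred = transitive_closure(ontology_1 | ontology_2)
print(sorted(inferred - (ontology_1 | ontology_2)))
# [('A', 'C'), ('A', 'D'), ('B', 'D')]
```

The inferred pair ('A', 'D') spans both ontologies, which is case (ii) above; it holds only because every edge was asserted with the all-some meaning.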
79
OBO Relation Ontology
Foundational: is_a, part_of
Spatial: located_in, contained_in, adjacent_to
Temporal: transformation_of, derives_from, preceded_by
Participation: has_participant, has_agent
80
David Sutherland:
For any structure x, I should be able to answer the questions:
1. What is x (what type of thing is it)?
2. Where is x (what is it part of)?
3. What subtypes of x are there?
4. What parts does x have?
81
For any structure x, I should be able to answer the questions:
1. What type of thing is x? Say: A
2. What types of things are As part of?
3. What types of things are As located in?
4. What subtypes of A are there?
5. What parts do As have?
For continuants: located_in = either part_of or contained_in
82
David:
The first 2 questions are important for navigating the ontology. The second 2 questions are crucial to grouping curations. If we are looking for phenotypes that affect hands, we need to be able to deduce that a hand has fingers and so add finger phenotypes to our hand phenotype list.
I think that having 'has_part' relationships in the ontology is key to achieving this.
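The hand/finger example amounts to a graph query: collect every structure reachable from "hand" through has_part, then gather the phenotype annotations attached to any of them. A minimal sketch, assuming a small hypothetical has_part graph and hypothetical phenotype labels; the grouping is only sound if each has_part link holds universally (every hand has fingers).

```python
from collections import deque

# Hypothetical class-level has_part links and phenotype annotations
has_part = {
    "hand": ["finger", "palm"],
    "finger": ["phalanx", "nail"],
}
phenotypes = {
    "hand": ["abnormal hand morphology"],
    "finger": ["absent finger"],
    "nail": ["nail dystrophy"],
}


def parts_of(structure):
    """All structures reachable from `structure` via has_part (breadth-first), including itself."""
    seen, queue = {structure}, deque([structure])
    while queue:
        for part in has_part.get(queue.popleft(), []):
            if part not in seen:
                seen.add(part)
                queue.append(part)
    return seen


def grouped_phenotypes(structure):
    """Phenotypes annotated to the structure or to any of its (transitive) parts."""
    return sorted(p for s in parts_of(structure) for p in phenotypes.get(s, []))


print(grouped_phenotypes("hand"))
# ['abnormal hand morphology', 'absent finger', 'nail dystrophy']
```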
[Figure: the Foundational Model of Anatomy (FMA) fragment of is_a and part_of links among pleural structures, repeated]
84
human uterus part_of human being
but not
human body has_part human uterus
85
Temporal relations
86
[Diagram (transformation_of): the same instance c, shown at t as an instance of C1 and at t1 as an instance of C, along a time axis]
87
transformation_of
A transformation_of B =Def.
Every instance of A was at some earlier time an instance of B
adult transformation_of child
heart transformation_of heart-precursor
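The transformation_of definition can likewise be checked against time-stamped instance data: every instance of A must, at some earlier time, have been an instance of B. A minimal sketch, assuming (x, universal, t) triples with integer times for two hypothetical individuals.

```python
def transformation_of(a, b, inst):
    """A transformation_of B: every x that is an A at some t was a B at some earlier t0 < t."""
    return all(
        any(y == x and u == b and t0 < t for y, u, t0 in inst)
        for x, ua, t in inst
        if ua == a
    )


# Hypothetical life histories of two individuals
inst = {
    ("ann", "child", 1), ("ann", "adult", 5),
    ("ben", "child", 2), ("ben", "adult", 7),
}

print(transformation_of("adult", "child", inst))  # True
print(transformation_of("child", "adult", inst))  # False
```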
88
[Diagram (embryological development): the same instance c at t and at t1, with types C and C1]
89
[Diagram (derives_from): instances c of C, c1 of C1 and c' of C' along a time axis]
zygote derives_from ovum, sperm
90
two continuants fuse to form a new continuant C
[Diagram (fusion): instances c of C, c1 of C1 and c' of C']
91
one initial continuant is replaced by two successor continuants
[Diagram (fission): c of C at t, succeeded at t1 by instances of C1 and C2]
92
is a relation combining transformation with fusion and fission (extended from the binary cases) what we are seeking in order to capture development?
via CARO?
should this relation be called ‘derives_from’ or ‘develops_from’?
93
one continuant detaches itself from an initial continuant, which itself continues to exist
[Diagram (budding): c of C at t and t1; c1 of C1]
94
one continuant absorbs a second continuant while itself continuing to exist
[Diagram (capture): c of C at t and t1; c' of C' at t]
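One way to picture the combined relation asked about above is as a lineage graph built from instance-level events. A minimal sketch, assuming a hypothetical event log in which fusion, fission and budding each link predecessor continuants to their successors; transformation and capture, where the initial continuant keeps its identity, add no new node and are left out. The walk computes everything a given continuant derives from, whatever the relation ends up being called.

```python
# Hypothetical instance-level event log: (kind, predecessors, successors)
events = [
    ("fusion", ["ovum_1", "sperm_1"], ["zygote_1"]),
    ("fission", ["zygote_1"], ["blastomere_1", "blastomere_2"]),
    ("budding", ["cell_x"], ["bud_1"]),  # cell_x continues to exist after budding
]


def derives_from_edges(events):
    """Map each successor continuant to the set of continuants it directly derives from."""
    edges = {}
    for _kind, preds, succs in events:
        for s in succs:
            edges.setdefault(s, set()).update(preds)
    return edges


def ancestors(c, edges):
    """All continuants from which c (transitively) derives."""
    found, stack = set(), [c]
    while stack:
        for p in edges.get(stack.pop(), ()):
            if p not in found:
                found.add(p)
                stack.append(p)
    return found


edges = derives_from_edges(events)
print(sorted(ancestors("blastomere_2", edges)))  # ['ovum_1', 'sperm_1', 'zygote_1']
```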
95
Principle of low hanging fruit
often one of two reciprocal relations (e.g. part_of and has_part) will hold universally
human testis part_of human body
but not
human body has_part human testis
96
Principle of low hanging fruit
nucleus adjacent_to cytoplasm
but not
cytoplasm adjacent_to nucleus
97
Principle of low hanging fruit
seminal vesicle adjacent_to urinary bladder
but not:
urinary bladder adjacent_to seminal vesicle
98
Top-Level Categories in the FMA
anatomical entity
non-physical anatomical entity
physical anatomical entity
anatomical relationship
body substance
material physical anatomical entity
anatomical structure
non-material physical anatomical entity
body space
boundary
anatomical attribute
99
Fiat vs. bona fide boundaries
100
Layers of the body’s surface
kidshealth.org/kid/body/skin_noSW.html
101
Top-Level Categories in the FMA (as on slide 98)
102
www.enel.ucalgary.ca/People/Mintchev/stomach.htm
103
anatomical entity
non-physical anatomical entity
physical anatomical entity
anatomical relationship
body substance
material physical anatomical entity
anatomical structure
non-material physical anatomical entity
body space
boundary
anatomical attribute
fiat boundary, bona fide boundary
104
fiat vs. bona fide boundaries
fiat boundary in anatomical space
physical boundary
105
www.enel.ucalgary.ca/People/Mintchev/stomach.htm
106
varieties of fiat boundaries
in anatomical structures vs. in body spaces
spatial vs. temporal (stages, pathways)
in instances vs. in the realm of universals
107
varieties of fiat boundaries
in anatomical structures
108
modes of connection
– attached_to (muscle to bone)
– synapsed_with (nerve to nerve, nerve to muscle)
– continuous_with (= share a fiat boundary)
109
a continuous_with b = a and b are continuant instances which share a fiat boundary
This relation on the instance level is always symmetric:
if x continuous_with y, then y continuous_with x
110
continuous_with (relation between universals)
A continuous_with B =Def.
for all x, if x instance-of A then there is some y such that y instance_of B and x continuous_with y
111
continuous_with as a relation between universals is not symmetric
Consider lymph node and lymphatic vessel:
– Each lymph node is continuous with some lymphatic vessel, but there are lymphatic vessels (e.g. lymphs and lymphatic trunks) which are not continuous with any lymph nodes
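The contrast between the two levels can be reproduced on toy data: store continuous_with symmetrically at the instance level, then apply the all-some definition in each direction. A minimal sketch with hypothetical individuals: one lymph node continuous with one lymphatic vessel, and a second lymphatic vessel that is continuous with no lymph node.

```python
def symmetric(pairs):
    """Close an instance-level relation under symmetry: x ~ y implies y ~ x."""
    return set(pairs) | {(y, x) for x, y in pairs}


def all_some(a, b, instance_of, rel):
    """A rel B: for all x instance_of A there is some y instance_of B with (x, y) in rel."""
    xs = [x for x, u in instance_of if u == a]
    ys = [y for y, u in instance_of if u == b]
    return all(any((x, y) in rel for y in ys) for x in xs)


instance_of = {
    ("node_1", "lymph node"),
    ("vessel_1", "lymphatic vessel"),
    ("vessel_2", "lymphatic vessel"),  # continuous with no lymph node
}
continuous_with = symmetric({("node_1", "vessel_1")})

print(all_some("lymph node", "lymphatic vessel", instance_of, continuous_with))  # True
print(all_some("lymphatic vessel", "lymph node", instance_of, continuous_with))  # False
```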
112
wherever we have fiat boundaries
there is a certain indeterminacy in the location of the boundary
where does the arm begin?
where does the head begin?
where does abnormal curvature of the spine begin?
113
do regions have this indeterminacy?
114
An ontology is a representation of types
Each term in an ontology should be a singular common noun
Cell, lung, ...
refer to instances in reality by referring to the types which they instantiate
115
Problems with mass nouns
‘blood’
‘menstrual fluid’
116
Problems with ‘tissue’
a specific portion of cells (instance)
a specific portion of cells (type)
a specific portion of cells of a certain type (instance)
a specific portion of cells of a certain type (type)
an arbitrary portion of cells x 4 as above
all of the above IN the body
all of the above in the form of samples OUTSIDE the body
a type of tissue, e.g. mesothelial tissue
117
Brenda Tissue Ontology contains statements like: arm is-a limb (here everything is a tissue)
Aukland Anatomy Ontology classifies tissue into: Connective tissue, Epithelial tissue, Glandular tissue, Muscle tissue, Nervous tissue;
proceeding further down the hierarchy we find not tissues but SimpleTubularGland, SimpleAcinarGland, etc.
EndocrineGland is asserted to have two ‘instances’ EndocrineGland (!), and FollicularEndocrineGland.
ConnectiveTissue has ‘instances’: Left Humerus, Right Tibia, ...
118
Recommendation
avoid ‘tissue’ and all mass nouns
hypothesis: in every case where one would want to use ‘portion of tissue’ in a scientific anatomy we mean:
maximally connected portion of tissue, and there is already a common noun for a corresponding type (?)
119
120
- CM
- application (current and future) of Foundry principles in GO
- stages
- application aspects of multiple inheritance: pre- and post-coordination