Post on 24-Jan-2016

1

Part III. The OBO Foundry Project:

Towards Scientific Standards and Principles-Based Coordination in Biomedical Ontology Development

2

High quality shared ontologies build communities

The NIH and FDA are moving to consolidate ontology-based standards for the communication and processing of biomedical data.

caBIG / NECTAR / BIRN / BRIDG ...

3

http://obo.sourceforge.net

4

http://www.geneontology.org/

5

6

7

8

The Methodology of Annotations

GO employs scientific curators, who use experimental observations reported in the biomedical literature to link gene products with GO terms in annotations.

This gene product exercises this function, in this part of the cell, leading to these biological processes
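As a rough illustration of what such an annotation records, here is a minimal sketch in Python. The `Annotation` class, the gene product name `GP_EXAMPLE` and the reference string `REF_PLACEHOLDER` are hypothetical; the GO identifiers and names are given for illustration only and should be checked against a current GO release. This is not GO's actual annotation file format.

```python
from dataclasses import dataclass

# The three GO aspects named on the slides.
ASPECTS = ("molecular_function", "cellular_component", "biological_process")


@dataclass(frozen=True)
class Annotation:
    """One curator assertion linking a gene product to a GO term (illustrative)."""
    gene_product: str  # hypothetical identifier
    go_id: str
    go_name: str
    aspect: str
    reference: str     # placeholder for the literature source

    def __post_init__(self):
        if self.aspect not in ASPECTS:
            raise ValueError(f"unknown aspect: {self.aspect}")


def summarize(annotations):
    """Group GO term names by aspect for a list of annotations."""
    summary = {aspect: [] for aspect in ASPECTS}
    for annotation in annotations:
        summary[annotation.aspect].append(annotation.go_name)
    return summary


annotations = [
    Annotation("GP_EXAMPLE", "GO:0003677", "DNA binding",
               "molecular_function", "REF_PLACEHOLDER"),
    Annotation("GP_EXAMPLE", "GO:0005634", "nucleus",
               "cellular_component", "REF_PLACEHOLDER"),
    Annotation("GP_EXAMPLE", "GO:0006355",
               "regulation of DNA-templated transcription",
               "biological_process", "REF_PLACEHOLDER"),
]
summary = summarize(annotations)
```

Read together, the three records say: this gene product exercises this function, in this part of the cell, contributing to this process.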

9

The Methodology of Annotations

This process of annotating literature leads to improvements and extensions of the ontology, which in turn leads to better annotations

This institutes a virtuous cycle of improvement in the quality and reach of both future annotations and the ontology itself.

Annotations + ontology taken together yield a slowly growing computer-interpretable map of biological reality.

RECALL: Alignment of GO and Cell ontologies will permit the generation of consistent and complete definitions

id: CL:0000062
name: osteoblast
def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone." [MESH:A.11.329.629]
is_a: CL:0000055
relationship: develops_from CL:0000008
relationship: develops_from CL:0000375

GO + Cell type = New Definition

Osteoblast differentiation: Processes whereby an osteoprogenitor cell or a cranial neural crest cell acquires the specialized features of an osteoblast, a bone-forming cell which secretes extracellular matrix.
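The composition of a GO definition from a Cell Ontology entry can be sketched roughly as follows. This is an illustrative sketch, not the tooling used by GO or CL: the parser handles only simple `tag: value` lines, the `LABELS` mapping from identifiers to names is an assumption read off the definition on the slide, and the sentence template is hypothetical.

```python
STANZA = '''\
id: CL:0000062
name: osteoblast
def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone." [MESH:A.11.329.629]
is_a: CL:0000055
relationship: develops_from CL:0000008
relationship: develops_from CL:0000375
'''

# Assumption: labels for the two precursor cell types, inferred from the slide.
LABELS = {
    "CL:0000008": "cranial neural crest cell",
    "CL:0000375": "osteoprogenitor cell",
}


def parse_stanza(text):
    """Parse 'tag: value' lines into a dict mapping each tag to a list of values."""
    term = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        tag, _, value = line.partition(": ")
        term.setdefault(tag, []).append(value)
    return term


def develops_from(term):
    """Return the identifiers of the cell types this term develops from."""
    ids = []
    for value in term.get("relationship", []):
        relation, _, target = value.partition(" ")
        if relation == "develops_from":
            ids.append(target)
    return ids


def article(noun):
    """Pick 'a' or 'an' from the first letter (good enough for this sketch)."""
    return "an" if noun[0].lower() in "aeiou" else "a"


def differentiation_definition(term, labels):
    """Compose a '<cell> differentiation' definition from a cell type entry."""
    name = term["name"][0]
    quoted = term["def"][0].split('"')[1]       # text between the quotes
    first_sentence = quoted.split(". ")[0]      # use the first sentence as gloss
    gloss = first_sentence[0].lower() + first_sentence[1:]
    precursors = " or ".join(
        f"{article(labels[i])} {labels[i]}" for i in develops_from(term)
    )
    return (
        f"Processes whereby {precursors} acquires the specialized features "
        f"of {article(name)} {name}, {gloss}."
    )


term = parse_stanza(STANZA)
definition = differentiation_definition(term, LABELS)
```

Because the definition is generated from the cell type entry, any correction to the entry propagates to the composed definition.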

11

The OBO Foundry

12

A subset of OBO ontologies, whose developers have agreed in advance to accept a common set of principles designed to ensure

intelligibility to biologists (curators, annotators, users)

formal robustness

stability

compatibility

interoperability

support for logic-based reasoning

13

Custodians

• Michael Ashburner (Cambridge)

• Suzanna Lewis (Berkeley)

• Barry Smith (Buffalo/Saarbrücken)

14

A collaborative experiment

participants have agreed in advance to a growing set of principles specifying best practices in ontology development

designed to guarantee interoperability of ontologies from the very start

15

The developers of each ontology commit to its maintenance in light of scientific advance, and to soliciting community feedback for its improvement. They commit to working with other Foundry members to ensure that, for any particular domain, there is community convergence on a single reference ontology.

16

Initial Candidate Members of the OBO Foundry

GO Gene Ontology

CL Cell Ontology

SO Sequence Ontology

ChEBI Chemical Ontology

PATO Phenotype (Quality) Ontology

FuGO Functional Genomics Investigation Ontology

FMA Foundational Model of Anatomy

RO Relation Ontology 

17

Under development

Disease Ontology

Mammalian Phenotype Ontology

OBO-UBO / Ontology of Biomedical Reality

Organism (Species) Ontology

Plant Trait Ontology

Protein Ontology

RnaO RNA Ontology

NCI Thesaurus ????

18

Considered for development

Environment Ontology

Behavior Ontology

Biomedical Image Ontology

Clinical Trial Ontology

19

CRITERIA

The ontology is open and available to be used by all.

The developers of the ontology agree in advance to collaborate with developers of other OBO Foundry ontologies where domains overlap.

The ontology is in, or can be instantiated in, a common formal language.

20

The ontology possesses a unique identifier space within OBO.

The ontology provider has procedures for identifying distinct successive versions.

The ontology includes textual definitions for all terms.

CRITERIA
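Some of these criteria lend themselves to mechanical checking. The sketch below is a hypothetical checker, not an OBO Foundry tool: the dictionary layout, the `EX` prefix and the function name `check_criteria` are assumptions made for illustration. It tests three things: identifiers stay within the ontology's own identifier space, a version identifier is present, and every term has a textual definition.

```python
def check_criteria(ontology):
    """Return a list of problems found; an empty list means the checks pass."""
    problems = []
    prefix = ontology["id_prefix"] + ":"

    # Successive versions must be identifiable.
    if not ontology.get("version"):
        problems.append("no version identifier")

    seen = set()
    for term in ontology["terms"]:
        term_id = term["id"]
        # Identifiers must live in the ontology's own identifier space.
        if not term_id.startswith(prefix):
            problems.append(f"{term_id}: outside identifier space {prefix}")
        if term_id in seen:
            problems.append(f"{term_id}: duplicate identifier")
        seen.add(term_id)
        # Every term needs a textual definition.
        if not term.get("def", "").strip():
            problems.append(f"{term_id}: missing textual definition")
    return problems


# Toy ontology with hypothetical terms; the second and third terms are faulty.
toy = {
    "id_prefix": "EX",
    "version": "2006-01-01",
    "terms": [
        {"id": "EX:0000001", "name": "thing", "def": "A toy root term."},
        {"id": "EX:0000002", "name": "undefined thing", "def": ""},
        {"id": "OTHER:0000003", "name": "borrowed thing", "def": "A term from elsewhere."},
    ],
}
problems = check_criteria(toy)
```

Criteria such as openness, documentation and a plurality of independent users are social rather than formal and are not covered by a check of this kind.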

21

The ontology has a clearly specified and clearly delineated content.

The ontology is well-documented.

The ontology has a plurality of independent users.

CRITERIA

22

The ontology uses relations which are unambiguously defined following the pattern of definitions laid down in the OBO Relation Ontology.*

*Genome Biology 2005, 6:R46

CRITERIA

23

CRITERIA

Further criteria will be added over time in order to bring about a gradual improvement in the quality of the ontologies in the Foundry

24

Goal

Alignment of OBO Foundry ontologies through a common system of formally defined relations

to enable reasoning both within and across ontologies
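A small sketch of what such reasoning can look like, assuming only two relations, `is_a` and `part_of`, with the usual composition rules (both are transitive, and `is_a` chained with `part_of` in either order yields `part_of`). The terms and the prefixes `ex1`, `ex2`, `ex3`, standing for three separate ontologies, are hypothetical.

```python
ALLOWED = {"is_a", "part_of"}


def infer(facts):
    """Close a set of (subject, relation, object) triples under composition.

    Rules assumed here:
      is_a    . is_a    -> is_a
      part_of . part_of -> part_of
      is_a    . part_of -> part_of
      part_of . is_a    -> part_of
    """
    for _, relation, _ in facts:
        if relation not in ALLOWED:
            raise ValueError(f"unsupported relation: {relation}")

    inferred = set(facts)
    changed = True
    while changed:  # repeat until no new triple can be derived
        changed = False
        for a, r1, b in list(inferred):
            for b2, r2, c in list(inferred):
                if b != b2:
                    continue
                relation = "is_a" if (r1, r2) == ("is_a", "is_a") else "part_of"
                triple = (a, relation, c)
                if triple not in inferred:
                    inferred.add(triple)
                    changed = True
    return inferred


# Hypothetical terms drawn from three different toy ontologies.
facts = {
    ("ex1:nuclear membrane", "part_of", "ex1:nucleus"),
    ("ex1:nucleus", "part_of", "ex2:cell"),
    ("ex2:cell", "is_a", "ex3:anatomical structure"),
}
closure = infer(facts)
```

The derived triples cross ontology boundaries only because all three toy ontologies use the same relations with the same meaning, which is the point of the goal stated above.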

25

A reference ontology

is analogous to a scientific theory; it seeks to optimize representational adequacy to its subject matter to the maximal degree that is compatible with the constraints of computational usefulness.

26

An application ontology

is comparable to an engineering artifact such as a software tool. It is constructed for a specific practical purpose.

Examples:

National Cancer Institute Thesaurus

FuGO Functional Genomics Investigation Ontology

27

Reference Ontology vs. Application Ontology

Currently, application ontologies are often built afresh for each new task, commonly introducing not only idiosyncrasies of format or logic, but also simplifications or distortions of their subject matter.

To solve this problem, application ontology development should always take place against the background of a formally robust reference ontology framework.

28

Reference Ontologies promote re-usability of data

if data schemas are formulated using terms drawn from a reference ontology used by others, then the data will be to this degree more accessible to others

29

Advantages of the methodology of shared coherently defined ontologies

• promotes quality assurance (better coding)

• guarantees automatic reasoning across ontologies and across data at different granularities

• makes links between ontologies explicit

• yields direct connection to temporally indexed instance data

30

Advantages of the methodology of shared coherently defined ontologies

We know that high-quality ontologies can help in creating better mappings e.g. between human and model organism phenotypes

S Zhang, O Bodenreider, “Alignment of Multiple Ontologies of Anatomy: Deriving Indirect Mappings from Direct Mappings to a Reference Ontology”, AMIA 2005
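The general idea named in the title of that paper, deriving indirect mappings from direct mappings to a reference ontology, can be sketched as a simple join. This is only an illustration of the idea, not the method of the cited paper; the term identifiers (`MOUSE:`, `FISH:`, `REF:`) and the function name are hypothetical.

```python
def indirect_mappings(map_a, map_b):
    """Pair terms from two ontologies that map to the same reference term.

    map_a, map_b: dicts from a source term to a reference-ontology term.
    Returns a sorted list of (term_a, term_b, reference_term) triples.
    """
    # Index the second mapping by reference term.
    by_reference = {}
    for term_b, reference in map_b.items():
        by_reference.setdefault(reference, []).append(term_b)

    pairs = []
    for term_a, reference in map_a.items():
        for term_b in by_reference.get(reference, []):
            pairs.append((term_a, term_b, reference))
    return sorted(pairs)


# Hypothetical direct mappings from two anatomy ontologies to one reference.
mouse_to_ref = {"MOUSE:heart": "REF:heart", "MOUSE:tail": "REF:tail"}
fish_to_ref = {"FISH:heart": "REF:heart", "FISH:fin": "REF:fin"}

derived = indirect_mappings(mouse_to_ref, fish_to_ref)
```

With n ontologies, each needs one mapping to the reference ontology rather than a separate mapping to each of the other n - 1 ontologies.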

31

Reference Ontologies

are already being used to create technology to aid literature search

http://www.gopubmed.org/

32

Goal:

to create a family of gold standard reference ontologies upon which terminologies developed for specific applications can draw

33

Goal:

to introduce the scientific method into ontology development:

– all Foundry ontologies must be constantly updated in light of scientific advance

– all Foundry ontology developers must work with all other Foundry ontology developers in a spirit of scientific collaboration

34

Goal:

to replace the current policy of ad hoc creation of new database schemas by each clinical research group by providing reference ontologies in terms of which database schemas can be defined

35

Goal:

to introduce some of the features of scientific peer review into biomedical ontology development

36

Goal:

to create controlled vocabularies for use by clinical trial banks, clinical guidelines bodies, scientific journals, ...

37

Goal:

to create an evolving map-like representation of the entire domain of biological reality

GO’s three ontologies

molecular function

cellular component

biological process

[Diagram, built up over several slides, arranging ontology domains by granular level. Labels: species; ChEBI, Sequence, RNA ...; molecular function (GO); molecular process; cell (types); cellular anatomy (GO); cellular physiology; anatomy (fly, fish, human ...); organism-level physiology; normal (functionings); pathological (malfunctionings); pathoanatomy (fly, fish, human ...); pathophysiology (disease); phenotype; investigation (FuGO)]

46

Judith Blake:

“The use of bio-ontologies … ensures consistency of data curation, supports extensive data integration, and enables robust exchange of information between heterogeneous informatics systems. … ontologies … formally define relationships between the concepts.”

47

"Gene Ontology: Tool for the Unification of Biology"

an ontology "comprises a set of well-defined terms with well-defined relationships"

(Ashburner et al., 2000, p. 27)

48

Low Hanging Fruit

Ontologies should include only those relational assertions which hold universally (= have the ALL-SOME form)

Often, order will matter here:

We can include

adult transformation_of child

but not

child transforms_into adult
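The ALL-SOME form can be made concrete with a toy check over instance data: a relation between types A and B holds universally when every instance of A stands in the corresponding instance-level relation to some instance of B. The sketch below is a simplification under stated assumptions: the individuals are hypothetical, and the child and adult stages of one organism are treated as separate items for the sake of the example.

```python
def holds_all_some(instances_of_a, instances_of_b, relation_pairs):
    """True if every instance of A is related to at least one instance of B."""
    return all(
        any((a, b) in relation_pairs for b in instances_of_b)
        for a in instances_of_a
    )


# Hypothetical instance data: child3 never reaches adulthood.
adults = {"adult1", "adult2"}
children = {"child1", "child2", "child3"}

# Instance-level pairs: (adult, the child it is a transformation of).
transformation_of = {("adult1", "child1"), ("adult2", "child2")}
# The inverse pairs: (child, the adult it transforms into).
transforms_into = {(child, adult) for adult, child in transformation_of}

adult_from_child = holds_all_some(adults, children, transformation_of)
child_to_adult = holds_all_some(children, adults, transforms_into)
```

Here `adult_from_child` is true, since every adult was once a child, while `child_to_adult` is false, since not every child becomes an adult. This is why the order of the terms matters.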

49

The Gene Ontology

50

GO’s three ontologies

molecular functions

cellular components

biological processes

51

When a gene is identified

three types of questions need to be addressed:

1. Where is it located in the cell?

2. What functions does it have on the molecular level?

3. To what biological processes do these functions contribute?

52

Three granularities:

Cellular (for components)

Molecular (for functions)

Organ + organism (for processes)

53

GO has cells

but it does not include terms for molecules or organisms within any of its three ontologies

except e.g. GO:0018995 host

=Def. Any organism in which another organism spends part or all of its life cycle

54

Are the relations between functions and processes a matter of granularity?

Molecular activities are the ‘building blocks’ of biological processes?

But they are not allowed to be represented in GO as parts of biological processes

55

GO’s three ontologies

molecular functions

cellular components

biological processes

56

What does “function” mean?

an entity has a biological function if and only if it is part of an organism and has a disposition to act reliably in such a way as to contribute to the organism’s survival

the function is this disposition

57

Improved version

an entity has a biological function if and only if it is part of an organism and has a disposition to act reliably in such a way as to contribute to the organism’s realization of the canonical life plan for an organism of that type

58

This canonical life plan might include

canonical embryological development

canonical growth

canonical reproduction

canonical aging

canonical death

59

The function of the heart is to pump blood

Not every activity (process) in an organism is the exercise of a function – there are

malfunctionings

side-effects (heart beating)

accidents (external interference)

background stochastic activity

60

Kidney

61

Nephron

62

Functional Segments

63

Functions

64

Functions

This is a screwdriver

This is a good screwdriver

This is a broken screwdriver

This is a heart

This is a healthy heart

This is an unhealthy heart

65

Functions are associated with certain characteristic process shapes

Screwdriver: rotates and simultaneously moves forward, transferring torque from hand and arm to screw

Heart: performs a contracting movement inwards and an expanding movement outwards

66

Not functioning at all leads to death, modulo

internal factors:

plasticity

redundancy (2 kidneys)

criticality of the system involved

external factors:

prosthesis (dialysis machines, oxygen tent)

special environments

assistance from other organisms

67

What clinical medicine is for

to eliminate malfunctioning by fixing broken body parts

(or to prevent the appearance of malfunctioning by intervening e.g. at the molecular level)

68

Hypothesis: there are no ‘bad’ functions

It is not the function of an oncogene to cause cancer

Oncogenes were in every case proto-oncogenes with functions of their own

They become oncogenes because of bad (non-prototypical) environments

69

Is there an exception for molecular functions?

Does this apply only to functions on biological levels of granularity

(= levels of granularity coarser than the molecule) ?

If pathology is the deviation from (normal) functioning, does it make sense to talk of a pathological molecule?

(Pathologically functioning molecule vs. pathologically structured molecule)

70

Is there an exception for molecular functions?

A molecular function is a propensity of a gene product instance to perform actions on the molecular level of granularity.

Hypothesis 1: these actions must be reliably such as to contribute to biological processes.

Hypothesis 2: these actions must be reliably such as to contribute to the organism’s realization of the canonical life plan for an organism of that type.

71

The Gene Ontology

is a canonical ontology – it represents only what is normal in the realm of molecular functioning

72

The GO is a canonical representation

“The Gene Ontology is a computational representation of the ways in which gene products normally function in the biological realm”

Nucl. Acids Res. 2006: 34.

73

The FMA is a canonical representation

It is a computational representation of types and relations between types deduced from the qualitative observations of the normal human body, which have been refined and sanctioned by successive generations of anatomists and presented in textbooks and atlases of structural anatomy.

74

The importance of pathways (successive causality)

Each stage in the history of a disease presupposes the earlier stages

Therefore need to reason across time, tracking the order of events in time, using relations such as derives_from, transformation_of ...

Need pathway ontologies on every level of granularity

75

The importance of granularity (simultaneous causality)

Networks are continuants

At any given time there are networks existing in the organism at different levels of granularity

Changes in one cause simultaneous changes in all the others

(Compare the gas laws: at constant volume, a rise in temperature causes a simultaneous increase in pressure)

76

The Granularity Gulf

most existing data-sources are of fixed, single granularity

many (all?) clinical phenomena cross granularities

Therefore need to reason across time, tracking the order of events in time

77

Good ontologies require:

consistent use of terms, supported by logically coherent (non-circular) definitions, in equivalent human-readable and computable formats

coherent shared treatment of relations to allow cascading inference both within and between ontologies

78

Three fundamental dichotomies

• continuants vs. occurrents

• dependent vs. independent

• types vs. instances

79

ONTOLOGIES ARE REPRESENTATIONS OF TYPES

aka kinds, universals, categories, species, genera, ...

80

Continuants (aka endurants)

have continuous existence in time

preserve their identity through change

exist in toto whenever they exist at all

Occurrents (aka processes)

have temporal parts

unfold themselves in successive phases

exist only in their phases

81

You are a continuant

Your life is an occurrent

You are 3-dimensional

Your life is 4-dimensional

82

Dependent entities

require independent continuants as their bearers

There is no run without a runner

There is no grin without a cat

83

Dependent vs. independent continuants

Independent continuants (organisms, cells, molecules, environments)

Dependent continuants (qualities, shapes, roles, propensities, functions)

84

All occurrents are dependent entities

They are dependent on those independent continuants which are their participants (agents, patients, media ...)
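The three dichotomies can be mirrored in a small class sketch, where Python classes stand in for types and Python objects for instances. This is an illustrative toy under those assumptions, not an implementation of any published top-level ontology; the class names follow the slides, and the constructor checks are one hypothetical way of enforcing dependence.

```python
class Entity:
    def __init__(self, label):
        self.label = label


class Continuant(Entity):
    """Exists in full whenever it exists at all."""


class IndependentContinuant(Continuant):
    """Organisms, cells, molecules, environments."""


class DependentContinuant(Continuant):
    """Qualities, shapes, roles, propensities, functions: each needs a bearer."""

    def __init__(self, label, bearer):
        if not isinstance(bearer, IndependentContinuant):
            raise TypeError("bearer must be an independent continuant")
        super().__init__(label)
        self.bearer = bearer


class Occurrent(Entity):
    """Processes: unfold in time and depend on their participants."""

    def __init__(self, label, participants):
        participants = list(participants)
        if not participants:
            raise ValueError("an occurrent needs at least one participant")
        if not all(isinstance(p, IndependentContinuant) for p in participants):
            raise TypeError("participants must be independent continuants")
        super().__init__(label)
        self.participants = participants


# Instances: there is no run without a runner, no grin without a cat.
runner = IndependentContinuant("runner")
run = Occurrent("run", [runner])
cat = IndependentContinuant("cat")
grin = DependentContinuant("grin", bearer=cat)
```

Attempting to create a run with no runner, or a grin borne by another grin, raises an error, which is the dependence constraint stated on the slides.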

[Diagram, built up over several slides: “Top-Level Ontology” (= a representation of top-level types). Continuant divides into Independent Continuant and Dependent Continuant; Occurrent is always dependent on one or more independent continuants. Successive builds add the labels: cell component, biological process, molecular function; Function; Functioning; Side-Effect, Stochastic Process, ...; Quality; Spatial Region; instances (in space and time)]

90

Smith B, Ceusters W, Kumar A, Rosse C. On Carcinomas and Other Pathological Entities, Comp Functional Genomics, Apr. 2006

91

everything here is an independent continuant

92

Functions, etc.

Some dependent continuants are realizable

expression of a gene

application of a therapy

course of a disease

execution of an algorithm

realization of a protocol

93

Functions vs Functionings

the function of your heart = to pump blood in your body

this function is realized in processes of pumping blood

not all functions are realized (consider the function of this sperm ...)

94

Concepts

Biomedical ontology integration will never be achieved through integration of meanings or concepts

The problem is precisely that different user communities use different concepts

Concepts are in your head and will change as your understanding changes
