how to build an ontology

Page 1: How to Build an Ontology

How to Build an Ontology

Barry Smith

http://ontology.buffalo.edu/smith

1

Page 2: How to Build an Ontology

Everywhere databases are being created

too often in such a way that the data is siloed

leading to massive expense in integrating data in ad hoc ways

if the data could be collected on the basis of shared controlled vocabularies from the start, much of this massive expense could be avoided

2

Page 3: How to Build an Ontology

Uses of ‘ontology’ in PubMed abstracts

3

Page 4: How to Build an Ontology

By far the most successful: GO (Gene Ontology)

4

Page 5: How to Build an Ontology

Consequences of the Human Genome Project

we can match gene sequences very effectively, for example finding patterns shared between humans and mice

but we can make sense of these gene sequences only if we know

• where in the cell they occur

• with what molecular functions they are associated

• to what biological processes they contribute

5

Page 6: How to Build an Ontology

GO provides a controlled system of terms for use in annotating (describing, tagging) data

• multi-species, multi-disciplinary, open source

• contributing to the cumulativity of scientific results obtained by distinct research communities

• compare use of kilograms, meters, seconds in formulating experimental results

6

Page 7: How to Build an Ontology

Hierarchical view representing relations between represented types

7

Page 8: How to Build an Ontology

[Figure: graph of anatomical types linked by is_a and part_of relations. Types shown: Anatomical Structure, Organ, Serous Sac, Pleural Sac, Pleura (Wall of Sac), Parietal Pleura, Visceral Pleura, Mediastinal Pleura, Mesothelium of Pleura, Tissue, Organ Part, Organ Component, Organ Subdivision, Anatomical Space, Organ Cavity, Serous Sac Cavity, Pleural Cavity, Organ Cavity Subdivision, Serous Sac Cavity Subdivision, Interlobar recess]

8

Page 9: How to Build an Ontology

US $100 million invested in literature and data curation using GO

over 11 million annotations relating gene products described in the UniProt, Ensembl and other databases to terms in the GO

experimental results reported in 52,000 scientific journal articles manually annotated by expert biologists using GO

9

Page 10: How to Build an Ontology

GO has learned the lessons of successful cooperation

• Clear documentation

• The terms chosen are already familiar

• Fully open source (allows thorough testing in manifold combinations with other ontologies)

• Subjected to constant third-party critique

• Updated every night

10

Page 11: How to Build an Ontology

ontologies used to annotate databases

[Figure: the databases MouseEcotope, GlyProt, DiabetInGene and GluChem, annotated with the term ‘sphingolipid transporter activity’]

11

Page 12: How to Build an Ontology

annotation using common ontologies yields integration of databases

[Figure: the databases MouseEcotope, GlyProt, DiabetInGene and GluChem, linked through the shared term ‘Holliday junction helicase complex’]

12

Page 13: How to Build an Ontology

annotation using common ontologies can yield integration of image data

13

Page 14: How to Build an Ontology

annotation using common ontologies can support comparison of image data

14

Page 15: How to Build an Ontology

annotation with Gene Ontology

supports reusability of data

supports search of data by humans

supports reasoning with data by humans and machines

− but the method works only to the degree that many, many people use the GO to annotate their data
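The integration claim on the preceding slides can be made concrete with a small sketch. The database names and the two terms come from the slides; the record identifiers, and which database uses which term, are invented for illustration. The point is only that records annotated with the same controlled term can be found together without any database-to-database mapping.

```python
from collections import defaultdict

# Hypothetical annotation records: (database, record id, ontology term).
# Database names and terms appear on the slides; record ids are invented.
ANNOTATIONS = [
    ("MouseEcotope", "rec-1", "sphingolipid transporter activity"),
    ("GlyProt", "rec-7", "sphingolipid transporter activity"),
    ("DiabetInGene", "rec-3", "Holliday junction helicase complex"),
    ("GluChem", "rec-9", "Holliday junction helicase complex"),
]


def index_by_term(annotations):
    """Group (database, record) pairs under the shared term that annotates them."""
    index = defaultdict(list)
    for database, record_id, term in annotations:
        index[term].append((database, record_id))
    return dict(index)


def search(index, term):
    """Return every record, from any database, annotated with `term`."""
    return index.get(term, [])


INDEX = index_by_term(ANNOTATIONS)
```

A query for one term returns records from several databases at once, which is the sense in which common annotation yields integration. It also shows the dependence noted above: a database that does not use the shared terms contributes nothing to the index.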

15

Page 16: How to Build an Ontology

GO has been amazingly successful in overcoming the data balkanization problem

but it covers only generic biological entities of three sorts:

– cellular components

– molecular functions

– biological processes

and it does not provide representations of diseases, symptoms, …

16

Page 17: How to Build an Ontology

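Original OBO Foundry ontologies (Gene Ontology in yellow)

[Table: columns by RELATION TO TIME (CONTINUANT, split into INDEPENDENT and DEPENDENT; OCCURRENT), rows by GRANULARITY]

ORGAN AND ORGANISM
• independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
• dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
• occurrent: Biological Process (GO)

CELL AND CELLULAR COMPONENT
• independent continuant: Cell (CL); Cellular Component (FMA, GO)
• dependent continuant: Cellular Function (GO)

MOLECULE
• independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
• dependent continuant: Molecular Function (GO)
• occurrent: Molecular Process (GO)

17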

Page 18: How to Build an Ontology

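[Table: columns by RELATION TO TIME (CONTINUANT, split into INDEPENDENT and DEPENDENT; OCCURRENT), rows by GRANULARITY]

ORGAN AND ORGANISM
• independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
• dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
• occurrent: Biological Process (GO)

CELL AND CELLULAR COMPONENT
• independent continuant: Cell (CL); Cellular Component (FMA, GO)
• dependent continuant: Cellular Function (GO)

MOLECULE
• independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
• dependent continuant: Molecular Function (GO)
• occurrent: Molecular Process (GO)

Environment Ontology (slide label: ‘environments are here’)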

18

Page 19: How to Build an Ontology

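[Table: columns by RELATION TO TIME (CONTINUANT, split into INDEPENDENT and DEPENDENT; OCCURRENT), rows by GRANULARITY]

COMPLEX OF ORGANISMS
• independent continuant: Family, Community, Deme, Population
• dependent continuant: Population Phenotype
• occurrent: Population Process

ORGAN AND ORGANISM
• independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
• dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
• occurrent: Biological Process (GO)

CELL AND CELLULAR COMPONENT
• independent continuant: Cell (CL); Cellular Component (FMA, GO)
• dependent continuant: Cellular Function (GO)

MOLECULE
• independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
• dependent continuant: Molecular Function (GO)
• occurrent: Molecular Process (GO)

order

19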

Page 20: How to Build an Ontology

Ontology success stories, and some reasons for failure

chaos

20

Page 21: How to Build an Ontology

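[Table: columns by RELATION TO TIME (CONTINUANT, split into INDEPENDENT and DEPENDENT; OCCURRENT), rows by GRANULARITY]

COMPLEX OF ORGANISMS
• independent continuant: Family, Community, Deme, Population
• dependent continuant: Population Phenotype
• occurrent: Population Process

ORGAN AND ORGANISM
• independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
• dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
• occurrent: Biological Process (GO)

CELL AND CELLULAR COMPONENT
• independent continuant: Cell (CL); Cellular Component (FMA, GO)
• dependent continuant: Cellular Function (GO)

MOLECULE
• independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
• dependent continuant: Molecular Function (GO)
• occurrent: Molecular Process (GO)

http://obofoundry.org

21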

Page 22: How to Build an Ontology

Developers commit to working to ensure that, for each domain, there is community convergence on a single ontology

and agree in advance to collaborate with developers of ontologies in adjacent domains.

http://obofoundry.org

The OBO Foundry: a step-by-step, evidence-based approach to expand the GO

22

Page 23: How to Build an Ontology

OBO Foundry Principles

Common governance (coordinating editors)

Common training

Common architecture

• simple shared top level ontology

• shared Relation Ontology: www.obofoundry.org/ro

23

Page 24: How to Build an Ontology

[Figure: graph of anatomical types linked by is_a and part_of relations. Types shown: Anatomical Structure, Organ, Serous Sac, Pleural Sac, Pleura (Wall of Sac), Parietal Pleura, Visceral Pleura, Mediastinal Pleura, Mesothelium of Pleura, Tissue, Organ Part, Organ Component, Organ Subdivision, Anatomical Space, Organ Cavity, Serous Sac Cavity, Pleural Cavity, Organ Cavity Subdivision, Serous Sac Cavity Subdivision, Interlobar recess]

24

Page 25: How to Build an Ontology

Open Biomedical Ontologies Foundry

Seeks to create high quality, validated terminology modules across all of the life sciences which will be

• one ontology for each domain, so no need for mappings

• close to language use of experts

• evidence-based

• incorporating a strategy for motivating potential developers and users

• revisable as science advances

25

Page 26: How to Build an Ontology

Benefits of coordination

• Can profit from lessons learned through mistakes made by others

• Can more easily reuse what is made by others

• Can more easily inspect and criticize results of others’ work

• Can more easily train people to do the necessary work

Page 27: How to Build an Ontology

BFO Top-Level Ontology

Continuant

• Independent Continuant

• Dependent Continuant

Occurrent (always dependent on one or more independent continuants)

27

Page 28: How to Build an Ontology

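OBO Foundry coverage

[Table: columns by RELATION TO TIME (CONTINUANT, split into INDEPENDENT and DEPENDENT; OCCURRENT), rows by GRANULARITY]

ORGAN AND ORGANISM
• independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
• dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
• occurrent: Organism-Level Process (GO)

CELL AND CELLULAR COMPONENT
• independent continuant: Cell (CL); Cellular Component (FMA, GO)
• dependent continuant: Cellular Function (GO)
• occurrent: Cellular Process (GO)

MOLECULE
• independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
• dependent continuant: Molecular Function (GO)
• occurrent: Molecular Process (GO)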

28

Page 29: How to Build an Ontology

List of BFO users

http://www.ifomis.org/bfo/users

29

Page 30: How to Build an Ontology

BFO Users

ACGT Master Ontology (ACGT MO): represent the domain of cancer research and management in a computationally tractable manner

AFO Foundational Ontology

Biomedical Ethics Ontology

Biomedical Grid Terminology (BiomedGT): open, collaboratively developed terminology for translational research

BioTop: A Biomedical Top-Domain Ontology

BIRNLex: controlled terminology for annotation of BIRN data sources

Cell Cycle Ontology: application ontology for the representation and integrated analysis of the cell cycle process

Cell Ontology: designed as a structured controlled vocabulary for cell types

Chemical Entities of Biological Interest (ChEBI): freely available dictionary of molecular entities focused on ‘small’ chemical compounds

Cognitive Paradigm Ontology

Common Anatomy Reference Ontology (CARO): anatomical structures in all organisms

Drug Interaction Ontology (DIO): ontology-driven inferences of possible drug-drug Interactions

Dynamic Earth Sciences Ontologies: Process and Event Ontologies

Environment Ontology: an ontology that supports the annotation of the environment of any organism or biological sample

Evolution Ontology (EO)

FlyBase: enhancing Drosophila Gene Ontology annotations

Foundational Model of Anatomy (FMA): structure of the mammalian and in particular the human body

Gene Ontology (GO): attributes of gene products in all organisms

Infectious Disease Ontology at the Duke University Medical Center

Information Artifact Ontology (IAO)

Interdisciplinary Prostate Ontology Project (IPOP)

Lipid Ontology

medicognos: medical knowledge and workflow management framework with integrated DSS for quality, safety and disease management applications

MIRO and IRbase: IT Tools for the Epidemiological Monitoring of Insecticide Resistance in Mosquito Disease Vectors

Nanoparticle Ontology (NPO): Ontology for Cancer Nanotechnology Research

Neuroscience Information Framework Standard (NIFSTD) Ontology: a collection of OWL modules covering distinct domains of biomedical reality

Neural Electromagnetic Ontologies (NEMO): Ontology-based Tools for Representation and Integration of Event-related Brain Potentials

Ontology of Clinical Research (OCRe)

Ontology-Based eXtensible Data Model (OBX)

Ontology of Data Mining Investigations (OntoDM)

Ontology for Biomedical Investigations (OBI): design, protocol, instrumentation, and analysis applied in biomedical investigations

Ontology for General Medical Science (OGMS)

Ontology of Biomedical Reality for the pathology domain of spine (scoliosis domain) (OBR-Scolio)

Petrochemical Ontology

Phenotypic Quality Ontology (PaTO): qualities of biomedical entities

Proteomics data and process provenance ontology (ProPreO): bioinformatics for glycan expression, integrated technology resource for biomedical glycomics

Protein Ontology (PRO): protein types and modifications classified on the basis of evolutionary relationships

RNA Ontology (RnaO): RNA features, interactions and motifs

Senselab Ontology with applications to NeuronDB and BrainPharm

Sequence Ontology (SO): features and properties of nucleic sequences

Sleep Domain Ontology

Subcellular Anatomy Ontology (SAO) of NCMIR

Translational Medicine Ontology

Vaccine Ontology (VO)

yOWL: ontology-driven knowledge base for yeast biologists

Zebrafish Anatomical Ontology (ZAO): anatomical structures in D. rerio

30

Page 31: How to Build an Ontology

How to build an ontology

• import BFO into Protégé

• work with domain experts to create an initial mid-level classification

• find the ~50 most commonly used terms corresponding to types in reality

• arrange these terms into an informal is_a hierarchy according to the principle

• A is_a B means: every instance of A is an instance of B

• fill in missing terms to give a complete hierarchy

• (leave it to domain experts to populate the lower levels of the hierarchy)

31
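The is_a step above can be sketched in a few lines. This is a minimal illustration, not the Protégé workflow itself: the terms are borrowed from a later slide (siamese, cat, mammal, animal, organism, frog), while the particular parent links and the function names are assumptions made for the example.

```python
# Asserted is_a links: child type -> its single parent type.
# The arrangement of these terms is assumed for illustration.
IS_A = {
    "siamese": "cat",
    "cat": "mammal",
    "mammal": "animal",
    "frog": "animal",
    "animal": "organism",
}


def ancestors(term, is_a=IS_A):
    """Follow is_a upward from `term` and return the types passed on the way."""
    chain = []
    while term in is_a:
        term = is_a[term]
        chain.append(term)
    return chain


def instance_of(instance_type, queried_type):
    """If A is_a B, every instance of A is an instance of B (also transitively)."""
    return instance_type == queried_type or queried_type in ancestors(instance_type)
```

Because is_a is read as "every instance of A is an instance of B", a query about an instance of siamese can be answered for mammal, animal and organism without any of those facts being stated separately.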

Page 32: How to Build an Ontology

Example: The Cell Ontology

Page 33: How to Build an Ontology

Basic distinction among entities

type vs. instance

(science text vs. diary)

(human being vs. Tom Cruise)

(science diagram vs. photograph)

33

Page 34: How to Build an Ontology

Terms in ontologies denote types (‘universals’)

it is generalizations that are important = types, universals, kinds, species

34

Page 35: How to Build an Ontology

A 515287 DC3300 Dust Collector Fan

B 521683 Gilmer Belt

C 521682 Motor Drive Belt

Catalog vs. inventory

35

Page 36: How to Build an Ontology

types vs. instances

36

Page 37: How to Build an Ontology

names of instances

37

Page 38: How to Build an Ontology

names of types

38

Page 39: How to Build an Ontology

An ontology is a representation of types

We learn about types in reality from looking at the results of scientific experiments in the form of scientific theories

experiments relate to what is particular

science describes what is general

39

Page 40: How to Build an Ontology

[Figure: a hierarchy of types (object, organism, animal, mammal, cat, siamese, frog) shown above the instances that fall under them]

40

Page 41: How to Build an Ontology

Ontologies are here

41

Page 42: How to Build an Ontology

or here

42

Page 43: How to Build an Ontology

Ontologies represent general structures in reality (leg)

43

Page 44: How to Build an Ontology

Ontologies do not represent concepts in people’s heads

44

Page 45: How to Build an Ontology

They represent types in reality

45

Page 46: How to Build an Ontology

Inventory vs. Catalog: Two kinds of representational artifact

Databases represent instances

Ontologies represent types
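A small sketch may help fix the distinction. The three catalog entries are taken from the catalog slide earlier in the deck; the inventory rows and their serial numbers are invented, and the class names are hypothetical. The catalog plays the role of the ontology (one entry per type), the inventory plays the role of the database (one row per particular item).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """A type: one entry per kind of part."""
    part_number: str
    name: str


@dataclass(frozen=True)
class InventoryItem:
    """An instance: one particular item falling under a catalog type."""
    serial: str  # invented serial number, not from the slides
    type_: CatalogEntry


CATALOG = {
    "515287": CatalogEntry("515287", "DC3300 Dust Collector Fan"),
    "521683": CatalogEntry("521683", "Gilmer Belt"),
    "521682": CatalogEntry("521682", "Motor Drive Belt"),
}

INVENTORY = [
    InventoryItem("SN-001", CATALOG["521683"]),
    InventoryItem("SN-002", CATALOG["521683"]),
    InventoryItem("SN-003", CATALOG["515287"]),
]


def count_instances(inventory, part_number):
    """Count the particular items that instantiate one catalog type."""
    return sum(1 for item in inventory if item.type_.part_number == part_number)
```

The catalog does not change when an item is sold or delivered; only the inventory does. A type can also be in the catalog while the inventory currently holds no item of that type.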

46

Page 47: How to Build an Ontology

How do we know which general terms designate types?

Types are repeatables:

cell, electron, weapon, F16, citizen, refugee, ...

Instances are one-off: Bill Clinton, this laptop

47

Page 48: How to Build an Ontology

BFO Top-Level Ontology

Continuant

• Independent Continuant

• Dependent Continuant

Occurrent (always dependent on one or more independent continuants)

48

Page 49: How to Build an Ontology

Two kinds of entities

occurrents (processes, events, happenings)

continuants (objects, qualities, states...)

49

Page 50: How to Build an Ontology

You are a continuant

Your life is an occurrent

You are 3-dimensional

Your life is 4-dimensional

50

Page 51: How to Build an Ontology

BFO Top-Level Ontology

Continuant

• Independent Continuant

• Dependent Continuant

Occurrent (always dependent on one or more independent continuants)

51

Page 52: How to Build an Ontology

Dependent entities

require independent continuants as their bearers

There is no run without a runner

There is no grin without a cat

52

Page 53: How to Build an Ontology

Dependent vs. independent continuants

Independent continuants (organisms, buildings, environments)

Dependent continuants (quality, shape, role, propensity, function, status, power, right)

53

Page 54: How to Build an Ontology

All occurrents are dependent entities

They are dependent on those independent continuants which are their participants (agents, patients, media ...)
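The dependence rules of the last few slides can be sketched as constructor checks. This is a toy model under stated assumptions, not BFO itself: the class names mirror the slide labels, and the choice to enforce dependence by raising ValueError is made only for this example.

```python
class Entity:
    """Root of this toy hierarchy."""

    def __init__(self, name):
        self.name = name


class Continuant(Entity):
    """Endures through time (objects, qualities, ...)."""


class IndependentContinuant(Continuant):
    """Needs no bearer: organisms, buildings, environments."""


class DependentContinuant(Continuant):
    """Quality, shape, role, function, ...: requires a bearer."""

    def __init__(self, name, bearer):
        if not isinstance(bearer, IndependentContinuant):
            raise ValueError(f"{name!r} needs an independent continuant as bearer")
        super().__init__(name)
        self.bearer = bearer


class Occurrent(Entity):
    """Process or event: requires independent continuants as participants."""

    def __init__(self, name, participants):
        participants = list(participants)
        if not participants or not all(
            isinstance(p, IndependentContinuant) for p in participants
        ):
            raise ValueError(f"{name!r} needs independent continuants as participants")
        super().__init__(name)
        self.participants = participants


cat = IndependentContinuant("cat")
grin = DependentContinuant("grin", bearer=cat)    # no grin without a cat
runner = IndependentContinuant("runner")
run = Occurrent("run", participants=[runner])     # no run without a runner
```

Trying to create a grin with no bearer, or a run with no participants, fails at construction time, which is one way a machine can be made to respect the dependence stated on the slides.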

54

Page 55: How to Build an Ontology

Principle of Low Hanging Fruit

Include even absolutely trivial assertions (assertions you know to be universally true)

pneumococcal bacterium is_a bacterium

Computers need to be led by the hand

55

Page 56: How to Build an Ontology

Principle of singular nouns

Terms in ontologies represent types

Goal: Each term in an ontology should represent exactly one type

Thus every term should be a singular noun

56

Page 57: How to Build an Ontology

MeSH

MeSH Descriptors > Index Medicus Descriptor > Anthropology, Education, Sociology and Social Phenomena (MeSH Category) > Social Sciences > Political Systems > National Socialism

National Socialism is_a Political Systems

National Socialism is_a Anthropology ...

57

Page 58: How to Build an Ontology

Principle: distinguish use from mention

mouse =def. common name for the species Mus musculus

swimming is healthy and has eight letters

58

Page 59: How to Build an Ontology

How to avoid the use-mention confusion

Avoid confusing words with things

Avoid confusing concepts in our minds with entities in reality

Recommendation: avoid the word ‘concept’ entirely

59

Page 60: How to Build an Ontology

Three Levels

L3. Words, models (published representations, ontologies, databases ...)

L2. Ideas (thoughts, memories, ...)

L1. Things (cells, planets, processes of cell division ...)

60

Page 61: How to Build an Ontology

‘Heparin therapy’ is an instance of ‘written or spoken designation of a concept’

What are the problems here?

1. misuse of quotation marks

2. confusion of instances and types

3. confusion of concept and reality

Trialbank

61

Page 62: How to Build an Ontology

Principle: Avoid mass nouns

Brenda Tissue Ontology

blood is_a hematopoietic system

hematopoietic system is_a whole body

whole_body is_a animal

62

Page 63: How to Build an Ontology

Count vs. mass nouns

Count

suitcase

cow

datum

Mass

luggage

beef

information

63

Page 64: How to Build an Ontology

Principle of definitions

Supply definitions for every term

1. human-understandable natural language definition

2. an equivalent formal definition

64

Page 65: How to Build an Ontology

Principle: definitions must be unique

Each term should have exactly one definition

it may have both natural-language and formal versions

(issue with ontologies which exist with different levels of expressivity)

65

Page 66: How to Build an Ontology

The Problem of Circularity

A Person =def. A person with an identity document

Hemolysis =def. The causes of hemolysis

66

Page 67: How to Build an Ontology

Principle of non-circularity

The term defined should not appear in its own definition
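A literal version of this check is easy to automate. The sketch below is an assumption about how one might test it, using the two circular examples from the previous slide; the function name is hypothetical, and it only catches the term appearing word for word, not circularity hidden behind synonyms.

```python
import re


def is_circular(term, definition):
    """True if the defined term occurs as a whole word or phrase in its own definition."""
    pattern = r"\b" + re.escape(term.lower()) + r"\b"
    return re.search(pattern, definition.lower()) is not None
```

Run over every term and definition pair in an ontology, a check like this can flag candidates for a curator to review.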

67

Page 68: How to Build an Ontology

Principle of Aristotelian definitions

Use Aristotelian definitions

An A is a B which C’s.

A human being is an animal which is rational

68

Page 69: How to Build an Ontology

Principle of increase in understandability

A definition should use only terms which are easier to understand than the term defined

Definitions should not make simple things more difficult than they are

69

Page 70: How to Build an Ontology

HL7

‘stopping a medication’ = def.

change of state in the record of a Substance Administration Act from Active to Aborted

70

Page 71: How to Build an Ontology

Univocity

Terms should have the same meanings on every occasion of use.

(= They should refer to the same types)

Basic ontological relations such as is_a and part_of should be used in the same way by all ontologies

71

Page 72: How to Build an Ontology

Universality: the all-some rule

Ontologies are made of relational assertions

They should include only those relational assertions which hold universally

Cell membrane part_of cell

72

Page 73: How to Build an Ontology

universality

Often, order will matter:

We can assert

adult transformation_of child

but not

child transforms_into adult
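The all-some reading behind these two slides can be checked against instance data: "A rel B" holds only if every instance of A stands in rel to some instance of B. The sketch below uses invented individuals (one adult, two children, one of whom has not grown up) and a hypothetical function name.

```python
def holds_universally(instances_of_a, relation_pairs, instances_of_b):
    """All-some check: every instance of A is related to some instance of B.

    `relation_pairs` is a collection of (a, b) pairs for one relation.
    Note that the check is vacuously true when there are no instances of A.
    """
    targets = set(instances_of_b)
    return all(
        any(x == a and y in targets for x, y in relation_pairs)
        for a in instances_of_a
    )


# Invented individuals for illustration.
ADULTS = ["adult-1"]
CHILDREN = ["child-1", "child-2"]
TRANSFORMATION_OF = [("adult-1", "child-1")]   # adult-1 was once child-1
TRANSFORMS_INTO = [("child-1", "adult-1")]     # child-2 never reached adulthood
```

Every adult is a transformation of some child, so the first assertion passes the check. Not every child transforms into an adult, so the inverse fails, which is why only the first direction belongs in the ontology.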

73

Page 74: How to Build an Ontology

universality

viral pneumonia caused by virus

but not

virus causes pneumonia

pneumococcal virus causes pneumonia

74

Page 75: How to Build an Ontology

Principle of Universality

results analysis later_than protocol-design

but not

protocol-design earlier_than results analysis

75

Page 76: How to Build an Ontology

Principle of positivity

Complements of types are not themselves types.

Terms such as

non-mammal

non-membrane

other metalworker in New Zealand

do not designate types in reality

76

Page 77: How to Build an Ontology

Avoid conjunctive and disjunctive combinations

There are no conjunctive and disjunctive types:

anatomic structure, system, or substance

musculoskeletal and connective tissue disorder

77

Page 78: How to Build an Ontology

Principle: Don’t confuse ontology and epistemology

Which types exist in reality is not a function of our knowledge.

Terms such as

unknown

unclassified

unlocalized

arthropathies not otherwise specified

do not designate types in reality.

78

Page 79: How to Build an Ontology

Principle: Don’t confuse ontology and epistemology

If you want to say that

We do not know where A’s are located

do not invent a new class of

A’s with unknown locations

(A well-constructed ontology should grow linearly; it should not need to delete classes or relations because of increases in knowledge)

79

Page 80: How to Build an Ontology

If you want to say

I surmise that this is a case of pneumonia

do not invent a new class of surmised pneumonias

Confusion of ‘findings’ in medical terminologies

Principle: Don’t confuse ontology and epistemology
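Several of the preceding principles (positivity, no conjunctive or disjunctive types, ontology vs. epistemology) show up in the wording of term labels, so a rough label check can point curators at suspect terms. The patterns below are assumptions drawn from the examples on the slides; this is a heuristic sketch, and a flagged label still needs human judgment.

```python
import re

# (principle name, pattern that suggests a violation); all patterns are heuristic.
RULES = [
    ("positivity", re.compile(r"^(non-|other\b)", re.IGNORECASE)),
    ("no conjunctive or disjunctive types", re.compile(r"\b(and|or)\b", re.IGNORECASE)),
    (
        "ontology vs. epistemology",
        re.compile(
            r"\b(unknown|unclassified|unlocalized|surmised|not otherwise specified)\b",
            re.IGNORECASE,
        ),
    ),
]


def lint_term(label):
    """Return the names of the principles that the label appears to violate."""
    return [name for name, pattern in RULES if pattern.search(label)]
```

A label that passes, such as "cell membrane", returns an empty list; the example terms from the slides are each flagged under the principle they were used to illustrate.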

80

Page 81: How to Build an Ontology

is_a Overloading

The success of ontology alignment demands that ontological relations (is_a, part_of, ...) have the same meanings in the different ontologies to be aligned.

81

Page 82: How to Build an Ontology

Principle: is_a should always mean is a subtype of

John is_a human being

biological process is_a Gene Ontology (old GO)

Achham cattle breed is_a organism (SNOMED)

82

Page 83: How to Build an Ontology

Multiple Inheritance

thing

car

blue thing

blue car

is_a1 is_a2

83

Page 84: How to Build an Ontology

How to solve this problem

Create two ontologies:

of cars

of colors

Link the two together via cross-products

(= factoring, normalization, modularization)
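The cross-product idea can be sketched as follows. Each ontology keeps its own simple hierarchy, and compound terms are composed from one term of each. The relation name has_quality and the extra colour "red" are assumptions added for illustration; the slides name only cars, colors and the blue car.

```python
from itertools import product

CAR_TYPES = ["car"]            # ontology of cars (subtypes could be added)
COLOR_TYPES = ["blue", "red"]  # ontology of colors; "red" is an added example


def cross_products(bearers, qualities):
    """Compose compound terms from one bearer type and one quality type.

    Each compound keeps a single is_a parent (the bearer); the link to the
    other ontology is recorded as a separate relation, not a second is_a.
    """
    return {
        f"{quality} {bearer}": {"is_a": bearer, "has_quality": quality}
        for bearer, quality in product(bearers, qualities)
    }


COMPOUNDS = cross_products(CAR_TYPES, COLOR_TYPES)
```

Here "blue car" has exactly one is_a parent, car, and its blueness is expressed by a link into the colour ontology. The compound's meaning is fixed by its components and the composition rule.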

84

Page 85: How to Build an Ontology

Compositionality

The meanings of compound terms should be determined

1. by the meanings of component terms

together with

2. the rules governing syntax

85

Page 86: How to Build an Ontology

Single Inheritance

No kind in a classificatory hierarchy should be asserted to have more than one is_a parent on the immediate higher level

86

Page 87: How to Build an Ontology

Multiple Inheritance

thing

car

blue thing

blue car

is_a is_a

87

Page 88: How to Build an Ontology

Multiple Inheritance

is a source of errors

encourages laziness

serves as obstacle to integration with neighboring ontologies

hampers use of Aristotelian methodology for defining terms

hampers use of statistical search tools

88

Page 89: How to Build an Ontology

Multiple Inheritance

thing

car

blue thing

blue car

is_a1 is_a2

89

Page 90: How to Build an Ontology

Principle of asserted single inheritance

Each reference ontology module should be built as an asserted monohierarchy (a hierarchy in which each term has at most one parent)

Asserted hierarchy vs. inferred hierarchy
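Whether an asserted hierarchy is a monohierarchy can be checked mechanically. The sketch below assumes the asserted is_a links are available as (child, parent) pairs; the function name is hypothetical, and the example data is the blue car diagram from the earlier slides.

```python
from collections import defaultdict


def multiple_parents(asserted_is_a):
    """Return each term that has more than one asserted is_a parent."""
    parents = defaultdict(set)
    for child, parent in asserted_is_a:
        parents[child].add(parent)
    return {child: sorted(ps) for child, ps in parents.items() if len(ps) > 1}


BLUE_CAR_DIAGRAM = [
    ("car", "thing"),
    ("blue thing", "thing"),
    ("blue car", "car"),
    ("blue car", "blue thing"),
]
```

An empty result means the asserted hierarchy satisfies the principle. Additional parents may still appear in the inferred hierarchy computed by a reasoner, which this check does not look at.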

90

Page 91: How to Build an Ontology

Principle of normalization

Polyhierarchies should be decomposable into homogeneous disjoint monohierarchies

91

Page 92: How to Build an Ontology

Principle of instantiation

A term should be included in an ontology only if there is evidence that instances to which that term refers exist in reality.

92

Page 93: How to Build an Ontology

Why do we need rules/standards for good ontology?

Ontologies must be intelligible both to humans (for annotation and curation) and to machines (for reasoning and error-checking): the lack of rules for classification leads to human error and blocks automatic reasoning and error-checking

Intuitive rules facilitate training of curators and annotators

Common rules allow alignment with other ontologies

93

Page 94: How to Build an Ontology

Ontology path dependence principle

The decisions made by the creators of an ontology – including those decisions which pertain to the ontology’s upper-level architecture – should as far as possible be made on the basis of the degree to which they advance the consistency of that ontology with the reference ontologies already existing in relevant domains.

94

Page 95: How to Build an Ontology

User feedback principle

An ontology should evolve on the basis of feedback derived from those who are using the ontology, for example for purposes of annotation.

95