
Page 1: Building a Suite of  Biomedical Ontologies

Building a Suite of Biomedical Ontologies

Barry Smith

1

Page 2: Building a Suite of  Biomedical Ontologies

Problems with UMLS-style approaches

• let a million ontologies bloom, each one close to the terminological habits of its authors

• in concordance with the “not invented here” syndrome

• then map these ontologies, and use these mappings to integrate your different pots of data

2

Page 3: Building a Suite of  Biomedical Ontologies

Mappings are hard

They create an N² problem, are fragile, and are expensive to maintain

Need new authorities to maintain (one for each pair of mapped ontologies), yielding new risk of forking – who will police the mappings?

The goal should be to minimize the need for mappings, by avoiding redundancy in the first place – one ontology for each domain

Invest resources in disjoint ontology modules which work well together – reduce need for mappings to minimum possible
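A rough worked illustration of the N² claim, assuming one mapping per unordered pair of ontologies (the function name is hypothetical and used only for this sketch):

```python
def pairwise_mappings(n: int) -> int:
    """Count unordered pairs among n ontologies; each pair needs its own mapping."""
    return n * (n - 1) // 2


# The mapping count grows quadratically with the number of ontologies.
for n in (5, 10, 50, 100):
    print(f"{n} ontologies -> {pairwise_mappings(n)} mappings to maintain")
```

If mappings are directional the count doubles, but the quadratic growth is the same.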

3

Page 4: Building a Suite of  Biomedical Ontologies

How to do it right?

• how to create an incremental, evolutionary process, where what is good survives, and what is bad fails

• where the number of ontologies needing to be used together is small – integration = addition

• where these ontologies are stable

• by creating a scenario in which people will find it profitable to reuse ontologies, terminologies and coding systems which have been tried and tested

4

Page 5: Building a Suite of  Biomedical Ontologies

Modularity

modularity ensures

• annotations can be additive

• division of labor amongst domain experts

• high value of training in any given module

• lessons learned in one module can benefit work on other modules

• incentivization of those responsible for individual modules

5

Page 6: Building a Suite of  Biomedical Ontologies

Reasons why GO has been successful

It is a system for prospective standardization built with coherent top level but with content contributed and monitored by domain specialists

Based on community consensus

Updated every night

Clear versioning principles ensure backwards compatibility; prior annotations do not lose their value

Initially low-tech to encourage users, with movement to more powerful formal approaches (including OWL-DL – though still proceeding with caution)

6

Page 7: Building a Suite of  Biomedical Ontologies

GO has learned the lessons of successful cooperation

• Clear documentation

• The terms chosen are already familiar

• Fully open source (allows thorough testing in manifold combinations with other ontologies)

• Subjected to considerable third-party critique

• Tracker for user input and help desk with rapid turnaround

7

Page 8: Building a Suite of  Biomedical Ontologies

GO has been amazingly successful in overcoming the data balkanization problem

but it covers only generic biological entities of three sorts:

– cellular components

– molecular functions

– biological processes

no diseases, symptoms, disease biomarkers, protein interactions, experimental processes …

8

Page 9: Building a Suite of  Biomedical Ontologies

How to create a disease ontology?

• One option: a flat list

• One option: template approach

– Cancer

– Infectious Disease

– Diabetes

– Autoimmune Disease

• To make this work: think very hard about what a disease is

9

Page 10: Building a Suite of  Biomedical Ontologies

Aristotelian definitions

• To define a term ‘A’ in an ontology identify the parent term ‘B’ and start your definition:

• An A is a B which … Cs ….

A = species

B = genus

C = differentia
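A minimal sketch of the "An A is a B which Cs" pattern. The `Term` class and `aristotelian_definition` function are hypothetical names for illustration. The example term reuses the deck's own definition of pathological process as a bodily process that is a manifestation of a disorder and is clinically abnormal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    name: str         # A: the species being defined
    genus: str        # B: the parent term
    differentia: str  # C: what marks A off from other Bs


def _article(word: str, capital: bool = False) -> str:
    """Pick 'a' or 'an' by first letter (a crude rule, good enough for a sketch)."""
    art = "an" if word[0].lower() in "aeiou" else "a"
    return art.capitalize() if capital else art


def aristotelian_definition(term: Term) -> str:
    """Render a definition of the form 'An A is a B which Cs.'"""
    return (
        f"{_article(term.name, capital=True)} {term.name} is "
        f"{_article(term.genus)} {term.genus} which {term.differentia}."
    )


pathological_process = Term(
    name="pathological process",
    genus="bodily process",
    differentia="is a manifestation of a disorder and is clinically abnormal",
)
print(aristotelian_definition(pathological_process))
```

Forcing every definition through the genus slot means each new term must be attached to a parent before it can be defined, which keeps the is_a hierarchy and the definitions in step.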

10

Page 11: Building a Suite of  Biomedical Ontologies

• Cancer disease is a disease which …

• Genetic disease is a disease which …

• Infectious disease is a disease which …

11

Page 12: Building a Suite of  Biomedical Ontologies

Information Artifact Ontology (IAO)

Ontology for Biomedical Investigations (OBI)

Ontology of General Medical Science (OGMS)

Basic Formal Ontology (BFO)

12

Page 13: Building a Suite of  Biomedical Ontologies

OBO Foundry Modular Organization

top level

– Basic Formal Ontology (BFO)

mid-level

– Information Artifact Ontology (IAO)

– Ontology for Biomedical Investigations (OBI)

– Ontology of General Medical Science (OGMS)

domain level

– Anatomy Ontology (FMA*, CARO)

– Environment Ontology (EnvO)

– Infectious Disease Ontology (IDO*)

– Biological Process Ontology (GO*)

– Cell Ontology (CL)

– Cellular Component Ontology (FMA*, GO*)

– Phenotypic Quality Ontology (PaTO)

– Subcellular Anatomy Ontology (SAO)

– Sequence Ontology (SO*)

– Molecular Function (GO*)

– Protein Ontology (PRO*)

13

Page 14: Building a Suite of  Biomedical Ontologies

Ontology for General Medical Science

http://code.google.com/p/ogms/

(OBO) http://purl.obolibrary.org/obo/ogms.obo

(OWL) http://purl.obolibrary.org/obo/ogms.owl

14

Page 15: Building a Suite of  Biomedical Ontologies

OGMS-based initiatives

Vital Signs Ontology (VSO)

EHR / Demographics Ontology

Infectious Disease Ontology (IDO)

Psychology Ontology (PSY)

Emotion Ontology (PSY-EM)

Genetic Disease Ontology

Cancer Ontology

15

Page 16: Building a Suite of  Biomedical Ontologies

BFO: the very top

Continuant
    Independent Continuant
    Dependent Continuant
Occurrent (Process, Event)

16

Page 17: Building a Suite of  Biomedical Ontologies

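RELATION TO TIME: CONTINUANT (INDEPENDENT, DEPENDENT) and OCCURRENT, by GRANULARITY

ORGAN AND ORGANISM

– independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)

– dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)

– occurrent: Biological Process (GO)

CELL AND CELLULAR COMPONENT

– independent continuant: Cell (CL); Cellular Component (FMA, GO)

– dependent continuant: Cellular Function (GO)

MOLECULE

– independent continuant: Molecule (ChEBI, SO, RnaO, PrO)

– dependent continuant: Molecular Function (GO)

– occurrent: Molecular Process (GO)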

17

Page 18: Building a Suite of  Biomedical Ontologies

BFO & GO

continuant
    independent continuant
        cellular component
    dependent continuant
        molecular function
occurrent
    biological processes

18

Page 19: Building a Suite of  Biomedical Ontologies

Basic Formal Ontology

Continuant
    Independent Continuant
        thing
    Dependent Continuant
        quality
Occurrent
    process, event

.... ..... .......

types

instances

19

Page 20: Building a Suite of  Biomedical Ontologies

Experience with BFO in building ontologies provides

• a community of skilled ontology developers and users (user group has 120 members)

• associated logical tools

• documentation for different types of users

• a methodology for building conformant ontologies by starting with BFO and populating downwards

20

Page 21: Building a Suite of  Biomedical Ontologies

Example: The Cell Ontology

Page 22: Building a Suite of  Biomedical Ontologies

How to build an ontology

import BFO into ontology editor such as Protégé

work with domain experts to create an initial mid-level classification

find ~50 most commonly used terms corresponding to types in reality

arrange these terms into an informal is_a hierarchy according to this universality principle

A is_a B: every instance of A is an instance of B

fill in missing terms to give a complete hierarchy

(leave it to domain experts to populate the lower levels of the hierarchy)
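The universality principle can be sketched with a small hand-built hierarchy. This is an illustrative toy, not how Protégé or BFO store their data. The dictionaries and function names are hypothetical, the terms are taken from elsewhere in this deck, and treating is_a as reflexive is an assumption of the sketch.

```python
# Child type -> parent type (a fragment of the hierarchy used in this deck).
IS_A = {
    "independent continuant": "continuant",
    "dependent continuant": "continuant",
    "organism": "independent continuant",
    "quality": "dependent continuant",
    "temperature": "quality",
}

# Instance -> the most specific type it instantiates.
INSTANCE_OF = {
    "John": "organism",
    "John's temperature": "temperature",
}


def ancestors(term: str) -> list:
    """Walk up the is_a chain and return every parent type, nearest first."""
    chain = []
    while term in IS_A:
        term = IS_A[term]
        chain.append(term)
    return chain


def is_a(a: str, b: str) -> bool:
    """True if type a is type b or a descendant of b."""
    return a == b or b in ancestors(a)


def instance_of(instance: str, type_: str) -> bool:
    """Universality: an instance of A is also an instance of every B with A is_a B."""
    return is_a(INSTANCE_OF[instance], type_)


print(ancestors("temperature"))
print(instance_of("John", "continuant"))
print(instance_of("John", "quality"))
```

Because instance membership is derived from the is_a chain rather than asserted per level, filling in a missing intermediate term never invalidates existing instance data.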

22

Page 23: Building a Suite of  Biomedical Ontologies

Basic Formal Ontology

continuant occurrent

independentcontinuant

dependentcontinuant

organism

23

Page 24: Building a Suite of  Biomedical Ontologies

Continuants

• continue to exist through time, preserving their identity while undergoing different sorts of changes

• independent continuants – objects, things, ...

• dependent continuants – qualities, attributes, shapes, potentialities ...

24

Page 25: Building a Suite of  Biomedical Ontologies

Occurrents

• processes, events, happenings

– your life

– this process of accelerated cell division

25

Page 26: Building a Suite of  Biomedical Ontologies

Qualities

temperature

blood pressure

mass

...

are continuants

they exist through time while undergoing changes

26

Page 27: Building a Suite of  Biomedical Ontologies

Qualities

temperature / blood pressure / mass ...

are dimensions of variation within the structure of the entity

a quality is something which can change while its bearer remains one and the same

27

Page 28: Building a Suite of  Biomedical Ontologies

A chart representing how John’s temperature changes

28

Page 29: Building a Suite of  Biomedical Ontologies

A chart representing how John’s temperature changes

29

Page 30: Building a Suite of  Biomedical Ontologies

John’s temperature, the temperature he has throughout his entire life, cycles through different determinate temperatures from one time to the next

John’s temperature, in thus changing, exerts an influence on other dimensions of variation in the physiology of the organism through time

30

Page 31: Building a Suite of  Biomedical Ontologies

BFO: The Very Top

continuant
    independent continuant
    dependent continuant
        quality
            temperature
occurrent

31

Page 32: Building a Suite of  Biomedical Ontologies

Blinding Flash of the Obvious

types

independent continuant
    organism
dependent continuant
    quality
        temperature

instances

John

John’s temperature

32

Page 33: Building a Suite of  Biomedical Ontologies

Blinding Flash of the Obvious

types

independent continuant
    organism
dependent continuant
    quality
        temperature

instances

John

John’s temperature

33

Page 34: Building a Suite of  Biomedical Ontologies

Blinding Flash of the Obvious

types

organism

temperature

instances

John

John’s temperature

John’s temperature inheres_in John

34

Page 35: Building a Suite of  Biomedical Ontologies

types

temperature

37ºC, 37.1ºC, 37.5ºC, 37.2ºC, 37.3ºC, 37.4ºC

instances

John’s temperature

instantiates at t1

instantiates at t2

instantiates at t3

instantiates at t4

instantiates at t5

instantiates at t6

35

Page 36: Building a Suite of  Biomedical Ontologies

types

human

embryo, fetus, neonate, infant, child, adult

instances

John

instantiates at t1

instantiates at t2

instantiates at t3

instantiates at t4

instantiates at t5

instantiates at t6
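Time-indexed instantiation can be sketched as a lookup keyed on instance and time. The data structure and function names are hypothetical. Pairing the stages with t1 to t6 in developmental order is an assumption read off the slide's layout.

```python
from typing import List, Optional

# Development-stage subtypes of "human", in developmental order.
STAGES = ["embryo", "fetus", "neonate", "infant", "child", "adult"]

# (instance, time) -> subtype instantiated at that time.
# Assumption: stage i is instantiated at time t<i>.
INSTANTIATES_AT = {
    ("John", f"t{i}"): stage for i, stage in enumerate(STAGES, start=1)
}


def type_at(instance: str, time: str) -> Optional[str]:
    """Return the subtype the instance instantiates at the given time, if recorded."""
    return INSTANTIATES_AT.get((instance, time))


def history(instance: str) -> List[str]:
    """List the subtypes one and the same instance passes through, in time order."""
    times = sorted(
        (t for (who, t) in INSTANTIATES_AT if who == instance),
        key=lambda t: int(t[1:]),
    )
    return [INSTANTIATES_AT[(instance, t)] for t in times]


print(type_at("John", "t2"))
print(history("John"))
```

The key stays "John" at every time point. Only the instantiated subtype changes, which is the sense in which a continuant preserves its identity through change.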

36

Page 37: Building a Suite of  Biomedical Ontologies

whole plant

continuants

zygote

pro-embryo

globular embryo

bilateral embryo

...

mature whole plant

occurrents

fertilization

first cell division

becomes reproductively able

37

Page 38: Building a Suite of  Biomedical Ontologies

child transformation_of fetus

38

Page 39: Building a Suite of  Biomedical Ontologies

Temperature subtypes

Development-stage subtypes

are threshold divisions (hence we do not have sharp boundaries, and we have a certain degree of choice, e.g. in how many subtypes to distinguish, though not in their ordering)

39

Page 40: Building a Suite of  Biomedical Ontologies

types

independent continuant
    organism
dependent continuant
    quality
        temperature

instances

John

John’s temperature

40

Page 41: Building a Suite of  Biomedical Ontologies

independent continuant
    organism
dependent continuant
    quality
        temperature
occurrent
    process
        course of temperature changes

John

John’s temperature

John’s temperature history

41

Page 42: Building a Suite of  Biomedical Ontologies

independent continuant
    organism
dependent continuant
    quality
        temperature
occurrent
    process
        life of an organism

John

John’s temperature

John’s life

42

Page 43: Building a Suite of  Biomedical Ontologies

BFO: The Very Top

continuant
    independent continuant
    dependent continuant
        quality
        disposition
occurrent

43

Page 44: Building a Suite of  Biomedical Ontologies

BFO: The Very Top

continuant
    independent continuant
    dependent continuant
        quality
        function
        role
        disposition
occurrent

44

Page 45: Building a Suite of  Biomedical Ontologies

disposition

- of a glass vase, to shatter if dropped

- of a human, to eat

- of a banana, to ripen

- of John, to lose hair

45

Page 46: Building a Suite of  Biomedical Ontologies

disposition

if it ceases to exist, then its bearer and/or its immediate surrounding environment is physically changed

its realization occurs when its bearer is in some special physical circumstances

its realization is what it is in virtue of the bearer’s physical make-up

46

Page 47: Building a Suite of  Biomedical Ontologies

function

- of liver: to store glycogen

- of birth canal: to enable transport

- of eye: to see

- of mitochondrion: to produce ATP

not optional; reflection of physical makeup of bearer; subtype of disposition

47

Page 48: Building a Suite of  Biomedical Ontologies

independent continuant
    eye
dependent continuant
    function
        to see
occurrent
    process
        process of seeing

John’s eye

function of John’s eye: to see

John seeing

48

Page 49: Building a Suite of  Biomedical Ontologies

OGMS

Ontology for General Medical Science

http://code.google.com/p/ogms

49

Page 50: Building a Suite of  Biomedical Ontologies

Physical Disorder

50

Page 51: Building a Suite of  Biomedical Ontologies

Physical Disorder

– independent continuant

– fiat object part

A causally linked combination of physical components of the extended organism that is clinically abnormal.

51

Page 52: Building a Suite of  Biomedical Ontologies

Clinically abnormal

– (1) not part of the life plan for an organism of the relevant type (unlike aging or pregnancy),

– (2) causally linked to an elevated risk either of pain or other feelings of illness, or of death or dysfunction, and

– (3) such that the elevated risk exceeds a certain threshold level.*

*Compare: baldness

52

Page 53: Building a Suite of  Biomedical Ontologies

Big Picture

53

Page 54: Building a Suite of  Biomedical Ontologies

Pathological Process =def. A bodily process that is a manifestation of a disorder and is clinically abnormal.

Disease =def. – A disposition to undergo pathological processes that exists in an organism because of one or more disorders in that organism.

54

Page 55: Building a Suite of  Biomedical Ontologies

Cirrhosis - environmental exposure

• Etiological process - phenobarbital-induced hepatic cell death

– produces

• Disorder - necrotic liver

– bears

• Disposition (disease) - cirrhosis

– realized_in

• Pathological process - abnormal tissue repair with cell proliferation and fibrosis that exceed a certain threshold; hypoxia-induced cell death

– produces

• Abnormal bodily features

– recognized_as

• Symptoms - fatigue, anorexia

• Signs - jaundice, enlarged spleen
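The chain above can be read as a sequence of typed steps joined by named relations. The sketch below is one hedged way to encode it. The `Step` class and `triples` function are hypothetical names, not part of OGMS, and the example strings are abbreviated from the slide.

```python
from typing import List, NamedTuple, Optional, Tuple


class Step(NamedTuple):
    category: str                    # kind of entity in the OGMS disease model
    example: str                     # the cirrhosis-specific filler from the slide
    relation_to_next: Optional[str]  # relation linking this step to the next one


CIRRHOSIS: List[Step] = [
    Step("Etiological process", "phenobarbital-induced hepatic cell death", "produces"),
    Step("Disorder", "necrotic liver", "bears"),
    Step("Disposition (disease)", "cirrhosis", "realized_in"),
    Step("Pathological process", "abnormal tissue repair; hypoxia-induced cell death", "produces"),
    Step("Abnormal bodily features", "", "recognized_as"),
    Step("Symptoms and signs", "fatigue, anorexia; jaundice, enlarged spleen", None),
]


def triples(chain: List[Step]) -> List[Tuple[str, str, str]]:
    """Turn the chain into (subject category, relation, object category) triples."""
    return [
        (a.category, a.relation_to_next, b.category)
        for a, b in zip(chain, chain[1:])
    ]


for subject, relation, obj in triples(CIRRHOSIS):
    print(f"{subject} --{relation}--> {obj}")
```

Keeping the categories and relations fixed and varying only the fillers is what lets the same template be reused for the other diseases in this deck.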

55

Page 56: Building a Suite of  Biomedical Ontologies

Dispositions and Predispositions

All diseases are dispositions; not all dispositions are diseases.

Predisposition to Disease

=def. – A disposition in an organism that constitutes an increased risk of the organism’s subsequently developing some disease.

56

Page 57: Building a Suite of  Biomedical Ontologies

HNPCC - genetic pre-disposition

• Etiological process - inheritance of a mutant mismatch repair gene

– produces

• Disorder - chromosome 3 with abnormal hMLH1

– bears

• Disposition (disease) - Lynch syndrome

– realized_in

• Pathological process - abnormal repair of DNA mismatches

– produces

• Disorder - mutations in proto-oncogenes and tumor suppressor genes with microsatellite repeats (e.g. TGF-beta R2)

– bears

• Disposition (disease) - non-polyposis colon cancer

– realized_in

• Symptoms (including pain)

57

Page 58: Building a Suite of  Biomedical Ontologies

Huntington’s Disease – genetic disease

• Etiological process - inheritance of >39 CAG repeats in the HTT gene

– produces

• Disorder - chromosome 4 with abnormal mHTT

– bears

• Disposition (disease) - Huntington’s disease

– realized_in

• Pathological process - accumulation of mHTT protein fragments, abnormal transcription regulation, neuronal cell death in striatum

– produces

• Abnormal bodily features

– recognized_as

• Symptoms - anxiety, depression

• Signs - difficulties in speaking and swallowing

Symptoms & Signs

– used_in

Interpretive process

– produces

Hypothesis - rule out Huntington’s

– suggests

Laboratory tests

– produces

Test results - molecular detection of the HTT gene with >39 CAG repeats

– used_in

Interpretive process

– produces

Result - diagnosis that patient X has a disorder that bears the disease Huntington’s disease

58

Page 59: Building a Suite of  Biomedical Ontologies

Cirrhosis - environmental exposure

• Etiological process - phenobarbital-induced hepatic cell death

– produces

• Disorder - necrotic liver

– bears

• Disposition (disease) - cirrhosis

– realized_in

• Pathological process - abnormal tissue repair with cell proliferation and fibrosis that exceed a certain threshold; hypoxia-induced cell death

– produces

• Abnormal bodily features

– recognized_as

• Symptoms - fatigue, anorexia

• Signs - jaundice, splenomegaly

Symptoms & Signs

– used_in

Interpretive process

– produces

Hypothesis - rule out cirrhosis

– suggests

Laboratory tests

– produces

Test results - elevated liver enzymes in serum

– used_in

Interpretive process

– produces

Result - diagnosis that patient X has a disorder that bears the disease cirrhosis

59

Page 60: Building a Suite of  Biomedical Ontologies

Systemic arterial hypertension

• Etiological process – abnormal reabsorption of NaCl by the kidney

– produces

• Disorder – abnormally large scattered molecular aggregate of salt in the blood

– bears

• Disposition (disease) - hypertension

– realized_in

• Pathological process – exertion of abnormal pressure against arterial wall

– produces

• Abnormal bodily features

– recognized_as

• Symptoms - headaches, dizziness

• Signs – elevated blood pressure

Symptoms & Signs

– used_in

Interpretive process

– produces

Hypothesis - rule out hypertension

– suggests

Laboratory tests

– produces

Test results -

– used_in

Interpretive process

– produces

Result - diagnosis that patient X has a disorder that bears the disease hypertension

60

Page 61: Building a Suite of  Biomedical Ontologies

Type 2 Diabetes Mellitus

• Etiological process –

– produces

• Disorder – abnormal pancreatic beta cells and abnormal muscle/fat cells

– bears

• Disposition (disease) – diabetes mellitus

– realized_in

• Pathological processes – diminished insulin production, diminished muscle/fat uptake of glucose

– produces

• Abnormal bodily features

– recognized_as

• Symptoms – polydipsia, polyuria, polyphagia, blurred vision

• Signs – elevated blood glucose and hemoglobin A1c

Symptoms & Signs

– used_in

Interpretive process

– produces

Hypothesis - rule out diabetes mellitus

– suggests

Laboratory tests – fasting serum blood glucose, oral glucose challenge test, and/or blood hemoglobin A1c

– produces

Test results -

– used_in

Interpretive process

– produces

Result - diagnosis that patient X has a disorder that bears the disease type 2 diabetes mellitus

61

Page 62: Building a Suite of  Biomedical Ontologies

Type 1 hypersensitivity to penicillin

• Etiological process – sensitizing of mast cells and basophils during exposure to penicillin-class substance

– produces

• Disorder – mast cells and basophils with epitope-specific IgE bound to Fc epsilon receptor I

– bears

• Disposition (disease) – type I hypersensitivity

– realized_in

• Pathological process – type I hypersensitivity reaction

– produces

• Abnormal bodily features

– recognized_as

• Symptoms – pruritus, shortness of breath

• Signs – rash, urticaria, anaphylaxis

Symptoms & Signs

– used_in

Interpretive process

– produces

Hypothesis -

– suggests

Laboratory tests –

– produces

Test results – occasionally, skin testing

– used_in

Interpretive process

– produces

Result - diagnosis that patient X has a disorder that bears the disease type 1 hypersensitivity to penicillin

62

Page 63: Building a Suite of  Biomedical Ontologies

63

Page 64: Building a Suite of  Biomedical Ontologies

Disease vs. Disease course

Disease =def. – A disposition to undergo pathological processes that exists in an organism because of one or more disorders in that organism.

Disease course =def. – The aggregate of processes in which a disease disposition is realized.

64

Page 65: Building a Suite of  Biomedical Ontologies

coronary heart disease

– disease associated with asymptomatic (‘silent’) infarction

– disease associated with early lesions and small fibrous plaques

– stable angina

– disease associated with surface disruption of plaque

– unstable angina

John’s coronary heart disease

instantiates at t1

instantiates at t2

instantiates at t3

instantiates at t4

instantiates at t5

time

65

Page 66: Building a Suite of  Biomedical Ontologies

independent continuant
    disorder
dependent continuant
    disposition
        disease
occurrent
    process
        course of disease

John’s disordered heart

John’s coronary heart disease

course of John’s disease

66

Page 67: Building a Suite of  Biomedical Ontologies

Examples of ontology terms

Independent Continuant

– OGMS: Disorder

– IDO: Infectious disorder

Dependent Continuant

– OGMS: Disease; Predisposition to disease

– IDO: Infectious disease; Protective resistance

Occurrent

– OGMS: Disease course

– IDO: Infectious disease course

Page 68: Building a Suite of  Biomedical Ontologies

IDO (Infectious Disease Ontology) Core

Follows GO strategy of providing a canonical ontology of what is involved in every infectious disease – host, pathogen, vector, virulence, vaccine, transmission – accompanied by IDO Extensions for specific diseases, pathogens and vectors

Provides common terminology resources and tested common guidelines for a vast array of different disease communities

68

Page 69: Building a Suite of  Biomedical Ontologies

Infectious Disease Ontology Consortium

• MITRE, Mount Sinai, UTSouthwestern – Influenza

• IMBB/VectorBase – Vector borne diseases (A. gambiae, A. aegypti, I. scapularis, C. pipiens, P. humanus)

• Colorado State University – Dengue Fever

• Duke University – Tuberculosis, Staph. aureus

• Cleveland Clinic – Infective Endocarditis

• University of Michigan – Brucellosis

• Duke University, University at Buffalo – HIV

69

Page 70: Building a Suite of  Biomedical Ontologies

Influenza - infectious

• Etiological process - infection of airway epithelial cells with influenza virus

– produces

• Disorder - viable cells with influenza virus

– bears

• Disposition (disease) - flu

– realized_in

• Pathological process - acute inflammation

– produces

• Abnormal bodily features

– recognized_as

• Symptoms - weakness, dizziness

• Signs - fever

70

Page 71: Building a Suite of  Biomedical Ontologies

Influenza – disease course

• Etiological process - infection of airway epithelial cells with influenza virus

– produces

• Disorder - viable cells with influenza virus

– bears

• Disposition (disease) - flu

– realized_in

• Pathological process - acute inflammation

– produces

• Abnormal bodily features

– recognized_as

• Symptoms - weakness, dizziness

• Signs - fever

The disorder also induces normal physiological processes (immune response) that can result in the elimination of the disorder (transient disease course).

71

Page 72: Building a Suite of  Biomedical Ontologies

Big Picture

72