Post on 18-Dec-2015

http://img.cs.man.ac.uk/stevens 1

Building and Using Ontologies

Robert Stevens
Department of Computer Science
University of Manchester
Manchester, UK

Introduction

• The nature of bioinformatics resources
• What is knowledge?
• What is an ontology?
• What are the uses of ontologies?
• Components of an ontology
• Building an ontology (in brief)

The Nature of Bioinformatics Resources

• Over 500 databanks and analysis tools that work over resources

• Repositories of knowledge and data and generation of new knowledge

• Knowledge often held as free text; some use made of controlled vocabularies

• Enormous amount of semantic heterogeneity and poor query facilities

• Knowledge about services not always apparent

What is Knowledge?

• Knowledge – all information and an understanding to carry out tasks and to infer new information

• Information -- data equipped with meaning

• Data -- un-interpreted signals that reach our senses

Example: the same data read in two ways

Data: PATRICIAGRACEKENNEDYSAIDMINEISAPINT

Read as English text:
– Patricia Grace Kennedy said mine is a pint (words tagged as name, noun, verb)
– Pat Baker is a Manchester bioinformatician who drinks beer.

Read as a protein sequence:
– …CEKENN… in single-letter amino acid codes (C – cysteine, K – lysine)
– Protein that acts as a tyrosine kinase in the liver of primates.
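The second reading can be sketched in Python as a lookup that turns raw characters into named residues. This is only an illustration: the slide names C and K, the E and N entries are standard one-letter codes added here, and the function name `decode_residues` is hypothetical.

```python
# One-letter amino acid codes. Only C and K appear on the slide;
# E and N are standard codes added here for illustration.
AMINO_ACIDS = {
    "C": "cysteine",
    "E": "glutamate",
    "K": "lysine",
    "N": "asparagine",
}


def decode_residues(fragment: str) -> list[str]:
    """Interpret each character of a sequence fragment as an amino acid."""
    return [AMINO_ACIDS.get(letter, "unknown") for letter in fragment]


data = "PATRICIAGRACEKENNEDYSAIDMINEISAPINT"  # raw, un-interpreted data
fragment = data[11:17]                         # the CEKENN stretch
residues = decode_residues(fragment)           # data equipped with meaning
```

The string itself never changes; only the interpretation applied to it does, which is the distinction the slide draws between data and information.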

Capturing Knowledge

• Capturing knowledge for both humans and computer applications

• A set of vocabulary definitions that capture a community’s knowledge of a domain

• "An ontology may take a variety of forms, but necessarily it will include a vocabulary of terms, and some specification of their meaning. This includes definitions and an indication of how concepts are inter-related which collectively impose a structure on the domain and constrain the possible interpretations of terms."

What Does an Ontology Do?

• Captures knowledge
• Creates a shared understanding – between humans and for computers
• Makes knowledge machine processable
• Makes meaning explicit – by definition and context

What is an Ontology?

[Figure: a spectrum of artefacts described as ontologies. Labels: Catalog/ID; Terms/glossary; Thesauri, "narrower term" relation; Informal is-a; Formal is-a; Formal instance; Frames (properties); Value restrictions; Disjointness, inverse, part-of…; General logical constraints.]

Roles of Ontologies in Bioinformatics

• We can divide ontology use into three types:
– Domain-oriented, which are either domain specific (e.g. E. coli) or domain generalisations (e.g. gene function or ribosomes);
– Task-oriented, which are either task specific (e.g. annotation analysis) or task generalisations (e.g. problem solving);
– Generic, which capture common high level concepts, such as Physical, Abstract and Substance. Important in ontology management and language applications.

Uses of Ontology

• Community reference -- neutral authoring.
• Either defining database schema or defining a common vocabulary for database annotation -- ontology as specification.
• Providing common access to information. Ontology-based search by forming queries over databases.
• Understanding database annotation and technical literature.
• Guiding and interpreting analyses and hypothesis generation.
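One way to picture ontology-based search is query expansion over an is-a hierarchy: a query for a general term also returns records annotated with narrower terms. The sketch below assumes a toy hierarchy and made-up record identifiers; none of the names come from a real database, and `is_kind_of` and `search` are hypothetical helpers.

```python
# Hypothetical is-a hierarchy (child -> parent) and annotated records.
IS_A = {
    "enzyme": "protein",
    "tyrosine kinase": "enzyme",
    "protein": "macromolecule",
}

RECORDS = {
    "rec1": "tyrosine kinase",
    "rec2": "enzyme",
    "rec3": "macromolecule",
}


def is_kind_of(term, ancestor):
    """Walk up the is-a chain from term, looking for ancestor."""
    current = term
    while current is not None:
        if current == ancestor:
            return True
        current = IS_A.get(current)
    return False


def search(query_term):
    """Return records annotated with the query term or any narrower term."""
    return sorted(
        rec for rec, term in RECORDS.items() if is_kind_of(term, query_term)
    )


hits = search("enzyme")
```

A plain keyword match on "enzyme" would miss the record annotated as "tyrosine kinase"; the hierarchy supplies the link.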

Components of an Ontology

• Concepts: classes of individuals – e.g. the concept Protein and the individual 'human cytochrome C'
• Relationships between concepts
– The "is a kind of" relationship forms a taxonomy
– Other relationships give further structure, e.g. "is a part of"
• Axioms – disjointness, covering, equivalence, …
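As a rough sketch, these components map onto a small Python data structure: a set of concepts, individuals assigned to concepts, relationship triples, and disjointness axioms that can be checked. The class and method names are hypothetical, and the "mislabelled sample" individual is invented to show an axiom being violated.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    concepts: set[str] = field(default_factory=set)
    # individual -> concepts it is asserted to belong to
    instance_of: dict[str, set[str]] = field(default_factory=dict)
    # (subject, relationship, object) triples, e.g. is-a-kind-of, is-a-part-of
    relations: set[tuple[str, str, str]] = field(default_factory=set)
    # each axiom is a pair of concepts that may not share an individual
    disjoint: list[frozenset[str]] = field(default_factory=list)

    def breaks_disjointness(self, individual: str) -> bool:
        """True if the individual sits in two concepts declared disjoint."""
        asserted = self.instance_of.get(individual, set())
        return any(pair <= asserted for pair in self.disjoint)


onto = Ontology(concepts={"Protein", "NucleicAcid", "Macromolecule"})
onto.relations.add(("Protein", "is-a-kind-of", "Macromolecule"))
onto.relations.add(("NucleicAcid", "is-a-kind-of", "Macromolecule"))
onto.instance_of["human cytochrome C"] = {"Protein"}
onto.disjoint.append(frozenset({"Protein", "NucleicAcid"}))

consistent = not onto.breaks_disjointness("human cytochrome C")

# An invented individual asserted into both disjoint concepts.
onto.instance_of["mislabelled sample"] = {"Protein", "NucleicAcid"}
clash = onto.breaks_disjointness("mislabelled sample")
```

Covering and equivalence axioms could be stored and checked in the same style; they are left out to keep the sketch short.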

Knowledge Representation

• Ontologies are best delivered in some computable representation
• Variety of choices with different:
– Expressiveness
• The range of constructs that can be used to formally, flexibly, explicitly and accurately describe the ontology
– Ease of use
– Computational complexity
• Is the language computable in real time?
– Rigour -- satisfiability and consistency of the representation
• Systematic enforcement mechanisms
– Unambiguous, clear and well defined semantics

Languages

• Vocabularies using natural language
– Hand crafted, flexible but difficult to evolve, maintain and keep consistent, with weak semantics
– Gene Ontology
• Object-based KR: frames
– Extensively used, good structuring, intuitive. Semantics defined by OKBC standard
– EcoCyc (uses Ocelot) and RiboWeb (uses Ontolingua)
• Logic-based: Description Logics
– Very expressive, model is a set of theories, well defined semantics
– Automatically derived classification taxonomies
– Concepts are either defined or primitive

Building Ontologies

• No field of Ontological Engineering equivalent to Knowledge or Software Engineering;

• No standard methodologies for building ontologies;
• Such a methodology would include:
– a set of stages that occur when building ontologies;
– guidelines and principles to assist in the different stages;
– an ontology life-cycle which indicates the relationships among stages.

The Development Lifecycle

• Two kinds of complementary methodologies emerged:
– Stage-based, e.g. TOVE [Uschold96]
– Iterative evolving prototypes, e.g. MethOntology [Gomez Perez94].
• Most have TWO stages:
1. Informal stage
• ontology is sketched out using either natural language descriptions or some diagram technique
2. Formal stage
• ontology is encoded in a formal knowledge representation language that is machine computable
– the informal representation helps the former
– the formal representation helps the latter.

A Provisional Methodology

• A skeletal methodology and life-cycle for building ontologies;
• Inspired by the software engineering V-process model;

• The overall process moves through a life-cycle.

The left side charts the processes in building an ontology

The right side charts the guidelines, principles and evaluation used to ‘quality assure’ the ontology

The V-model Methodology

[Figure: the V-model. Process labels: Identify purpose and scope; Knowledge acquisition; Conceptualisation; Integrating existing ontologies; Encoding; Representation. Quality assurance labels: Evaluation: coverage, verification, granularity; Conceptualisation principles: commitment, conciseness, clarity, extensibility, coherency; Encoding/Representation principles: encoding bias, consistency, house styles and standards, reasoning system exploitation. Other labels: Ontology in Use; User Model; Conceptualisation Model; Implementation Model.]

The ontology building life-cycle

[Figure: the ontology building life-cycle. Labels: Identify purpose and scope; Knowledge acquisition; Evaluation; Language and representation; Available development tools; Building, comprising Conceptualisation, Integrating existing ontologies and Encoding.]

Starting Concept List

• Chemicals – atom, ion, molecule, compound, element;
• Molecular-compound, ionic-compound, ionic-molecular-compound, …;
• Ionic-macromolecular-compound and ionic-small-macromolecular-compound;
• Protein, peptide, polyprotein, enzyme, holoprotein, apoprotein, …
• Nucleic acid – DNA, RNA, tRNA, mRNA, snRNA, …

Conceptualisation Sketch

[Figure: hierarchy sketch rooted at Chemical. Nodes: Atom, Element, Compound, Molecule, Ion; Metal, Non-Metal, Metalloid; Molecular Compound, Molecular Element, Ionic Compound, Ionic Molecule, Ionic Molecular Compound.]

Molecule Conceptualisation Sketch

[Figure: hierarchy sketch for molecules. Nodes: Macromolecule, SmallMolecule; NucleicAcid, Protein, Polysaccharide; DNA, RNA, Enzyme; Ionic MacromolecularCompound; Starch, Glycogen; mRNA, tRNA, rRNA, snRNA; Peptide.]

Initial Encoding

class-def chemical
  subclass-of substance

class-def molecule
  subclass-of chemical

class-def compound
  subclass-of chemical

class-def molecular-compound
  subclass-of molecule and compound
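To see what this encoding implies, the same four statements can be mirrored in a Python dictionary and the inherited superclasses computed by following subclass-of links. This is a sketch for illustration only; the names `SUBCLASS_OF` and `superclasses` are hypothetical and not part of any ontology tool.

```python
# Direct superclasses, mirroring the four class-def statements.
SUBCLASS_OF = {
    "chemical": ["substance"],
    "molecule": ["chemical"],
    "compound": ["chemical"],
    "molecular-compound": ["molecule", "compound"],
}


def superclasses(name: str) -> set[str]:
    """Collect every direct and indirect superclass of a class."""
    found: set[str] = set()
    for parent in SUBCLASS_OF.get(name, []):
        found.add(parent)
        found |= superclasses(parent)
    return found


inherited = superclasses("molecular-compound")
```

Because molecular-compound has two parents, the structure is a graph rather than a tree, yet chemical and substance are each reached only once in the result.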

Molecules Revisited

[Figure: revised hierarchy sketch for molecules. Nodes: Macromolecule, SmallMolecule; NucleicAcid, Protein, Polysaccharide; DNA, RNA, Enzyme; Ionic MacromolecularCompound, Non-Ionic MacromolecularCompound; Starch, Glycogen; mRNA, tRNA, rRNA, snRNA; Peptide.]

More Encoding

class-def chemical
  subclass-of substance

class-def defined molecule
  subclass-of chemical
  slot-constraint contains-bond min-cardinality 1 has-value covalent-bond

class-def defined compound
  subclass-of chemical
  slot-constraint has-atom-types greater-than 1

class-def defined molecular-compound
  subclass-of molecule and compound
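The point of a defined class is that membership can be worked out from properties rather than asserted by hand. The Python sketch below imitates this by testing individual chemicals against the slot-constraints. It is a simplification under stated assumptions: a description logic reasoner classifies the class definitions themselves, whereas this code only checks instances, and the `Chemical` record, its two fields and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chemical:
    name: str
    covalent_bonds: int  # stands in for the contains-bond slot
    atom_types: int      # stands in for the has-atom-types slot


# Defining conditions, mirroring the slot-constraints in the encoding.
DEFINITIONS = {
    "molecule": lambda c: c.covalent_bonds >= 1,  # min-cardinality 1
    "compound": lambda c: c.atom_types > 1,       # greater-than 1
}
# molecular-compound is the conjunction of molecule and compound.
DEFINITIONS["molecular-compound"] = lambda c: (
    DEFINITIONS["molecule"](c) and DEFINITIONS["compound"](c)
)


def classify(chemical: Chemical) -> set[str]:
    """Return every defined class whose conditions the chemical meets."""
    return {name for name, test in DEFINITIONS.items() if test(chemical)}


water = Chemical("water", covalent_bonds=2, atom_types=2)
oxygen_gas = Chemical("O2", covalent_bonds=1, atom_types=1)

water_classes = classify(water)
oxygen_classes = classify(oxygen_gas)
```

Nothing states that water is a molecular-compound; that follows from the definitions, which is the behaviour the "defined" keyword is meant to enable.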

Expansion

• Sketch and encode in cycles
• Build a taxonomy of a small portion
• Then build links to other portions
• Add more detail
• Document sources, author, date and argumentation.

Summary

• An ontology captures knowledge for a shared understanding

• The important question is not whether an artefact is an ontology, but whether it does any good

• Making our understanding of a domain explicit, consistent and processable

• Bioinformatics resources are knowledge resources – they need to be both human and machine understandable