© University of Manchester 1
Ontology Normalisation, Pre- and Post-Coordination
Alan Rector & CO-ODE/NIBHI
University of Manchester
[email protected] GALEN
BioHealthInformaticsGroup
When to use a classifier
1. At author time: As a compiler – Pre-coordination
► Ontologies will be delivered as “pre-coordinated” ontologies to be used without a reasoner
► To make extensions and additions quick, easy, and responsive, distribute development, and empower users to make changes
► Part of an ontology life cycle
2. At delivery time: As a service – Post-coordination
► Many fixed ontologies are too big and too small
► Too big to find things; too small to contain what you need
► Create them on the fly
► Part of an ontology service
3. At application time: As a reasoner – Post-coordination & inference
► Decision support, query optimisation, schema integration, …
► Part of a reasoning service
When to use a classifier 1:
Pre-coordinated delivery: classifier as compiler
► The life cycle
► Gather requirements, sketch, experiment
► Establish patterns – design a “language”
► Criteria for success: What a subject domain expert can learn in a few days
► Bulk authoring
► Classification
► Quality assurance
► Commit classifier results to a pre-coordinated ontology & deliver
► Polyhierarchies (Protégé, DAG-Edit, OWL-Lite, RDF(S), Topic Maps, …)
► Query and use with your favourite tool
Development & evolution
Commit Results to a Pre-Coordinated Ontology
Assert (“Commit”) changes inferred by classifier
Precoordinated Ontologies
► Author
► Classify
► Export to OWL-Lite or RDF
► Query with RDQL, SPARQL, or your favourite RDF/RDF(S) query tool
► Use as a vocabulary in your database
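The "commit" step can be pictured as merging the parents the classifier inferred into the parents the authors asserted, then exporting a flat parent table that needs no reasoner to use. A minimal Python sketch follows; the class names and the `commit` helper are hypothetical, and the inferred parents are written by hand rather than produced by a real classifier.

```python
# Hypothetical data: parents asserted by authors, and parents a classifier inferred.
asserted = {"ProteinHormone": {"Protein"}, "Enzyme": {"Protein"}}
inferred = {"ProteinHormone": {"Hormone"}, "Enzyme": {"Catalyst"}}


def commit(asserted, inferred):
    """Merge inferred parents into asserted ones; return flat (child, parent) rows."""
    merged = {}
    for source in (asserted, inferred):
        for child, parents in source.items():
            merged.setdefault(child, set()).update(parents)
    return sorted(
        (child, parent) for child, parents in merged.items() for parent in parents
    )


rows = commit(asserted, inferred)
for child, parent in rows:
    print(f"{child}\t{parent}")
```

The resulting rows form a polyhierarchy that could be loaded into a database table or exported to RDF(S) and queried without a reasoner.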
The problem
But we want to
►Build ontologies cooperatively with different groups
►Extend ontologies smoothly
►Re-use pieces of ontologies
►Build new ontologies on top of old
►Quit starting from scratch
Knowledge is Big, Fractal & Changeable!
Assertion:
► Let the ontology authors
► create discrete modules
► describe the links between modules
► Let the logic reasoner
► organise the result
The arrival of logic-based ontologies/OWL gives new opportunities to make ontologies more manageable and modular
Logic-based Ontologies: Conceptual Lego
[Figure: concept blocks labelled hand, extremity, body, acute, chronic, abnormal, normal, ischaemic, deletion, bacterial, polymorphism, cell, protein, gene, infection, inflammation, lung, expression]
Logic-based Ontologies: Conceptual Lego
“SNPolymorphism of CFTRGene causing Defect in MembraneTransport of Chloride Ion causing Increase in Viscosity of Mucus in CysticFibrosis…”
“Hand which is anatomically normal”
Logical Constructs build complex concepts from modularised primitives
Genes
Species
Protein
Function
Disease
Gene in humans
Protein coded by gene in humans
Function of Protein coded by gene in humans
Disease caused by abnormality in Function of Protein coded by gene in humans
Normalising (untangling) Ontologies
[Figure labels: Structure, Function, Part-whole]
Rationale for Normalisation
► Explicit distinctions between modules
► Primitives are opaque to the reasoner
►Information implicit in primitive names cannot contribute to modularisation
► Primitives are indivisible to both human and reasoner
►Each primitive should represent a single notion
► Therefore, each primitive must belong to exactly one module
►If a primitive belongs to two modules, they are not modular.
►If a primitive belongs to two modules, it probably conflates two notions
► Therefore concentrate on the “primitive skeleton” of the domain ontology
Normalisation and Untangling
Let the reasoner do multiple classification
► Tree
► Everything has just one parent
► A ‘strict hierarchy’
► Directed Acyclic Graph (DAG)
► Things can have multiple parents
► A ‘Polyhierarchy’
► Normalisation
► Separate primitives into disjoint trees
► Link the trees with restrictions
► Fill in the values
Developing a normalised ontology from scratch
► The goal of the example is to build an ontology of animals to index a children’s book of animals
Steps in developing an Ontology
1. Establish the purpose
► Without purpose, no scope, requirements, or evaluation
2. Informal/Semiformal knowledge elicitation
► Collect the terms
► Organise terms informally
► Paraphrase and clarify terms to produce informal concept definitions
► Diagram informally
3. Refine requirements & tests
Steps in implementing an Ontology
4. Implementation
► Paraphrase and comment at each stage before implementing
► Develop normalised schema and skeleton
► Implement prototype recording the intention as a paraphrase
► Keep track of what you meant to do so you can compare with what happens
► Implementing logic-based ontologies is programming
► Scale up a bit
► Check performance
► Populate
► Possibly with help of text mining and language technology
5. Evaluate & quality assure
► Against goals
► Include tests for evolution and change management
► Design regression tests and “probes”
6. Monitor use and evolve
► Process not product!
Purpose & scope of the animals ontology
► To provide an ontology for an index of a children’s book of animals including
► Where they live
► What they eat
►Carnivores, herbivores and omnivores
► How dangerous they are
► How big they are
► A bit of basic anatomy
►numbers of legs, wings, toes, etc.
Collect the concepts
► Card sorting is often the best way
► Write down each concept/idea on a card
► Organise them into piles
► Link the piles together
► Do it again, and again
► Works best in a small group
► In the lab we will provide you with some pre-printed cards and many spare cards
►Work in pairs or triples
Example: Animals & Plants
► Dog
► Cat
► Cow
► Person
► Tree
► Grass
► Herbivore
► Male
► Female
► Dangerous
► Pet
► Domestic Animal
► Farm animal
► Draft animal
► Food animal
► Fish
► Carp
► Goldfish
► Carnivore
► Plant
► Animal
► Fur
► Child
► Parent
► Mother
► Father
Organise the concepts
Example: Animals & Plants
► Dog
► Cat
► Cow
► Person
► Tree
► Grass
► Herbivore
► Male
► Female
► Healthy
► Pet
► Domestic Animal
► Farm animal
► Draft animal
► Food animal
► Fish
► Carp
► Goldfish
► Carnivore
► Plant
► Animal
► Fur
► Child
► Parent
► Mother
► Father
Extend the concepts
“Laddering”
► Take a group of things and ask what they have in common
► Then what other ‘siblings’ there might be
► e.g.
► Plant, Animal → Living Thing
► Might add Bacteria and Fungi but not now
► Cat, Dog, Cow, Person → Mammal
► Others might be Goat, Sheep, Horse, Rabbit,…
► Cow, Goat, Sheep, Horse → Hoofed animal (“Ungulate”)
► What others are there? Do they divide amongst themselves?
► Wild, Domestic → Domestication
► What other states – “Feral” (domestic returned to wild)
Choose some main axes
► Add abstractions where needed
► e.g. “Living thing”
► Identify relations
► e.g. “eats”, “owns”, “parent of”
► Identify definable things
► e.g. “child”, “parent”, “Mother”, “Father”
►Things where you can say clearly what it means
►Try to define a dog precisely – very difficult
►A “natural kind”
► make names explicit
Choose some main axes
Add abstractions where needed; identify relations; identify definable things; make names explicit
► Living Thing
  ► Animal
    ► Mammal
      ► Cat
      ► Dog
      ► Cow
      ► Person
    ► Fish
      ► Carp
      ► Goldfish
  ► Plant
    ► Tree
    ► Grass
    ► Fruit
► Modifiers
  ► domestic
  ► pet
  ► Farmed
  ► Draft
  ► Food
  ► Wild
  ► Health
    ► healthy
    ► sick
  ► Sex
    ► Male
    ► Female
  ► Age
    ► Adult
    ► Child
Definable: Carnivore, Herbivore, Child, Parent, Mother, Father, Food Animal, Draft Animal
Relations: eats, owns, parent-of, …
Self_standing_entities
► Things that can exist on their own
► People, animals, houses, actions, processes, …
► Roughly nouns
► Modifiers
► Things that modify (“inhere in”) other things
► Roughly adjectives and adverbs
Reorganise everything but “definable” things into pure trees – these will be the “primitives”
► Self_standing
  ► Living Thing
    ► Animal
      ► Mammal
        ► Cat
        ► Dog
        ► Cow
        ► Person
        ► Pig
      ► Fish
        ► Carp
        ► Goldfish
    ► Plant
      ► Tree
      ► Grass
      ► Fruit
► Modifiers
  ► Domestication
    ► Domestic
    ► Wild
  ► Use
    ► Draft
    ► Food
    ► pet
  ► Risk
    ► Dangerous
    ► Safe
  ► Sex
    ► Male
    ► Female
  ► Age
    ► Adult
    ► Child
Definables: Carnivore, Herbivore, Child, Parent, Mother, Father, Food Animal, Draft Animal
Relations: eats, owns, parent-of, …
If anything needs clarifying, add a text comment
► Self_standing
  ► Living Thing
    ► Animal
      ► Mammal
        ► Cat
        ► Dog
        ► Cow
        ► Person
        ► Pig
      ► Fish
        ► Carp
        ► Goldfish
    ► Plant
      ► Tree
      ► Grass
      ► Fruit
Comment on Living Thing: “The abstract ancestor concept including all living things – restrict to plants and animals for now”
Identify the domain and range constraints for properties
► Animal eats Living_thing
► eats domain: Animal; range: Living_thing
► Person owns Living_thing except person
► owns domain: Person; range: Living_thing & not Person
► Living_thing parent_of Living_thing
► parent_of domain: Animal; range: Animal
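In OWL, a domain axiom is not a validation check. It lets the reasoner infer that anything using the property belongs to the domain class. The Python sketch below shows one consequence under the assumption that sibling primitives are disjoint: a class in another branch that uses `eats` becomes unsatisfiable. The `PARENT` and `DOMAIN` tables and the `domain_clash` helper are hypothetical and cover only the domain side; range works the same way for the filler.

```python
# Hypothetical primitive tree (child -> parent) and property domains from the slide.
PARENT = {
    "Animal": "Living_thing", "Plant": "Living_thing",
    "Person": "Animal", "Cow": "Animal",
    "Tree": "Plant", "Grass": "Plant",
}
DOMAIN = {"eats": "Animal", "owns": "Person", "parent_of": "Animal"}


def ancestors_or_self(cls):
    """Return cls followed by its ancestors up to the root."""
    line = []
    while cls is not None:
        line.append(cls)
        cls = PARENT.get(cls)
    return line


def domain_clash(subject, prop):
    """True if using prop on subject forces it into a disjoint branch.

    Assumes any two primitives are disjoint unless one subsumes the other.
    """
    required = DOMAIN[prop]
    return (required not in ancestors_or_self(subject)
            and subject not in ancestors_or_self(required))


print(domain_clash("Cow", "eats"))   # False: Cow is already an Animal
print(domain_clash("Tree", "eats"))  # True: Tree would have to be Animal and Plant
```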
If anything is used in a special way, add a text comment
► Animal eats Living_thing
► eats domain: Animal; range: Living_thing
Comment on eats: ignore the difference between parts of living things and living things, and also things derived from living things
For definable things
► Paraphrase and formalise the definitions in terms of the primitives, relations and other definables.
► Note any assumptions to be represented elsewhere.
► Add as comments when implementing
► “A ‘Parent’ is an animal that is the parent of some other animal” (Ignore plants for now)
► Parent = Animal and parent_of some Animal
► “A ‘Herbivore’ is an animal that eats only plants” (NB All animals eat some living thing)
► Herbivore = Animal and eats only Plant
► “An ‘Omnivore’ is an animal that eats both plants and animals”
► Omnivore = Animal and eats some Animal and eats some Plant
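To see what “some” and “only” mean, it can help to evaluate the definitions against one small, fully known set of individuals. The Python sketch below does that. All individual names are hypothetical, and checking a single known model is a closed-world simplification: a description-logic reasoner considers every model consistent with the axioms.

```python
# Hypothetical individuals with their types, and who eats what.
TYPES = {
    "daisy": {"Cow", "Animal"},
    "tom": {"Cat", "Animal"},
    "percy": {"Pig", "Animal"},
    "mouse1": {"Animal"},
    "grass1": {"Grass", "Plant"},
}
EATS = {"daisy": {"grass1"}, "tom": {"mouse1"}, "percy": {"grass1", "mouse1"}}


def some(x, relation, cls):
    """'relation some cls': at least one related thing is a cls."""
    return any(cls in TYPES[y] for y in relation.get(x, ()))


def only(x, relation, cls):
    """'relation only cls': every related thing is a cls (true if there are none)."""
    return all(cls in TYPES[y] for y in relation.get(x, ()))


def is_herbivore(x):
    # Herbivore = Animal and eats only Plant
    return "Animal" in TYPES[x] and only(x, EATS, "Plant")


def is_omnivore(x):
    # Omnivore = Animal and eats some Animal and eats some Plant
    return ("Animal" in TYPES[x]
            and some(x, EATS, "Animal") and some(x, EATS, "Plant"))


herbivores = sorted(x for x in TYPES if is_herbivore(x))
omnivores = sorted(x for x in TYPES if is_omnivore(x))
print(herbivores)  # ['daisy', 'mouse1']
print(omnivores)   # ['percy']
```

Note that `mouse1`, with no recorded food, satisfies “eats only Plant” vacuously, which is why the slide adds that all animals eat some living thing.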
Paraphrases and Comments
► Write down the paraphrases and put them in the comment space.
► We can show you how to make the comment space bigger to make it easier.
► Without a paraphrase, we cannot tell if we disagree on
► What you meant to represent
► How you represented it
► Without a paraphrase we will mark down by at least half and give no partial credit
► We will try to understand what you are doing, but we cannot read your minds.
Which properties can be filled in at the class level now?
►What can we say about all members of a class?
►eats
►All cows eat some plants
►All cats eat some animals
►All pigs eat some animals & eat some plants
Fill in the details (can use property matrix wizard)
Check with classifier
► Cows should be Herbivores
► Are they? Why not?
► What have we said?
► Cows are animals and, amongst other things, eat some grass and eat some leafy_plants
► What do we need to say? A closure axiom
► Cows are animals and, amongst other things, eat some plants and eat only plants
► (See “Vegetarian Pizzas” in OWL tutorial)
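Why the classifier refuses to call Cow a Herbivore without the closure axiom can be sketched with a small structural subsumption test in Python. This is a sound but incomplete stand-in for a real reasoner. The dictionary layout and the `entails` helper are assumptions made for illustration, and the class names follow the slides.

```python
# Hypothetical primitive tree: child -> parent.
PARENT = {
    "Animal": "Living_thing", "Plant": "Living_thing",
    "Cow": "Animal", "Grass": "Plant", "Leafy_plant": "Plant",
}


def is_a(cls, ancestor):
    """True if cls equals ancestor or lies below it in the primitive tree."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = PARENT.get(cls)
    return False


def entails(description, definition):
    """Structural test: must every member of description satisfy definition?"""
    base_ok = is_a(description["base"], definition["base"])
    some_ok = all(
        any(p == q and is_a(g, f) for q, g in description["some"])
        for p, f in definition["some"]
    )
    only_ok = all(
        any(p == q and is_a(g, f) for q, g in description["only"])
        for p, f in definition["only"]
    )
    return base_ok and some_ok and only_ok


# Herbivore = Animal and eats only Plant
HERBIVORE = {"base": "Animal", "some": [], "only": [("eats", "Plant")]}

# Cow as first written: existential restrictions only.
cow_open = {"base": "Cow",
            "some": [("eats", "Grass"), ("eats", "Leafy_plant")],
            "only": []}
# Cow with the closure axiom added.
cow_closed = dict(cow_open, only=[("eats", "Plant")])

print(entails(cow_open, HERBIVORE))    # False: nothing rules out other food
print(entails(cow_closed, HERBIVORE))  # True
```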
Closure Axiom
► Cows are animals and, amongst other things, eat some plants and eat only plants
Open vs Closed World reasoning
► Open world reasoning
► Negation as contradiction
► Anything might be true unless it can be proven false
► Reasoning about any world consistent with this one
► Closed world reasoning
► Negation as failure
► Anything that cannot be found is false
► Reasoning about this world
► Ontologies are not databases
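The difference shows up in how a missing fact is answered. In the Python sketch below, the closed-world query returns False for anything not found, while the open-world query returns None (unknown) unless the statement has been explicitly ruled out. The fact sets and function names are hypothetical.

```python
# Hypothetical knowledge: one stated fact and one explicitly excluded statement.
FACTS = {("Cow", "eats", "Grass")}
RULED_OUT = {("Cow", "eats", "Animal")}


def closed_world(triple):
    """Negation as failure: whatever cannot be found is false."""
    return triple in FACTS


def open_world(triple):
    """Negation as contradiction: false only if ruled out, else unknown (None)."""
    if triple in FACTS:
        return True
    if triple in RULED_OUT:
        return False
    return None


query = ("Cow", "eats", "Leafy_plant")
print(closed_world(query))  # False
print(open_world(query))    # None
```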
Tables are easier to manage than DAGs / Polyhierarchies
…and get the benefit of inference: Grass and Leafy_plants are both kinds of Plant
Remember to add any closure axioms
Then let the reasoner do the work
Normalisation: From Trees to DAGs
Before classification: a tree
After classification: a DAG (Directed Acyclic Graph)
Untangling and Enrichment
Using a classifier to make life easier
Original (tangled):
Substance
- Protein
- - ProteinHormone
- - - Insulin
- Steroid
- - SteroidHormone
- - - Cortisol
- Hormone
- - ProteinHormone
- - - Insulin
- - SteroidHormone
- - - Cortisol
- Catalyst
- - Enzyme
- - - ATPase
Normalised primitives:
PhysiologicRole
- HormoneRole
- CatalystRole
Substance
- Protein
- - Insulin
- - ATPase
- Steroid
- - Cortisol
Linking definitions:
Hormone = Substance & playsRole someValuesFrom HormoneRole
ProteinHormone = Protein & playsRole someValuesFrom HormoneRole
SteroidHormone = Steroid & playsRole someValuesFrom HormoneRole
Catalyst = Substance & playsRole someValuesFrom CatalystRole
Enzyme = Protein & playsRole someValuesFrom CatalystRole
Insulin subclass-of playsRole someValuesFrom HormoneRole
Cortisol subclass-of playsRole someValuesFrom HormoneRole
ATPase subclass-of playsRole someValuesFrom CatalystRole
After classification:
Substance
- Protein
- - ProteinHormone
- - - Insulin
- - Enzyme
- - - ATPase
- Steroid
- - SteroidHormone^
- - - Cortisol
- Hormone
- - ProteinHormone^
- - - Insulin^
- - SteroidHormone^
- - - Cortisol^
- Catalyst
- - Enzyme^
- - - ATPase^
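The untangling step can be imitated in a few lines of Python. Primitives sit in disjoint trees, defined classes are (base, role) pairs, and a toy classifier finds the most specific defined parents. This is a simplified sketch of the idea, not how a description-logic reasoner works internally; the table names and helper functions are hypothetical.

```python
# Primitive skeleton (child -> parent), following the slide.
PARENT = {
    "Protein": "Substance", "Steroid": "Substance",
    "Insulin": "Protein", "ATPase": "Protein", "Cortisol": "Steroid",
    "HormoneRole": "PhysiologicRole", "CatalystRole": "PhysiologicRole",
}
# Primitive classes restricted by: playsRole someValuesFrom <role>
PLAYS = {"Insulin": "HormoneRole", "Cortisol": "HormoneRole",
         "ATPase": "CatalystRole"}
# Defined classes: name = base & playsRole someValuesFrom role
DEFINED = {
    "Hormone": ("Substance", "HormoneRole"),
    "ProteinHormone": ("Protein", "HormoneRole"),
    "SteroidHormone": ("Steroid", "HormoneRole"),
    "Catalyst": ("Substance", "CatalystRole"),
    "Enzyme": ("Protein", "CatalystRole"),
}


def is_a(cls, ancestor):
    """True if cls equals ancestor or lies below it in the primitive trees."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = PARENT.get(cls)
    return False


def describe(cls):
    """Return (base, role) for a defined or primitive class; role may be None."""
    if cls in DEFINED:
        return DEFINED[cls]
    return cls, PLAYS.get(cls)


def subsumes(general, specific):
    """Does the defined class `general` subsume `specific`?"""
    g_base, g_role = DEFINED[general]
    s_base, s_role = describe(specific)
    return s_role is not None and is_a(s_base, g_base) and is_a(s_role, g_role)


def inferred_parents(cls):
    """Most specific defined classes that subsume cls."""
    candidates = [d for d in DEFINED if d != cls and subsumes(d, cls)]
    return sorted(
        d for d in candidates
        if not any(o != d and subsumes(d, o) for o in candidates)
    )


for name in ("Insulin", "Cortisol", "ATPase", "ProteinHormone", "Enzyme"):
    print(name, "->", inferred_parents(name))
```

Each class keeps its single asserted primitive parent; the extra parents that turn the tree into a DAG come only from the definitions.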
The original tangled ontology
Modularised into structure and function ontologies (all primitive)
Unified by an ontology that imports both structure and function ontologies and adds linking definitions
Before classification: primitives in yellow, defined in orange; prefixes indicate imported ontology
Definition of Hormone
Unified ontology after classification
Normalisation: Criterion 1
The skeleton should consist of disjoint trees
► Every primitive concept should have exactly one primitive parent
► All multiple hierarchies are the result of inference by the reasoner
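Criterion 1 is easy to check mechanically on the asserted hierarchy. A minimal Python sketch, assuming the asserted primitive skeleton is available as (child, parent) pairs; the function name and sample edges are hypothetical:

```python
from collections import defaultdict


def multi_parent_primitives(edges):
    """Return primitives with more than one asserted primitive parent."""
    parents = defaultdict(set)
    for child, parent in edges:
        parents[child].add(parent)
    return {c: sorted(p) for c, p in parents.items() if len(p) > 1}


# Hypothetical asserted edges from a tangled hierarchy.
tangled = [
    ("ProteinHormone", "Protein"),
    ("ProteinHormone", "Hormone"),
    ("Insulin", "ProteinHormone"),
]
violations = multi_parent_primitives(tangled)
print(violations)  # {'ProteinHormone': ['Hormone', 'Protein']}
```

An empty result means every primitive has at most one asserted parent, so the skeleton is a set of trees.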
Normalisation Criterion 2:
No hidden changes of meaning
► Each branch should be homogeneous and logical (“Aristotelian”)
► Hierarchical principle should be subsumption
► Otherwise we are “lying to the logic”
► The criteria for differentiation should follow consistent principles in each branch, e.g. structure XOR function XOR cause
A Non-homogeneous taxonomy
“On those remote pages it is written that animals are divided into:
a. those that belong to the Emperor
b. embalmed ones
c. those that are trained
d. suckling pigs
e. mermaids
f. fabulous ones
g. stray dogs
h. those that are included in this classification
i. those that tremble as if they were mad
j. innumerable ones
k. those drawn with a very fine camel's hair brush
l. others
m. those that have just broken a flower vase
n. those that resemble flies from a distance”
From The Celestial Emporium of Benevolent Knowledge, Borges
Normalisation Criterion 3
Distinguish “Self-standing” and “Refining” Concepts
► Self-standing concepts
► Roughly Welty & Guarino’s “sortals”
►person, idea, plant, committee, belief,…
►Refining concepts – depend on self-standing concepts
►mild|moderate|severe, hot|cold, left|right,…
► Roughly Welty & Guarino’s non-sortals
► Closely related to Smith’s “fiat partitions”
► Usefully thought of as Value Types by engineers
►For us an engineering distinction…
Normalisation Criterion 3a
Self-standing primitives should be globally disjoint & open
► Primitives are atomic
► If primitives overlap, the overlap conceals implicit information
► A list of self-standing primitives can never be guaranteed complete
► How many kinds of person? of plant? of committee? of belief?
► Can’t infer: Parent & ¬sub_1 & … & ¬sub_(n-1) → sub_n
► Heuristic:
► Diagnosis by exclusion about self-standing concepts should NOT be part of ‘standard’ ontological reasoning
Normalisation Criterion 3b
Refining primitives should be locally disjoint & closed
► Individual values must be disjoint
► but can be hierarchical
► e.g. “very hot”, “moderately severe”
► Each list can be guaranteed to be complete
► Can infer: Parent & ¬sub_1 & … & ¬sub_(n-1) → sub_n
► Value types themselves need not be disjoint
► “being hot” is not disjoint from “being severe”
► Allowing value types to overlap is a useful trick, e.g.
► restriction has_state someValuesFrom (severe and hot)
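The contrast between open self-standing lists and closed value types can be sketched as an inference-by-exclusion helper in Python. It answers only when the parent's list of values is declared complete. The `VALUES` and `CLOSED` tables are hypothetical; the sample values come from the animals example.

```python
# Hypothetical children of two parents; only the value type Sex is declared closed.
VALUES = {
    "Sex": {"Male", "Female"},        # refining concept: closed list
    "Mammal": {"Cat", "Dog", "Cow"},  # self-standing: never guaranteed complete
}
CLOSED = {"Sex"}


def infer_by_exclusion(parent, ruled_out):
    """Return the one remaining value if the list is closed, else None."""
    if parent not in CLOSED:
        return None  # open list: some unlisted kind may exist
    remaining = VALUES[parent] - set(ruled_out)
    return next(iter(remaining)) if len(remaining) == 1 else None


print(infer_by_exclusion("Sex", ["Male"]))           # Female
print(infer_by_exclusion("Mammal", ["Cat", "Dog"]))  # None
```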
Normalisation Criterion 4
Axioms
► No axiom should denormalise the ontology
► No axiom should imply that a primitive is part of more than one branch of the primitive skeleton
► If all primitives are disjoint, any such axioms will make that primitive unsatisfiable
►A partial test for normalisation:
►Create random conjunctions of primitives which do not subsume each other.
►If any are satisfiable, the ontology is not normalised
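The partial test can be sketched in Python for pairs of primitives. Here satisfiability is approximated by checking whether any disjointness axiom separates the two ancestor lines; a real test would ask a reasoner. The tree, the disjointness axioms (with Cow deliberately left out), and the `probe` helper are all hypothetical.

```python
import itertools
import random

# Hypothetical primitive tree and declared disjointness axioms.
PARENT = {"Animal": "Living_thing", "Plant": "Living_thing",
          "Cat": "Animal", "Dog": "Animal", "Cow": "Animal", "Tree": "Plant"}
DISJOINT = {frozenset(("Animal", "Plant")), frozenset(("Cat", "Dog"))}


def ancestors_or_self(cls):
    """Return cls followed by its ancestors up to the root."""
    line = []
    while cls is not None:
        line.append(cls)
        cls = PARENT.get(cls)
    return line


def satisfiable(a, b):
    """Approximation: a & b is satisfiable unless some axiom separates them."""
    return not any(
        frozenset((x, y)) in DISJOINT
        for x in ancestors_or_self(a) for y in ancestors_or_self(b)
    )


def probe(samples, seed=0):
    """Sample pairs of primitives that do not subsume each other; report satisfiable ones."""
    classes = sorted(set(PARENT) | set(PARENT.values()))
    pairs = [
        (a, b) for a, b in itertools.combinations(classes, 2)
        if a not in ancestors_or_self(b) and b not in ancestors_or_self(a)
    ]
    rng = random.Random(seed)
    chosen = rng.sample(pairs, min(samples, len(pairs)))
    return sorted(pair for pair in chosen if satisfiable(*pair))


print(probe(samples=100))  # [('Cat', 'Cow'), ('Cow', 'Dog')]
```

Any pair reported here points at a missing disjointness axiom, so the skeleton is not yet normalised.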
Consequences 1
► All self-standing primitives are disjoint
► All multiple classification is inferred
► For any two primitive self-standing classes, either one subsumes the other or they are disjoint
► Every self-standing concept is part of exactly one primitive branch of the skeleton
► Every self-standing concept has exactly one most specific primitive ancestor
Consequences 2
► Primitives are introduced by a conjunction of one class and a boolean combination of zero or more restrictions
► Tree subclass-of Plant and restriction isMadeOf someValuesFrom Wood
► Resort subclass-of Accommodation and restriction isIntendedFor someValuesFrom Holidays
Consequences 3
► Use of axioms limited (outside of skeleton construction)
The following are a safe but not exhaustive guide:
►The right side of subclass axioms limited to restrictions
►Both sides of disjointness axioms limited to restrictions
►No equivalence axioms with primitives on either side
A real example: Build a simple tree
easy to maintain
Let the classifier organise it
If you want more abstractions, just add new definitions (re-use existing data)
“Diseases linked to abnormal proteins”
And let the classifier work again
And again – even for a quite different category
“Diseases linked to genes described in the mouse”
And let the classifier check consistency
(My first try wasn’t)
Summary: Why Normalise? Why use a Classifier?
► To compose concepts
► Allow conceptual lego
► To manage polyhierarchies
► Adding abstractions (“axes”) as needed
► Normalisation
► Untangling
► labelling of “kinds of is-a”
► To avoid combinatorial explosions
► Keep bicycles from exploding
► To manage context
► Cross species, Cross disciplines, Cross studies
► To check consistency and help users find errors
But remember the ‘ontology’ is just one part of an information resource
► The ontology
► Naming - Definitions & necessary conditions
► Classification
► Indexing
► Knowledge bases
► What we know about those entities – what is true in general
► Databases
► What we know about individuals
► Instance stores – specialised databases that link to ontologies
► Plus
► Lexicons
► Metadata
► Mappings