Information Grids, the Semantic Web & Why Ontologies Matter
Professor Carole Goble, University of Manchester, UK
Comparative Functional Genomics
Vast amounts of data & escalating
Highly heterogeneous: data types, data forms, community
Highly complex and inter-related
Volatile
Take home message
Content with extensive metadata
Services that exploit this enriched content
Knowledge
Fundamentally involves the construction and deployment of ontologies
Ontologies on a Grid scale need reasoning support
Agents Web Services
Grid Computing
e-Business
e-Science
?
Semantic interoperation
[Figure: two matching layer stacks side by side, with layers labelled physical, data link, packet, transport, objects, process, views/queries, metamodels, ontologies; the stacks are linked by "Data exchange" and "Semantic interoperation"]
Web Service Descriptions => Automated:
Discovery & Search
Selection
Matching
Composition & Interoperation
Invocation
Execution monitoring
What to describe?
Resource provides Service
Service presents Service profile: what it does (description, functionalities, functional attributes)
Service is described by Service model: how it works
Service supports Service grounding: how to access it
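As a minimal sketch of how a profile / model / grounding description could support automated discovery and matching, the Python below models the three parts as plain data classes. The class names, fields, example services and `example.org` endpoints are illustrative assumptions, not part of DAML-S or any toolkit.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ServiceProfile:
    """What the service does (hypothetical fields)."""
    description: str
    functionalities: Set[str] = field(default_factory=set)


@dataclass
class ServiceModel:
    """How the service works: an ordered list of step names."""
    steps: List[str] = field(default_factory=list)


@dataclass
class ServiceGrounding:
    """How to access the service (the endpoint is a placeholder)."""
    endpoint: str
    protocol: str = "SOAP"


@dataclass
class Service:
    name: str
    profile: ServiceProfile
    model: ServiceModel
    grounding: ServiceGrounding


def discover(services, required):
    """Return services whose profile offers every required functionality."""
    return [s for s in services if required <= s.profile.functionalities]


# Hypothetical registry of two described services.
registry = [
    Service(
        "blast",
        ServiceProfile("Sequence similarity search", {"sequence-search", "alignment"}),
        ServiceModel(["submit", "poll", "fetch"]),
        ServiceGrounding("http://example.org/blast"),
    ),
    Service(
        "translate",
        ServiceProfile("DNA to protein translation", {"translation"}),
        ServiceModel(["translate"]),
        ServiceGrounding("http://example.org/translate"),
    ),
]

matches = discover(registry, {"alignment"})
print([(s.name, s.grounding.endpoint) for s in matches])
```

Matching here is plain string equality on functionality names, which only works if every provider uses the same terms.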
The Tower of Babel
Interoperating resources, be it by people or systems, requires a consistent shared understanding of what the information contained means
“... people [and machines] can’t share knowledge if they don’t speak a common language”
(Davenport)
Metadata
Data describing the content and meaning of resources
But everyone must speak the same language…
Terminologies
Shared and common vocabularies
For search engines, agents, curators, authors and users
But everyone must mean the same thing…
Ontologies
Shared and common understanding of a domain
Essential for search, exchange and discovery
Machine processable Knowledge on the Web
Annotating services requires a shared vocabulary
Ontologies:
a vocabulary of terms
a precise and principled specification of their meaning
structure on the domain of the terms
constrain the possible interpretations of terms
Inference applies the knowledge in the metadata and the ontology to create new metadata and new knowledge
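A minimal sketch of that last point, assuming a toy ontology that contains only subclass-of links: metadata asserted against a resource is extended with every superclass the ontology implies. The term names and the `SUBCLASS_OF` table are hypothetical.

```python
# Hypothetical ontology fragment: class -> direct superclasses.
SUBCLASS_OF = {
    "prion_protein": {"glycoprotein"},
    "glycoprotein": {"protein"},
    "protein": {"macromolecule"},
}


def ancestors(term, subclass_of):
    """All classes reachable from `term` via subclass-of (transitive closure)."""
    seen = set()
    stack = [term]
    while stack:
        for parent in subclass_of.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def infer_annotations(metadata, subclass_of):
    """Extend each resource's asserted classes with every inferred superclass."""
    return {
        resource: set(terms).union(*(ancestors(t, subclass_of) for t in terms))
        for resource, terms in metadata.items()
    }


asserted = {"PRIO_HUMAN": {"prion_protein"}}
inferred = infer_annotations(asserted, SUBCLASS_OF)
print(sorted(inferred["PRIO_HUMAN"]))
# ['glycoprotein', 'macromolecule', 'prion_protein', 'protein']
```

A search for "protein" now finds the resource even though nobody annotated it with that term.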
Three Layer Orthodoxy (Schreiber et al. 1998)
Knowledge Layer
Mining: inference, prediction & discovery
Information Layer
Middleware & Metadata: discovery, description, interoperation, association, sharing, composition, personalisation
Data / Computation Layer
What is an Ontology?
[Figure: spectrum of ontology formality, from least to most expressive: Catalog/ID; Terms/glossary; Thesauri, “narrower term” relation; Informal is-a; Formal is-a; Formal instance; Frames (properties); Value restrictions; Disjointness, Inverse, part-of…; General logical constraints]
From Debbie McGuinness
Ontology desiderata
Precision
Formal, unambiguous
High fidelity Explicitness
Clarity Commitment Reuse
Systematic Quality Clarity
Flexibility Expressivity Evolution
machine computable
Ontology Description Space
Expressivity
Coverage
Knowledge representational languages
Inference mechanisms
Taxonomy, Relationships, Axioms
What do Ontologies offer?
Controlled description and organisational framework
Controlled vocabularies
Accurate data collection or retrieval
Classification
Finding, sharing, discovering, navigation, indexing
Control
ID   PRIO_HUMAN     STANDARD;      PRT;   253 AA.
DE   MAJOR PRION PROTEIN PRECURSOR (PRP) (PRP27-30) (PRP33-35C) (ASCR).
OS   Homo sapiens (Human).
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC   Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
CC   -!- FUNCTION: THE FUNCTION OF PRP IS NOT KNOWN. PRP IS ENCODED IN THE HOST GENOME AND IS
CC       EXPRESSED BOTH IN NORMAL AND INFECTED CELLS.
CC   -!- SUBUNIT: PRP HAS A TENDENCY TO AGGREGATE YIELDING POLYMERS CALLED "RODS".
CC   -!- SUBCELLULAR LOCATION: ATTACHED TO THE MEMBRANE BY A GPI-ANCHOR.
CC   -!- DISEASE: PRP IS FOUND IN HIGH QUANTITY IN THE BRAIN OF HUMANS AND ANIMALS INFECTED WITH
CC       NEURODEGENERATIVE DISEASES KNOWN AS TRANSMISSIBLE SPONGIFORM ENCEPHALOPATHIES OR PRION
CC       DISEASES, LIKE: CREUTZFELDT-JAKOB DISEASE (CJD), GERSTMANN-STRAUSSLER SYNDROME (GSS),
CC       FATAL FAMILIAL INSOMNIA (FFI) AND KURU IN HUMANS; SCRAPIE IN SHEEP AND GOAT; BOVINE
CC       SPONGIFORM ENCEPHALOPATHY (BSE) IN CATTLE; TRANSMISSIBLE MINK ENCEPHALOPATHY (TME);
CC       CHRONIC WASTING DISEASE (CWD) OF MULE DEER AND ELK; FELINE SPONGIFORM ENCEPHALOPATHY
CC       (FSE) IN CATS AND EXOTIC UNGULATE ENCEPHALOPATHY(EUE) IN NYALA AND GREATER KUDU. THE
CC       PRION DISEASES ILLUSTRATE THREE MANIFESTATIONS OF CNS DEGENERATION: (1) INFECTIOUS (2)
CC       SPORADIC AND (3) DOMINANTLY INHERITED FORMS. TME, CWD, BSE, FSE, EUE ARE ALL THOUGHT TO
CC       OCCUR AFTER CONSUMPTION OF PRION-INFECTED FOODSTUFFS.
CC   -!- SIMILARITY: BELONGS TO THE PRION FAMILY.
KW   Prion; Brain; Glycoprotein; GPI-anchor; Repeat; Signal; Polymorphism; Disease mutation.
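To make the contrast in this record concrete, here is a rough Python sketch that groups the lines by their two-letter line code. The parser and its function names are illustrative assumptions, and the embedded record is an abbreviated copy of the entry above. The KW and OC lines split cleanly into controlled terms; the CC lines stay free text.

```python
from collections import defaultdict

# Abbreviated copy of the record above, one line per two-letter line code.
RECORD = """\
ID   PRIO_HUMAN     STANDARD;      PRT;   253 AA.
DE   MAJOR PRION PROTEIN PRECURSOR (PRP) (PRP27-30) (PRP33-35C) (ASCR).
OS   Homo sapiens (Human).
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC   Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
CC   -!- FUNCTION: THE FUNCTION OF PRP IS NOT KNOWN.
KW   Prion; Brain; Glycoprotein; GPI-anchor; Repeat; Signal; Polymorphism; Disease mutation.
"""


def parse_record(text):
    """Group the content of each line under its two-letter line code."""
    fields = defaultdict(list)
    for line in text.splitlines():
        if len(line) > 2:
            fields[line[:2]].append(line[2:].strip())
    return {code: " ".join(parts) for code, parts in fields.items()}


def split_terms(value):
    """Split a ';'-separated, '.'-terminated list into individual terms."""
    return [t.strip() for t in value.rstrip(".").split(";") if t.strip()]


record = parse_record(RECORD)
keywords = split_terms(record["KW"])   # controlled vocabulary terms
lineage = split_terms(record["OC"])    # taxonomic classification
free_text = record["CC"]               # readable by people, opaque to a program

print(keywords)
print(lineage[-1])
```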
Controlled Vocabularies
Structural Genomics
Population Genetics
Genome sequence
Functional genomics
Tissue
Clinical trial
Disease
Clinical Data
Data resources have been built introspectively for human researchers
Information is machine readable not machine understandable
Sharing vocabulary is a step towards unification
What do Ontologies offer?
Community reference model
Common framework for integration (OpenMMS, TAMBIS)
Search support: querying and matching
Information extraction (PASTA)
Information checking (Irbane)
Intelligent interfaces for queries and accurate data capture
Control + Semantics
Quality: reap what you sow
“The problem is: the databases are God-awful. … If the data is still fundamentally flawed, then better algorithms add little.”
Temple Smith, Director, Molecular Engineering Research Center, Boston University
The Web Services Stack
TCP/IP, HTTP: Transport
XML: Message syntax
SOAP: Message protocol
WSDL: Service connection
UDDI: Adverts (description and discovery)
WFDL: Workflow
Ontology
What do Ontologies offer?
Knowledge discovery
Knowledge-acquisition tools
Decision support
Hypothesis generation
RiboWeb, Ingenuity
Control + Semantics + Inference
“The technical advantages of knowledge modeling are obvious. Knowledge bases can be automatically checked for consistency; they support inference mechanisms which derive data which have not been explicitly stored; they also offer extensive request and navigation facilities. However, the most immediate benefit of knowledge base design lies in the modeling process itself, through the effort of explication, organization and structuration [sic] of the knowledge it requires.”
Editorial, Bioinformatics, July 2000
Scale => Reasoning & Inference
1. Keeping the classification together
2. Expressing constraints and sticking to ‘em
Ontology design: creation, extension, maintenance; large, multiply authored evolving ontologies
Ontology integration: merging
Ontology deployment: determining consistency of description & instances; query validation/refinement/containment & service matching
Ontologies are the cornerstone of encoding understanding, BUT to be shared they need a standard representation and exchange language
Requirements for an Ontology-language (1)
Well designed
Useful and proven modelling primitives
Intuitive to human users
Can say simple things simply but as complex as necessary
Expressive enough to capture many ontologies
Efficient, sound and complete reasoning support
Requirements for an Ontology-language (2)
Well defined
Clear syntax: to read ontologies
Formal semantics: to understand (process) ontologies, and to facilitate machine interpretation of that semantics
Expressive enough to capture many ontologies
Requirements for an Ontology-language (3)
Compatible
Easy mapping to/from other ontology languages
Maximum compatibility with XML and RDF(S)
The Ontology Language Stack
[Figure: layered stack of languages, with labels: Unicode, URI; HTML; XML + Name Space + XML Schema; SMIL; Topic Maps; RDF; RDF(S); DC; PICS; XOL; OIL; DAML-Ont; DAML+OIL; DAML-R; DAML-S]
DAML+OIL Ontology language
Logic with model theoretic semantics
Classes, properties & axioms
OIL -> frame syntax mapped to description logic
Web: mapping to RDF(S)
Decidable and empirically tractable
Tools: editors (OilEd), reasoners (FaCT)
Extensions: DAML-R, DAML-S
DAML-S Upper Ontology
Resource provides Service
Service presents Service profile: what it does (description, functionalities, functional attributes)
Service is described by Service model: how it works
Service supports Service grounding: how to access it
The Semantic Web
Knowledge Technologies for the Grid
• Ability to store and retrieve huge volumes of data
• Ability to effectively process large volumes of data
• Ability to capture, enrich, classify and structure knowledge about
  • Domains
  • Organisations
  • Individuals
  • Research Collaborations
  • Experiments
  • Results
  • Services
Places to go
www.semanticweb.org
www.daml.org
www.ontoweb.org
www.bioontologies.org
Spares
Three Layer Orthodoxy (Schreiber et al. 1998)
Knowledge Layer: “Knowledge is the whole body of data and information that people bring to bear to practical use in action, in order to carry out tasks and to create & infer new information.”
Information Layer: Information is data equipped with meaning…
Data / Computation Layer: Data is the uninterpreted signals that reach our senses every minute in time by the zillions…
Where I’m coming from
MyGrid: Personalised extensible environments for data-intensive “in silico” experiments in biology
OIL: Ontology Inference Layer
A knowledge representation language and inference mechanism for the web
Frames: modelling primitives, OKBC-Lite
Description Logics: formal semantics & automated reasoning support
Web languages: XML & RDF based syntax, RDFS mapping
Originally based on XOL from the BioOntology Working Group
OIL example
Slot-def part-of
  subslot-of structural-relation
  inverse has-part
  properties transitive
Class-def defined herbivore
  subclass-of animal
  slot-constraint eats
    value-type plant OR slot-constraint part-of has-value plant
    min-cardinality 2 vegetable
Disjoint herbivore carnivore

part-of is a slot: it is a sub-slot of structural-relation, its inverse is has-part, and it is transitive
herbivore is exactly defined as: a sub-class of animal that eats only plants or parts of plants, and >= 2 types of vegetable
herbivore and carnivore are disjoint
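A rough Python sketch of what the herbivore definition above asks a reasoner to do: recognise individuals that satisfy a defined class, and reject descriptions that violate a disjointness axiom. This is a closed-world simplification with hypothetical names (`Food`, `classify`, the mini taxonomy). A description logic reasoner such as FaCT works on class descriptions under open-world semantics, and the transitivity of part-of is not modelled here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Food:
    name: str
    kind: str                       # e.g. "vegetable", "leaf", "meat"
    part_of: Optional[str] = None   # kind of the whole this food is a part of


# Hypothetical mini taxonomy: vegetable is a kind of plant.
SUBCLASS_OF = {"vegetable": "plant"}


def is_a(kind, target):
    """True if `kind` equals `target` or is a (transitive) subclass of it."""
    while kind is not None:
        if kind == target:
            return True
        kind = SUBCLASS_OF.get(kind)
    return False


def is_plant_or_part(food):
    """value-type plant OR (part-of has-value plant)."""
    return is_a(food.kind, "plant") or (
        food.part_of is not None and is_a(food.part_of, "plant")
    )


def is_herbivore(classes, eats):
    """animal AND eats only plants or parts of plants AND >= 2 vegetables."""
    vegetables = {f.name for f in eats if is_a(f.kind, "vegetable")}
    return (
        "animal" in classes
        and all(is_plant_or_part(f) for f in eats)
        and len(vegetables) >= 2
    )


def classify(classes, eats):
    """Add 'herbivore' when the definition holds; reject disjointness violations."""
    result = set(classes)
    if is_herbivore(result, eats):
        result.add("herbivore")
    if {"herbivore", "carnivore"} <= result:
        raise ValueError("inconsistent: herbivore and carnivore are disjoint")
    return result


carrot = Food("carrot", "vegetable")
lettuce = Food("lettuce", "vegetable")
leaf = Food("leaf", "leaf", part_of="plant")
steak = Food("steak", "meat")

print(classify({"animal"}, [carrot, lettuce, leaf]))   # gains 'herbivore'
print(classify({"animal"}, [carrot, steak]))           # stays {'animal'}
```

Because herbivore is a defined class, membership is inferred from the description and never has to be asserted by hand.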
Reasoning when querying?
Classification-based retrieval: query generalisation, query refinement
Reasoning about query descriptors: query validation, query organisation, query inclusion/containment, intensional query processing
Knowledge: ontologies can be used not just for retrieval but for discovery
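A small sketch of query containment, generalisation and refinement, assuming a query is a conjunction of concept names and the ontology is a simple single-parent taxonomy. The `PARENT` table and function names are hypothetical. Real description logic reasoners decide containment over much richer class expressions.

```python
# Hypothetical taxonomy: concept -> direct parent.
PARENT = {
    "enzyme": "protein",
    "kinase": "enzyme",
    "receptor": "protein",
    "human": "mammal",
    "mammal": "organism",
}


def subsumes(general, specific):
    """True if `general` is `specific` or one of its ancestors."""
    while specific is not None:
        if specific == general:
            return True
        specific = PARENT.get(specific)
    return False


def contained_in(q1, q2):
    """q1 is contained in q2 if each concept of q2 subsumes some concept of q1.

    Queries are conjunctions of concepts, so every answer to q1 is then
    also an answer to q2.
    """
    return all(any(subsumes(g, s) for s in q1) for g in q2)


def generalise(query, concept):
    """Replace `concept` by its parent, or drop it if it has none."""
    parent = PARENT.get(concept)
    rest = query - {concept}
    return (rest | {parent}) if parent else rest


def refine(query, concept):
    """Return one more specific query per direct child of `concept`."""
    children = sorted(c for c, p in PARENT.items() if p == concept)
    return [(query - {concept}) | {c} for c in children]


q = frozenset({"kinase", "human"})
print(contained_in(q, {"protein", "mammal"}))   # True
print(contained_in({"protein"}, q))             # False
print(sorted(generalise(q, "kinase")))          # ['enzyme', 'human']
```

Generalising widens a query that returned too little; refining narrows one that returned too much.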
Middleware Metadata
To describe the information and computational resources
Essential for navigation, integration, analysis, use
Data
Information
Knowledge
Tools
Ontology development environments
Ontology application servers
Metadata extractors & annotators
E-Science agents
Personalisation agents
Ontology learning tools
Change management tools
Semantic Retrieval tools
Semantic Web Portal builders
Semantic Authoring tools
Ontology and metadata visualisation
Intelligent browsers
Why Reasoning support?
Using the Tbox as a big index
In interfaces: most specific reasonable assertions; most specific data entry forms for some condition for some kind of patient
In mediation: most specific wrapper function of a resource; most specific codes in an external system for some concept
In retrieval: most specific interactions for two drugs; most specific web pages for some topic; most specific bibliographic references for some problem
In decision support…
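The "most specific" lookups above can be sketched as follows, assuming the Tbox has already been classified into a subsumption hierarchy and that resources (here, data entry forms) are attached to some of its concepts. The concept names, the `FORMS` index and the function names are hypothetical.

```python
# Hypothetical classified Tbox fragment: concept -> direct parents.
PARENTS = {
    "patient": set(),
    "diabetic_patient": {"patient"},
    "pregnant_patient": {"patient"},
    "pregnant_diabetic_patient": {"diabetic_patient", "pregnant_patient"},
}

# Hypothetical index: concept -> resource attached to it (data entry forms).
FORMS = {
    "patient": "general_form",
    "diabetic_patient": "diabetes_form",
    "pregnant_diabetic_patient": "gestational_diabetes_form",
}


def ancestors_or_self(concept):
    """The concept plus everything that subsumes it."""
    seen = {concept}
    stack = [concept]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def most_specific(concept, index):
    """Indexed subsumers of `concept` that are not above another indexed subsumer."""
    candidates = {c for c in ancestors_or_self(concept) if c in index}
    return {
        c for c in candidates
        if not any(o != c and c in ancestors_or_self(o) for o in candidates)
    }


for kind in ("pregnant_diabetic_patient", "pregnant_patient"):
    print(kind, "->", sorted(FORMS[c] for c in most_specific(kind, FORMS)))
```

A pregnant patient has no dedicated form in this index, so the lookup falls back to the most specific subsumer that does have one.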
Mark up Web Services to make them:
Computer interpretable
Use-apparent
Agent-ready
Declarative API:
Capturing data & metadata associated with a source
Specification of its properties & capabilities
Interface for its execution
Pre-requisites and consequences of its use
Data: warehousing, distributed databases, streaming, near-line storage, large objects, efficient access mechanisms, data staging, query optimisation…
Information: metadata, middleware, fusion, intelligent retrieval, information modelling, curation management, semi-automatic annotation, data warehousing, workflow, information/content distribution, active content management (distribution, security…), consistency management (versioning, quality…)
Knowledge: mining, visualisation, knowledge management, reasoning & prediction…
Grid = Infrastructure
What’s special about Bioinformatics?
Complexity
Diversity
size isn’t everything
[Figure: interrelated data types, with labels: Disease, Drug, Clinical trial, Phenotype, Protein Structure, Protein Sequence, P-P interactions, Proteome, Gene sequence, Genome sequence, Gene expression]