Page 1: Information Grids, the Semantic Web & Why Ontologies Matter Professor Carole Goble University of Manchester UK

Information Grids, the Semantic Web & Why Ontologies Matter

Professor Carole Goble, University of Manchester, UK

Page 2

Comparative Functional Genomics

Vast and escalating amounts of data

Highly heterogeneous: data types, data forms, community

Highly complex and inter-related

Volatile

Page 3

Take home message

Content with extensive metadata

Services that exploit this enriched content

Knowledge

Fundamentally involves the construction and deployment of ontologies

Ontologies on a Grid scale need reasoning support

Page 4

Agents

Web Services

Grid Computing

e-Business

e-Science

?

Page 5

Semantic interoperation

[Figure: two identical stacks of layers (objects, transport, packet, data link, physical; metamodels, ontologies, views/queries, process). Labels between the stacks: Data exchange; Semantic interoperation.]

Page 6

Web Service Descriptions => Automated

• Discovery & search
• Selection
• Matching
• Composition & interoperation
• Invocation
• Execution monitoring

Page 7

What to describe?

Resource provides Service

Service presents Service profile: what it does (description, functionalities, functional attributes)

Service is described by Service model: how it works

Service supports Service grounding: how to access it

Page 8

The Tower of Babel

Interoperating resources, whether by people or by systems, requires a consistent, shared understanding of what the information they contain means

“... people [and machines] can’t share knowledge if they don’t speak a common language”

(Davenport)

Page 9

Metadata

Data describing the content and meaning of resources

But everyone must speak the same language…

Page 10

Terminologies

Shared and common vocabularies

For search engines, agents, curators, authors and users

But everyone must mean the same thing…

Page 11

Ontologies

Shared and common understanding of a domain

Essential for search, exchange and discovery

Page 12

Machine processable Knowledge on the Web

Annotating services requires a shared vocabulary

Ontologies:

• a vocabulary of terms
• a precise and principled specification of their meaning
• structure on the domain of the terms
• constrain the possible interpretations of terms

Inference applies the knowledge in the metadata and the ontology to create new metadata and new knowledge
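As a rough illustration of inference creating new metadata, the following Python sketch walks a toy subclass hierarchy and adds the implied, more general classes to a resource's annotations. The class names, the `SUBCLASS_OF` table and the helper functions are hypothetical and are not taken from the talk; a real system would use an ontology language and a reasoner.

```python
# Hypothetical toy ontology: each class maps to its direct superclass.
SUBCLASS_OF = {
    "prion_protein": "glycoprotein",
    "glycoprotein": "protein",
    "protein": "macromolecule",
}


def superclasses(cls):
    """Return every ancestor of cls by walking the subclass-of chain."""
    ancestors = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        ancestors.append(cls)
    return ancestors


def infer_annotations(metadata):
    """Add the implied, more general classes to each resource's annotations."""
    inferred = {}
    for resource, classes in metadata.items():
        closure = set(classes)
        for cls in classes:
            closure.update(superclasses(cls))
        inferred[resource] = closure
    return inferred


# Asserted metadata: one resource annotated with one class.
metadata = {"PRIO_HUMAN": {"prion_protein"}}
inferred = infer_annotations(metadata)
print(sorted(inferred["PRIO_HUMAN"]))
```

A search for "protein" would now find the resource even though it was only annotated as a prion protein.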

Page 13

Three Layer Orthodoxy (Schreiber et al. 1998)

Knowledge Layer

Mining: inference, prediction & discovery

Information Layer

Middleware & Metadata: discovery, description, interoperation, association, sharing, composition, personalisation

Data / Computation Layer

Page 14

What is an Ontology?

[Figure: a spectrum of increasing expressivity: Catalog/ID; Terms/glossary; Thesauri ("narrower term" relation); Informal is-a; Formal is-a; Formal instance; Frames (properties); Value restrictions; Disjointness, inverse, part-of…; General logical constraints]

From Debbie McGuinness

Page 15

Ontology desiderata

Precision: formal, unambiguous

High fidelity, explicitness, clarity, commitment, reuse, systematic, quality, clarity, flexibility, expressivity, evolution

Machine computable

Page 16

Ontology Description Space

Expressivity

Coverage

Knowledge representational languages

Inference mechanisms

Taxonomy, Relationships, Axioms

Page 17

What do Ontologies offer?

• Controlled description and organisational framework
• Controlled vocabularies
• Accurate data collection or retrieval
• Classification
• Finding, sharing, discovering, navigation, indexing

Control

Page 18

ID   PRIO_HUMAN     STANDARD;      PRT;   253 AA.
DE   MAJOR PRION PROTEIN PRECURSOR (PRP) (PRP27-30) (PRP33-35C) (ASCR).
OS   Homo sapiens (Human).
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC   Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
CC   -!- FUNCTION: THE FUNCTION OF PRP IS NOT KNOWN. PRP IS ENCODED IN THE HOST GENOME AND IS
CC       EXPRESSED BOTH IN NORMAL AND INFECTED CELLS.
CC   -!- SUBUNIT: PRP HAS A TENDENCY TO AGGREGATE YIELDING POLYMERS CALLED "RODS".
CC   -!- SUBCELLULAR LOCATION: ATTACHED TO THE MEMBRANE BY A GPI-ANCHOR.
CC   -!- DISEASE: PRP IS FOUND IN HIGH QUANTITY IN THE BRAIN OF HUMANS AND ANIMALS INFECTED WITH
CC       NEURODEGENERATIVE DISEASES KNOWN AS TRANSMISSIBLE SPONGIFORM ENCEPHALOPATHIES OR PRION
CC       DISEASES, LIKE: CREUTZFELDT-JAKOB DISEASE (CJD), GERSTMANN-STRAUSSLER SYNDROME (GSS),
CC       FATAL FAMILIAL INSOMNIA (FFI) AND KURU IN HUMANS; SCRAPIE IN SHEEP AND GOAT; BOVINE
CC       SPONGIFORM ENCEPHALOPATHY (BSE) IN CATTLE; TRANSMISSIBLE MINK ENCEPHALOPATHY (TME);
CC       CHRONIC WASTING DISEASE (CWD) OF MULE DEER AND ELK; FELINE SPONGIFORM ENCEPHALOPATHY
CC       (FSE) IN CATS AND EXOTIC UNGULATE ENCEPHALOPATHY (EUE) IN NYALA AND GREATER KUDU. THE
CC       PRION DISEASES ILLUSTRATE THREE MANIFESTATIONS OF CNS DEGENERATION: (1) INFECTIOUS (2)
CC       SPORADIC AND (3) DOMINANTLY INHERITED FORMS. TME, CWD, BSE, FSE, EUE ARE ALL THOUGHT TO
CC       OCCUR AFTER CONSUMPTION OF PRION-INFECTED FOODSTUFFS.
CC   -!- SIMILARITY: BELONGS TO THE PRION FAMILY.
KW   Prion; Brain; Glycoprotein; GPI-anchor; Repeat; Signal; Polymorphism; Disease mutation.

Controlled Vocabularies
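To make the contrast with the free-text comment lines concrete, here is a minimal Python sketch that splits the record's KW (keyword) line into terms and checks them against a controlled vocabulary. The `CONTROLLED` set and both helper functions are assumptions made for illustration; they are not the actual keyword list or any tool mentioned in the talk.

```python
# The keyword line from the record shown on this slide.
kw_line = ("KW   Prion; Brain; Glycoprotein; GPI-anchor; Repeat; Signal; "
           "Polymorphism; Disease mutation.")

# Hypothetical controlled vocabulary (illustrative only, not the real list).
CONTROLLED = {
    "Prion", "Brain", "Glycoprotein", "GPI-anchor",
    "Repeat", "Signal", "Polymorphism", "Disease mutation",
}


def parse_keywords(line):
    """Split a 'KW' line into its individual keyword terms."""
    if not line.startswith("KW"):
        raise ValueError("not a KW line")
    body = line[2:].strip().rstrip(".")
    return [term.strip() for term in body.split(";") if term.strip()]


def unknown_terms(terms):
    """Return the terms that are not in the controlled vocabulary."""
    return [term for term in terms if term not in CONTROLLED]


keywords = parse_keywords(kw_line)
print(keywords)
print(unknown_terms(keywords))             # every term is controlled
print(unknown_terms(["Brain", "brains"]))  # a free-text variant is rejected
```

The keyword line can be checked mechanically in this way, whereas the CC comment lines would need text mining to recover the same information.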

Page 19

Structural Genomics

Population Genetics

Genome sequence

Functional genomics

Tissue

Clinical trial

Disease

Clinical Data

Data resources have been built introspectively for human researchers

Information is machine readable not machine understandable

Sharing vocabulary is a step towards unification

Page 20

What do Ontologies offer?

• Community reference model
• Common framework for integration (OpenMMS, TAMBIS)
• Search support: querying and matching
• Information extraction (PASTA)
• Information checking (Irbane)
• Intelligent interfaces for queries and accurate data capture

Control + Semantics

Page 21

Quality: reap what you sow

“The problem is: the databases are God-awful. … If the data is still fundamentally flawed, then better algorithms add little.”

Temple Smith, Director, Molecular Engineering Research Center, Boston University

Page 22

The Web Services Stack

[Figure: a layered stack, top to bottom]

Ontology
WFDL: Workflow
UDDI: Adverts (description and discovery)
WSDL: Service connection
SOAP: Message protocol
XML: Message syntax
HTTP, TCP/IP: Transport

Page 23

What do Ontologies offer?

• Knowledge discovery
• Knowledge-acquisition tools
• Decision support
• Hypothesis generation
• RiboWeb, Ingenuity

Control + Semantics + Inference

Page 24

“The technical advantages of knowledge modeling are obvious. Knowledge bases can be automatically checked for consistency; they support inference mechanisms which derive data which have not been explicitly stored; they also offer extensive request and navigation facilities. However, the most immediate benefit of knowledge base design lies in the modeling process itself, through the effort of explication, organization and structuration [sic] of the knowledge it requires.”

Editorial, Bioinformatics, July 2000

Page 25

Scale => Reasoning & Inference

1. Keeping the classification together
2. Expressing constraints and sticking to ‘em

Ontology design

• Creation, extension, maintenance
• Large, multiply authored, evolving ontologies

Ontology integration

• Merging

Ontology deployment

• Determining consistency of description & instances
• Query validation/refinement/containment & service matching

Page 26

Ontologies are the cornerstone of encoding understanding, BUT to be shared they need a standard representation and exchange language

Page 27

Requirements for an Ontology-language (1)

Well designed

• Useful and proven modelling primitives
• Intuitive to human users
• Can say simple things simply but as complex as necessary
• Expressive enough to capture many ontologies
• Efficient, sound and complete reasoning support

Page 28

Requirements for an Ontology-language (2)

Well defined

• Clear syntax: read ontologies
• Formal semantics: understand (process) ontologies, to facilitate machine interpretation of that semantics
• Expressive enough to capture many ontologies

Page 29

Requirements for an Ontology-language (3)

Compatible

• Easy mapping to/from other ontology languages
• Maximum compatibility with XML and RDF(S)

Page 30

The Ontology Language Stack

OIL

HTML XML + Name Space + XML Schema

Topic Maps

SMIL

RDF(S)

DC PICS

XOL

DAML-Ont

DAML+OIL

RDF

DAML-R

DAML-S

Unicode URI

Page 31

DAML+OIL

Ontology language

Logic with model theoretic semantics

• Classes, properties & axioms
• OIL -> frame syntax mapped to description logic

Web

• Mapping to RDF(S)

Decidable and empirically tractable

Tools: editors (OilEd), reasoners (FaCT)

Extensions: DAML-R, DAML-S

Page 32

DAML-S Upper Ontology

Resource provides Service

Service presents Service profile: what it does (description, functionalities, functional attributes)

Service is described by Service model: how it works

Service supports Service grounding: how to access it
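The relationships in this upper ontology can be pictured as a small data structure. The Python sketch below is only an illustration: the class and field names mirror the labels on the slide, while the example services, the endpoint URLs and the `discover` function are hypothetical and do not reproduce the DAML-S vocabulary itself.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceProfile:
    """What the service does."""
    description: str
    functionalities: list = field(default_factory=list)


@dataclass
class ServiceModel:
    """How the service works."""
    steps: list = field(default_factory=list)


@dataclass
class ServiceGrounding:
    """How to access the service."""
    endpoint: str


@dataclass
class Service:
    presents: ServiceProfile
    described_by: ServiceModel
    supports: ServiceGrounding


@dataclass
class Resource:
    name: str
    provides: Service


def discover(resources, wanted):
    """Return names of resources whose profile advertises the wanted functionality."""
    return [r.name for r in resources
            if wanted in r.provides.presents.functionalities]


# Two made-up resources; names, steps and URLs are placeholders.
search = Resource("sequence-search", Service(
    ServiceProfile("Finds similar sequences", ["sequence_similarity_search"]),
    ServiceModel(["receive query", "search database", "return hits"]),
    ServiceGrounding("http://example.org/search"),
))
aligner = Resource("aligner", Service(
    ServiceProfile("Aligns sequences", ["multiple_alignment"]),
    ServiceModel(["receive sequences", "align", "return alignment"]),
    ServiceGrounding("http://example.org/align"),
))

print(discover([search, aligner], "sequence_similarity_search"))
```

Discovery here only consults the profile; the model and grounding would come into play for composition and invocation.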

Page 33

The Semantic Web

Page 34

Knowledge Technologies for the Grid

• Ability to store and retrieve huge volumes of data
• Ability to effectively process large volumes of data
• Ability to capture, enrich, classify and structure knowledge about
  • Domains
  • Organisations
  • Individuals
  • Research Collaborations
  • Experiments
  • Results
  • Services

Page 35

Places to go

www.semanticweb.org

www.daml.org

www.ontoweb.org

www.bioontologies.org

Page 36

Spares

Page 37

Three Layer Orthodoxy (Schreiber et al. 1998)

Knowledge Layer

“Knowledge is the whole body of data and information that people bring to bear to practical use in action, in order to carry out tasks and to create & infer new information.”

Information Layer

Information is data equipped with meaning…

Data / Computation Layer

Data is the uninterpreted signals that reach our senses every minute in time by the zillions…

Page 38

Where I’m coming from

MyGrid

Personalised extensible environments for data-intensive “in silico” experiments in biology

Page 39

OIL: Ontology Inference Layer

A knowledge representation language and inference mechanism for the web

Description Logics: formal semantics & automated reasoning support

Web languages: XML & RDF based syntax, RDFS mapping

Frames: modelling primitives, OKBC-Lite

Originally based on XOL from the BioOntology Working Group

Page 40

OIL example

slot-def part-of
  subslot-of structural-relation
  inverse has-part
  properties transitive

class-def defined herbivore
  subclass-of animal
  slot-constraint eats
    value-type plant OR
    slot-constraint part-of has-value plant
  min-cardinality 2 vegetable

disjoint herbivore carnivore

part-of is a slot: it is a sub-slot of structural-relation, its inverse is has-part, and it is transitive

herbivore is exactly defined as: a sub-class of animal that eats only plants or parts of plants, and >= 2 types of vegetable

herbivore and carnivore are disjoint
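Two of the declarations in the OIL example above, the transitive part-of slot with its has-part inverse and the herbivore/carnivore disjointness axiom, can be mimicked in a few lines of Python. This is only a sketch of what a reasoner derives from such statements; the leaf/branch/tree facts and the function names are made up, and a description logic reasoner such as FaCT works quite differently.

```python
def transitive_closure(pairs):
    """Return the smallest transitive relation containing the given pairs."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Hypothetical asserted facts for the transitive slot part-of.
part_of = {("leaf", "branch"), ("branch", "tree")}
part_of_closure = transitive_closure(part_of)

# has-part is declared as the inverse of part-of.
has_part = {(whole, part) for (part, whole) in part_of_closure}

# Disjointness axiom from the slide: nothing is both herbivore and carnivore.
DISJOINT = [{"herbivore", "carnivore"}]


def violates_disjointness(classes):
    """True if an individual is asserted to belong to two disjoint classes."""
    return any(len(group & set(classes)) > 1 for group in DISJOINT)


print(("leaf", "tree") in part_of_closure)                  # derived fact
print(("tree", "leaf") in has_part)                         # via the inverse
print(violates_disjointness({"herbivore", "carnivore"}))    # inconsistent
```

The derived pair (leaf, tree) was never stated explicitly, which is the kind of inference the slide's "properties transitive" declaration licenses.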

Page 41

Reasoning when querying?

Classification-based retrieval

• Query generalisation
• Query refinement

Reasoning about query descriptors

• Query validation
• Query organisation
• Query inclusion/containment
• Intensional query processing
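Query generalisation, refinement and containment can all be read off a classification hierarchy. The Python sketch below assumes a tiny hand-written hierarchy (`PARENT`) and treats a query as a single class name, which is a drastic simplification of what a description logic reasoner does; all names are hypothetical.

```python
# Hypothetical classification: each class maps to its direct parent.
PARENT = {
    "kinase": "enzyme",
    "enzyme": "protein",
    "protein": "macromolecule",
}


def subsumes(general, specific):
    """True if `general` is `specific` itself or one of its ancestors."""
    while specific is not None:
        if specific == general:
            return True
        specific = PARENT.get(specific)
    return False


def generalise(term):
    """Query generalisation: step up to the direct parent, if any."""
    return PARENT.get(term)


def refine(term):
    """Query refinement: offer the direct children as narrower queries."""
    return sorted(child for child, parent in PARENT.items() if parent == term)


def contained_in(query_a, query_b):
    """Query A is contained in query B if B's class subsumes A's class."""
    return subsumes(query_b, query_a)


print(generalise("kinase"))
print(refine("protein"))
print(contained_in("kinase", "protein"))
print(contained_in("protein", "kinase"))
```

Containment lets a system answer a narrower query from the cached results of a broader one, or reject a query that can never return anything.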

Page 42

Knowledge

Ontologies can be used not just for retrieval but for discovery

Middleware Metadata

To describe the information and computational resources

Essential for navigation, integration, analysis, use

Data

Information

Knowledge

Page 43

Tools

Ontology development environments

Ontology application servers

Metadata extractors & annotators

E-Science agents

Personalisation agents

Ontology learning tools

Change management tools

Semantic Retrieval tools

Semantic Web Portal builders

Semantic Authoring tools

Ontology and metadata visualisation

Intelligent browsers

Page 44

Why Reasoning support?

Page 45

Using the Tbox as a big Index

In interfaces

• Most specific reasonable assertions
• Most specific data entry forms for some condition for some kind of patient

In mediation

• Most specific wrapper function of a resource
• Most specific codes in an external system for some concept

In retrieval

• Most specific interactions for two drugs
• Most specific web pages for some topic
• Most specific bibliographic references for some problem

In decision support…
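One way to read "most specific" is: start at the concept in question and climb the classification until something is indexed there. The Python sketch below assumes a toy drug hierarchy (`PARENT`) and a toy index of data entry forms (`INDEX`); both, and the form names, are invented for illustration.

```python
# Hypothetical concept hierarchy (child -> parent).
PARENT = {
    "aspirin": "nsaid",
    "nsaid": "analgesic",
    "analgesic": "drug",
}

# Hypothetical index: resources attached to some of the concepts.
INDEX = {
    "nsaid": ["form-nsaid"],
    "drug": ["form-generic-drug"],
}


def most_specific(concept):
    """Return the nearest indexed concept at or above `concept`, with its entries."""
    while concept is not None:
        if concept in INDEX:
            return concept, INDEX[concept]
        concept = PARENT.get(concept)
    return None, []


print(most_specific("aspirin"))    # falls back to the NSAID form
print(most_specific("analgesic"))  # falls back to the generic drug form
print(most_specific("unknown"))    # nothing indexed for this concept
```

Nothing needs to be indexed for aspirin directly: the hierarchy routes the lookup to the most specific form that applies.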

Page 46

Mark up Web Services to make them

• Computer interpretable
• Use-apparent
• Agent-ready

Declarative API

• Capturing data & metadata associated with a source
• Specification of its properties & capabilities
• Interface for its execution
• Pre-requisites and consequences of its use

Page 47

Grid = Infrastructure

Knowledge: mining, visualisation, knowledge management, reasoning & prediction…

Information: metadata, middleware, fusion, intelligent retrieval, information modelling, curation management, semi-automatic annotation, data warehousing, workflow, information/content distribution, active content management (distribution, security…), consistency management (versioning, quality…)

Data: warehousing, distributed databases, streaming, near-line storage, large objects, efficient access mechanisms, data staging, query optimisation…

Page 48

What’s special about Bioinformatics?

Complexity

Diversity

size isn’t everything

[Figure: inter-related kinds of data: Disease, Drug, Clinical trial, Phenotype, Protein structure, Protein sequence, P-P interactions, Proteome, Gene sequence, Genome sequence, Gene expression]