Introduction to Ontologies Adding Meaning to Metadata Brian Lowe Metadata Working Group February 16, 2007



So… what the heck are we talking about, exactly?

Ontologies are really, really simple.

Thing

Person eats Food

Ontologies can also be really, really complex.

We store data and metadata in all kinds of ways.

We’re probably all familiar with a database record:

Record

Record Number: 289425

Title: Metamorphosis

Type: book

Author: Kafka, Franz

Publication date: 1946

Publisher: Vanguard Press

Let’s back up a bit

What do we do when we want to express something else?

We need to add another field.

Record

Record Number: 289425

Title1: Metamorphosis

Title2: Die Verwandlung

Type: book

Author: Kafka, Franz

Publication date: 1946

Publisher: Vanguard Press

Say we want unlimited titles.

We need to add another table.

Thing

Record Number: 289425

Type: book

Author: Kafka, Franz

Publication date: 1946

Publisher: Vanguard Press

Title

289425 Metamorphosis

289425 Die Verwandlung

20027 Dr. Strangelove
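
The two-table layout can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table and column names (thing, title, record_number) are assumptions made for the example, not something the slides specify.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE thing (
        record_number INTEGER PRIMARY KEY,
        type TEXT,
        author TEXT,
        publication_date TEXT,
        publisher TEXT
    );
    -- One row per title, so a thing can have any number of titles.
    CREATE TABLE title (
        record_number INTEGER,
        title TEXT
    );
""")
conn.execute(
    "INSERT INTO thing VALUES (289425, 'book', 'Kafka, Franz', '1946', 'Vanguard Press')"
)
conn.executemany(
    "INSERT INTO title VALUES (?, ?)",
    [
        (289425, "Metamorphosis"),
        (289425, "Die Verwandlung"),
        (20027, "Dr. Strangelove"),
    ],
)

def titles_for(record_number):
    """Return every title stored for one record, in insertion order."""
    rows = conn.execute(
        "SELECT title FROM title WHERE record_number = ? ORDER BY rowid",
        (record_number,),
    )
    return [row[0] for row in rows]

print(titles_for(289425))
```

Adding a third title is now just another row in the title table; the thing table does not change.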

Well-designed databases tend to deal with lots of relationships between different elements of data.

The way the relationships are set up is called the data model.

Relational databases are great.

Until you want to share your data with someone else who isn’t running the same database software or who doesn’t understand what you’ve done.

OK, no problem. Why don’t we just create a standardized way of shipping data around?

Let’s call this standard XML.

<?xml version="1.0" encoding="UTF-8"?>
<things>
  <thing id="289425">
    <title>Metamorphosis</title>
    <title noindex="4">Die Verwandlung</title>
    <author>Kafka, Franz</author>
    <publisher>Vanguard Press</publisher>
    <publicationDate>1946</publicationDate>
    <type>book</type>
  </thing>
  <thing id="20027">
    <title>Dr. Strangelove</title>
    <!-- the slide cuts off here -->
  </thing>
</things>

XML is great.

• we can use standardized tools

• XML is readable by both machines and humans (in theory)

• we can create rich schemas that will let us check whether an XML document is valid

XML alone is all about trees.

But sometimes trees aren’t enough.

What about all those complex relationships?

<eml:eml packageId="gss1.37.2" system="knb" xsi:schemaLocation="eml://ecoinformatics.org/eml-2.0.1 eml.xsd" scope="system">
  <dataset scope="document">
    <title>Test GIS data upload</title>
    <creator id="1170948373895" scope="document">
      <individualName>
        <surName>steinhart</surName>
      </individualName>
      <organizationName>mann</organizationName>
      <positionName>librarian</positionName>
    </creator>
    <abstract>
      <para>test upload of a 58MB GIS data file w/eml record</para>
    </abstract>
    <contact scope="document">
      <references system="document">1170948373895</references>
    </contact>
  </dataset>
</eml:eml>

One (nonstandard) way of breaking out of a tree: the contact element refers back to the creator by its id.

Let’s use one standard data model.

RDF: Resource Description Framework

Graphs instead of trees

Dataset1.37.2 creator Gail

Dataset1.37.2 contact Gail

Gail position librarian

Gail organization Mann Library

Everything is expressed as statements or triples

Subject, Property (predicate), Object

Thing289435 title “Metamorphosis”

Thing289435 title “Die Verwandlung”

Thing289435 author “Kafka, Franz”

Thing289435 type book

If everything’s a triple, we can store new things very easily.

Subject, Property (predicate), Object

Thing289435 title “Metamorphosis”

Thing289435 title “Die Verwandlung”

Thing289435 author “Kafka, Franz”

Thing289435 type book

Thing289435 comment “This is the one where Gregor Samsa wakes up as a cockroach.”

Thing289435 callNumber “PT2621.A25 V5 1946”

“Triple Stores”

[Diagram: many S-P-O triples collected in a triple store]

There are various query languages for RDF, similar to SQL.

We can select all the triples where the subject is Thing289435

Or select all the triples where the property is “title.”
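
As a rough sketch of those two selections, a triple store can be imitated with a plain Python list of tuples. The match function and its wildcard convention are made up for this example; real triple stores use a query language rather than a helper like this.

```python
# A toy "triple store": each statement is a (subject, property, object) tuple.
triples = [
    ("Thing289435", "title", "Metamorphosis"),
    ("Thing289435", "title", "Die Verwandlung"),
    ("Thing289435", "author", "Kafka, Franz"),
    ("Thing289435", "type", "book"),
    ("Thing289435", "callNumber", "PT2621.A25 V5 1946"),
]

def match(store, subject=None, prop=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in store
        if (subject is None or s == subject)
        and (prop is None or p == prop)
        and (obj is None or o == obj)
    ]

# All the triples where the subject is Thing289435.
about_thing = match(triples, subject="Thing289435")

# All the triples where the property is "title".
titles = [o for (_, _, o) in match(triples, prop="title")]
print(titles)
```

Storing a new kind of statement, such as the call number, needed no schema change: it is just one more tuple.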

RDF: Resource Description Framework

What’s a resource?

Something to which we assign a specific identifier, or URI (Uniform Resource Identifier).

http://www.somerandomlibrary.org/ourthings/Thing289435

We use this URI as the subject or object of a triple.

We can now mash this up with a whole bunch of other triples and not get confused about which thing we’re describing.

RDF: Resource Description Framework

We also assign URIs to the properties.

http://purl.org/dc/elements/1.1/title

Subject http://www.somerandomlibrary.org/ourthings/Thing289435

Property http://purl.org/dc/elements/1.1/title

Object “Metamorphosis”

Now, anything that understands what a Dublin Core title is can find the title of our book.

RDF: Resource Description Framework

We even use URIs with things that aren’t resources.

Subject http://www.somerandomlibrary.org/ourthings/Thing289435

Property http://purl.org/dc/elements/1.1/title

Object “Metamorphosis”^^http://www.w3.org/2001/XMLSchema#string
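
One way to write such a statement down is the line-based N-Triples syntax, where URIs go in angle brackets and a literal carries its datatype after ^^. The slides do not mention N-Triples; it is used here only as a convenient notation, and the helper names are invented for this sketch. Escaping is simplified to backslashes and double quotes.

```python
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"

def uri(value):
    """Wrap a URI in angle brackets, as N-Triples expects."""
    return f"<{value}>"

def literal(text, datatype=XSD_STRING):
    """Quote a literal and attach its datatype URI (simplified escaping)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"^^<{datatype}>'

def statement(subject, prop, obj):
    """Join subject, property, and object into one N-Triples line."""
    return f"{subject} {prop} {obj} ."

line = statement(
    uri("http://www.somerandomlibrary.org/ourthings/Thing289435"),
    uri("http://purl.org/dc/elements/1.1/title"),
    literal("Metamorphosis"),
)
print(line)
```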

“Semantics”

This is “semantic” metadata in its simplest sense.

We’ve explicitly stated what kind of relationship exists between two things.

But it’s still up to software or humans to understand what the different properties actually “mean.”

Ontologies

• describe what we mean in some ways that machines can understand.

• are a standardized way of modeling the ways the different pieces of data relate to one another

• Ontologies have been around for decades, but there is an increasing interest in sharing them over the Web.

So how do we make an ontology?

• We need to decide what kinds of things we want to talk about (classes)

• We also need to describe what kind of relationships they can have (properties)

Class hierarchy

(also called the taxonomy or the terminology box (“TBox”))

Thing
    Person
        Employee
            Academic Employee
                Faculty member
                Librarian
            Non-Academic Employee
                Cataloger
                Programmer

Class hierarchy

Arrows represent “subclass of” or “is a” relationships.


• A faculty member is an academic employee

• A faculty member is an employee

• A faculty member is a person

• (A faculty member is a thing.)
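
The chain of “is a” statements can be reproduced by walking up the hierarchy. This sketch assumes each class has a single parent, and it assumes Faculty member and Librarian sit under Academic Employee while Cataloger and Programmer sit under Non-Academic Employee; the flattened diagram does not make that grouping certain.

```python
# Child class -> parent class. The grouping of the four leaf classes is an
# assumption; each class has exactly one parent in this simplified sketch.
SUBCLASS_OF = {
    "Person": "Thing",
    "Employee": "Person",
    "Academic Employee": "Employee",
    "Non-Academic Employee": "Employee",
    "Faculty member": "Academic Employee",
    "Librarian": "Academic Employee",
    "Cataloger": "Non-Academic Employee",
    "Programmer": "Non-Academic Employee",
}

def superclasses(cls):
    """Follow "subclass of" arrows upward and list every class reached."""
    chain = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain

for parent in superclasses("Faculty member"):
    print(f"A faculty member is a kind of {parent}")
```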

Class hierarchy

The classes here are not disjoint.


We can assert that someone is a librarian.

We can assert that the same individual is also a faculty member, and that’s not a problem.

Class hierarchy

Let’s make some classes disjoint.


Now if we try to assert that something is both a faculty member and a cow, the ontology will tell us that these statements are inconsistent with our model.

[Diagram adds Farm Animal, with subclass Cow, marked disjoint.]
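
A consistency check of this kind might look like the sketch below. It assumes the disjointness is declared between Person and Farm Animal, which the slide does not state explicitly, and it is far simpler than what a real reasoner does.

```python
# Child class -> parent class (only the classes needed for the example).
SUBCLASS_OF = {
    "Person": "Thing",
    "Employee": "Person",
    "Academic Employee": "Employee",
    "Faculty member": "Academic Employee",
    "Librarian": "Academic Employee",
    "Farm Animal": "Thing",
    "Cow": "Farm Animal",
}

# Assumption: Person and Farm Animal are the classes declared disjoint.
DISJOINT = [frozenset({"Person", "Farm Animal"})]

def with_superclasses(cls):
    """Return the class together with everything it is a subclass of."""
    found = {cls}
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        found.add(cls)
    return found

def violated_disjointness(asserted_classes):
    """List the disjoint pairs that one individual would belong to."""
    implied = set()
    for cls in asserted_classes:
        implied |= with_superclasses(cls)
    return [pair for pair in DISJOINT if pair <= implied]

print(violated_disjointness({"Faculty member", "Cow"}))
print(violated_disjointness({"Faculty member", "Librarian"}))
```

The first call reports the Person / Farm Animal pair; the second reports nothing, because overlapping classes are allowed unless declared disjoint.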

Making a class hierarchy can be tricky

How do we model an organization?

Organization charts are typically organized by what things are part of.

Cornell University
    CUL
        LTS
        IRIS
    CALS
        Plant Pathology
        Crop & Soil Sciences
    A&S
        Asian Studies

Making a class hierarchy can be tricky

This is not a valid class hierarchy. Why not?

University
    Library System
        Library Department
    College
        College Department

Making a class hierarchy can be tricky

This is not a valid class hierarchy. Why not?

Place Plant Biology under College Department and read each arrow as “is a”:

Plant Biology is a College Department.

Plant Biology is a College. (NO!)

Plant Biology is a University. (NO!)

The arrows in an organization chart mean “part of,” not “is a.”

Making a class hierarchy can be tricky

Let’s try this instead.

Organization
    University
    College
    Library System
    Department

(Siblings disjoint.)

Maybe not the best model, but it works.

Let’s add a property: subunitOf

[Diagram: the Organization classes, with the individual Plant Biology and a subunitOf arrow]

Now we can assert things like:

subject property object

CALS subunitOf Cornell

CUL subunitOf Cornell

Arts&Sciences subunitOf Cornell

Plant Biology subunitOf CALS

LTS subunitOf CUL

and model our organization chart.

Property hierarchies

As with classes, properties can be arranged in a hierarchy.

subunitOf subpropertyOf partOf

Property hierarchies: subunitOf

Now if we assert statements like:

subject property object

CALS subunitOf Cornell

CUL subunitOf Cornell

Arts&Sciences subunitOf Cornell

Plant Biology subunitOf CALS

LTS subunitOf CUL

Our ontology tells us these statements must also be true:

subject property object

CALS partOf Cornell

CUL partOf Cornell

Arts&Sciences partOf Cornell

Plant Biology partOf CALS

LTS partOf CUL

Property hierarchies

Another example.

headOf subpropertyOf memberOf

Things that are tricky to do with ontologies / statements

What if we want to express things that aren’t simple subject-predicate-object statements?

Mike took a picture of a moose with a Nikon camera in Maine.

Event-based ontologies

ABC Ontology / Harmony Project

http://metadata.net/harmony/

- events

- participants in events

- tools used in events

- outcomes of events

W3C “Technologies”

RDF

Gives us the simple standard data model that lets us draw graphs and show how things are related to one another

RDF Schema (RDFS)

Lets us construct basic ontologies and build class and property hierarchies

Web Ontology Language (OWL)

Lets us do significantly more complex things.

RDF Schema Inferencing

To make an inference is to add new statements based on existing ones.

Software that understands RDF Schema can make the kinds of simple inferences we’ve seen so far:

From:

Dr.Smith type Faculty Member

Joe Jones headOf Finance Committee

RDFS inferencing adds:

Dr.Smith type Person

Joe Jones memberOf Finance Committee

Why?

Faculty Member is a subclass of Person.

headOf is a subproperty of memberOf.
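
Those two inferences can be imitated with a small loop that keeps applying the subclass and subproperty rules until nothing new appears. This is a sketch of the idea only, not an implementation of the full RDF Schema rule set.

```python
# Schema statements from the slide.
SUBCLASS_OF = {("Faculty Member", "Person")}
SUBPROPERTY_OF = {("headOf", "memberOf")}

asserted = {
    ("Dr.Smith", "type", "Faculty Member"),
    ("Joe Jones", "headOf", "Finance Committee"),
}

def rdfs_closure(triples):
    """Apply the subclass and subproperty rules until no new triples appear."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            if p == "type":
                # An instance of a subclass is also an instance of the superclass.
                new |= {(s, "type", sup) for sub, sup in SUBCLASS_OF if sub == o}
            # A statement using a subproperty also holds for the superproperty.
            new |= {(s, sup, o) for sub, sup in SUBPROPERTY_OF if sub == p}
        if new <= inferred:
            return inferred
        inferred |= new

added = rdfs_closure(asserted) - asserted
for triple in sorted(added):
    print(triple)
```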

RDF Schema Limitations

Usually when we relate two things with a property, it’s very useful if the relationship is bidirectional.

David Skorton presidentOf Cornell University

implies

Cornell University hasPresident David Skorton

RDF Schema doesn’t come with a very good way of handling this.

“RoleNoun”

One way of dealing with this is to use a naming convention.

president

is president of

Software that assumes this convention can automatically generate the text to display for the inverse property.

The Dublin Core properties are largely compatible with this convention:

publisher

is publisher of

contributor

is contributor of (Doesn’t work!)
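
The convention amounts to a one-line string template. The function name below is invented for illustration.

```python
def inverse_label(role_noun):
    """Build display text for the inverse of a property named by a role noun."""
    return f"is {role_noun} of"

for name in ["president", "publisher", "contributor"]:
    print(name, "->", inverse_label(name))
```

The template is applied blindly, which is why a property like contributor produces a label that does not read correctly.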

Inferencing

More complex inferencing with OWL usually requires a separate inference engine (also known as a reasoner or classifier).

Flavors of OWL

OWL “Tiny”

OWL Lite

OWL DL (Description Logics)

OWL Full: very expressive; bad for reasoning. Inference engines choke.

Moving down the list, inference engines get increasingly complex.

OWL Basics

Object Properties

relate resources to other resources

Datatype Properties

relate resources to literals

Most software supports only string and integer datatypes.

Classes overlap by default

Must specify which classes are disjoint.

(But can’t do this if we’re using OWL Lite!)

Stuff OWL Gets Us

Explicit Inverse Properties

hasPresident

presidentOf

OWL allows us to specify that these two properties are inverses of each other.

Cornell University hasPresident David Skorton

OWL inferencing automatically adds:

David Skorton presidentOf Cornell University
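
A minimal sketch of inverse-property inference, assuming the pair of properties is recorded in a simple lookup table (a stand-in for the OWL declaration, not OWL syntax):

```python
# Each property maps to its declared inverse, in both directions.
INVERSE_OF = {
    "hasPresident": "presidentOf",
    "presidentOf": "hasPresident",
}

def add_inverses(triples):
    """Return the triples plus the flipped statement for each inverse property."""
    flipped = {(o, INVERSE_OF[p], s) for s, p, o in triples if p in INVERSE_OF}
    return set(triples) | flipped

asserted = {("Cornell University", "hasPresident", "David Skorton")}
print(sorted(add_inverses(asserted) - asserted))
```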

Stuff OWL Gets Us

Transitive Properties

partOf

If Ithaca is part of Tompkins County

and

Tompkins County is part of New York State

and

New York State is part of the United States

then Ithaca is part of the United States.

Stuff OWL Gets Us

Transitive Properties

partOf

OWL lets us specify that this property is transitive.

If we assert these statements…

Ithaca partOf Tompkins County

Tompkins County partOf New York State

New York State partOf United States

Stuff OWL Gets Us

Transitive Properties

an OWL reasoner fills in these additional statements:

Ithaca partOf New York State

Ithaca partOf United States

Tompkins County partOf United States

Stuff OWL Gets Us

Transitive Properties

This time, let’s also say that partOf and hasPart are inverses of each other.

Again, we’ll assert:

Ithaca partOf Tompkins County

Tompkins County partOf New York State

New York State partOf United States

Stuff OWL Gets Us

Transitive Properties

Now the OWL reasoner adds these:

Ithaca partOf New York State

Ithaca partOf United States

Tompkins County partOf United States

United States hasPart New York State

United States hasPart Tompkins County

United States hasPart Ithaca

New York State hasPart Tompkins County

New York State hasPart Ithaca

Tompkins County hasPart Ithaca

Stuff OWL Gets Us

Transitive Properties

We put in three statements manually and got nine more free.

What good is this?

Makes it easier to query the data in different ways. We sacrifice some space (store more stuff) to make it faster to get the answer we want.
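
The count of three statements in and nine out can be checked with a short sketch that computes the transitive closure of partOf and then adds the hasPart inverses. It is a toy stand-in for a reasoner, with the transitivity and inverse declarations hard-coded.

```python
asserted = {
    ("Ithaca", "partOf", "Tompkins County"),
    ("Tompkins County", "partOf", "New York State"),
    ("New York State", "partOf", "United States"),
}

def transitive_closure(pairs):
    """Keep chaining (a, b) and (b, d) into (a, d) until nothing new appears."""
    closure = set(pairs)
    while True:
        new = {(a, d) for a, b in closure for c, d in closure if b == c} - closure
        if not new:
            return closure
        closure |= new

# partOf is transitive; hasPart is its inverse.
part_of = transitive_closure({(s, o) for s, p, o in asserted if p == "partOf"})
inferred = {(s, "partOf", o) for s, o in part_of} | {(o, "hasPart", s) for s, o in part_of}

added = inferred - asserted
print(len(added))
for triple in sorted(added):
    print(triple)
```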

Stuff OWL Gets Us

Transitive Properties

Say we want to get all the towns in New York State

We could crawl around the graph, or we could ask for all the triples that match

x partOf New York State AND

x type Town

This is easier and faster. The reasoner has already done the heavy lifting for us ahead of time.

Something we can’t yet do in OWL

Transitive “over”

Who at Cornell is doing work in Africa?

Dr. Jones conductsResearchIn Lomé

Lomé partOf Togo

Togo partOf Africa

Let’s make conductsResearchIn “transitive over” partOf

Something we can’t yet do in OWL

Transitive “over”

Who at Cornell is doing work in Africa?

Because partOf is transitive, the reasoner adds

Lomé partOf Africa

Because Dr. Jones conducts research in Lomé,

Dr. Jones conductsResearchIn Africa
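
The “transitive over” idea can be sketched as a rule applied outside OWL: whenever s conductsResearchIn x and x partOf y, add s conductsResearchIn y. The function name and its parameters are invented for this example.

```python
asserted = {
    ("Dr. Jones", "conductsResearchIn", "Lomé"),
    ("Lomé", "partOf", "Togo"),
    ("Togo", "partOf", "Africa"),
}

def propagate_over(triples, prop, over):
    """If s prop x and x over y, add s prop y; repeat until nothing changes."""
    result = set(triples)
    while True:
        new = {
            (s, prop, y)
            for s, p1, x in result if p1 == prop
            for x2, p2, y in result if p2 == over and x2 == x
        } - result
        if not new:
            return result
        result |= new

added = propagate_over(asserted, "conductsResearchIn", "partOf") - asserted
print(sorted(added))
```

A query for everyone who conductsResearchIn Africa now finds Dr. Jones directly.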

More OWL Constructs

Symmetric Properties

John friendOf Kate

implies

Kate friendOf John

Functional Properties

Tom birthdate 1978-03-12

(A functional property can have at most one value per subject: Tom has only one birthdate.)

More Complex OWL Inferencing

Classes

So far, we’ve only dealt with primitive classes.

We name a class and then assert that an individual is a member of that class.

But OWL lets us make defined classes where a reasoner automatically computes the membership.

More Complex OWL Inferencing

Defined Classes

We can specify what kinds of properties the members of a class need to have.

OWL Pizzas

VegetarianPizza

hasTopping allValuesFrom

(union of CheeseTopping and VegetableTopping)

CheesyPizza

hasTopping someValuesFrom

CheeseTopping
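
The difference between allValuesFrom and someValuesFrom maps onto Python's all() and any(). The pizzas and toppings below are hypothetical, and the sketch assumes the listed toppings are the complete set for each pizza; an OWL reasoner works under an open-world assumption and would need that stated explicitly before concluding a pizza is vegetarian.

```python
# Hypothetical topping classes and pizzas, for illustration only.
CHEESE_TOPPINGS = {"Mozzarella", "Parmesan"}
VEGETABLE_TOPPINGS = {"Mushroom", "Tomato"}

pizzas = {
    "Margherita": {"Mozzarella", "Tomato"},
    "MushroomPizza": {"Mushroom", "Tomato"},
    "PepperoniPizza": {"Mozzarella", "Pepperoni"},
}

def is_vegetarian(toppings):
    """allValuesFrom: every topping is a cheese or vegetable topping."""
    return all(t in CHEESE_TOPPINGS | VEGETABLE_TOPPINGS for t in toppings)

def is_cheesy(toppings):
    """someValuesFrom: at least one topping is a cheese topping."""
    return any(t in CHEESE_TOPPINGS for t in toppings)

def classify(all_pizzas):
    """Compute membership in the two defined classes for every pizza."""
    return {
        name: {
            "VegetarianPizza": is_vegetarian(toppings),
            "CheesyPizza": is_cheesy(toppings),
        }
        for name, toppings in all_pizzas.items()
    }

for name, classes in classify(pizzas).items():
    print(name, classes)
```

Nobody asserted that Margherita is a VegetarianPizza; membership follows from its toppings.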

More Complex OWL Inferencing

Vet School example

Say we want to get lists of teaching faculty versus clinical faculty.

We could create TeachingFaculty and ClinicalFaculty as primitive classes,

and assert who is a member of each class.

Or, we could define:

TeachingFaculty = faculty who teach at least one course.

ClinicalFaculty = faculty who have a medical appointment in the animal hospital

We can get these properties from external databases.

Reasoner can assign membership in our defined classes based on those properties.

Inferences with Rules

Semantic Web Rule Language (SWRL)

Still in a fairly experimental stage. Not all of SWRL is compatible with OWL-DL reasoners.

OWL reasoning focuses mainly on classifying things. Rules also let us add new property instances.

If x type AcademicDepartment

and x hasFacultyMember y

and y memberOfGraduateField z

Then x hasAssociatedGraduateField z
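
The rule can be hand-coded as a nested match over triples to show what it produces. This is not SWRL syntax, and the department, person, and graduate field in the sample facts are hypothetical.

```python
# Hypothetical facts; only the property names come from the rule on the slide.
facts = {
    ("Plant Biology", "type", "AcademicDepartment"),
    ("Plant Biology", "hasFacultyMember", "Dr. Smith"),
    ("Dr. Smith", "memberOfGraduateField", "Plant Pathology"),
    ("Library Technical Services", "hasFacultyMember", "Dr. Smith"),
}

def associated_graduate_fields(triples):
    """Apply the rule once and return the new property instances."""
    new = set()
    for x, p, cls in triples:
        if p != "type" or cls != "AcademicDepartment":
            continue
        for x2, p2, y in triples:
            if x2 != x or p2 != "hasFacultyMember":
                continue
            for y2, p3, z in triples:
                if y2 == y and p3 == "memberOfGraduateField":
                    new.add((x, "hasAssociatedGraduateField", z))
    return new

print(associated_graduate_fields(facts))
```

Only Plant Biology gets the new statement, because the other unit is not typed as an AcademicDepartment.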

How might more complex reasoning be useful?

• Evolution of VIVO

New directions in VIVO

VIVO uses “flags” that exist outside the ontology and filter entities for display.

Initially manually applied; now also automatically set. Time-consuming and error-prone.

New directions in VIVO

We should be able to use the statements we already have instead of setting flags.

How about some defined classes?

CALSUnit

departmentOrDivisionWithin

someValuesFrom CALS

CALSPerson

participantIn (employeeIn?)

someValuesFrom CALSUnit

New directions in VIVO

Different colleges have different definitions of “faculty” and “nonfaculty”

Different colleges are interested in keeping track of different things.

A multiple-ontology integration approach might be very useful:

Model how the colleges think of things, and then infer into the ontology of how VIVO wants to think of things.

Interoperability between Ontologies

OWL Constructs

equivalentClass

(equivalent class extension)

sameAs

(individuals are the same thing, just with different names)

Interoperability between Ontologies

Subclassing an established upper-level ontology

DOLCE (Descriptive Ontology for Linguistic and Cultural Engineering)

SUMO (Suggested Upper Merged Ontology)

AKT (Advanced Knowledge Technologies)

The Overall Picture

Uses for inferencing with ontologies and rules:

Automated metadata generation

• When we enter metadata we can focus on exactly what we know and not have to try to anticipate every way someone might want to use the metadata.

• Ontology modelers and rule writers can focus on setting things up so reasoners can add new statements and let metadata be queried in different ways.

Checking accuracy of metadata/data

• Reasoners can find the inconsistent statements

• Bad metadata / data gets flagged for review.

Open Problems

Versioning

How do we track changes in ontologies over time and ensure that existing statements don’t become useless or misinterpreted?

Provenance

How do we know where statements came from? Who provided them? Were they inferred by software or asserted by someone? Can we trust the asserter?

Conflicts

What do we do when we encounter statements saying completely contradictory things?

Open Problems

Expressivity

Does OWL allow us to express enough things to be useful in real-world applications?

- OWL 1.1: University of Manchester

Software Tools

Most tools for creating, storing, and reasoning with ontologies and statements are relatively new, rapidly changing, and likely to have lots of bugs.

How can I learn more?

Protégé-OWL

http://protege.stanford.edu/

Pellet reasoner

http://pellet.owldl.com

Pizza tutorial (on Protégé site)

http://www.co-ode.org/resources/tutorials/ProtegeOWLTutorial.pdf

Thank you