1 Foundations II: Ontology Engineering Class Session 3 Deborah McGuinness and Joanne Luciano with Peter Fox and Li Ding CSCI-6962-01 September 20, 2010

Post on 19-Dec-2015


Foundations II: Ontology Engineering

Class Session 3

Deborah McGuinness and Joanne Luciano

with Peter Fox and Li Ding

CSCI-6962-01

September 20, 2010

Review of Reading Assignment

• Semantic Web for the Working Ontologist (Allemang and Hendler), first few chapters.

• Rector et al. OWL Pizzas: Practical Experience of Teaching OWL-DL: Common Errors & Common Patterns.

• Any comments, questions?

• Homework assignment due at 1200h today


Semantic Web Methodology and Technology Development Process

• Establish and improve a well-defined methodology vision for Semantic Technology based application development

• Leverage controlled vocabularies, etc.

[Diagram of the development process. Labels: Use Case; Small Team, mixed skills; Analysis; Adopt Technology Approach; Leverage Technology Infrastructure; Rapid Prototype; Open World: Evolve, Iterate, Redesign, Redeploy; Use Tools; Science/Expert Review & Iteration; Develop model/ontology]


Semantic Web Layers

http://www.w3.org/2003/Talks/1023-iswc-tbl/slide26-0.html, http://flickr.com/photos/pshab/291147522/

Ontology Spectrum

An ontology specifies a rich description of the

• Terminology, concepts, nomenclature

• Properties explicitly defining concepts

• Relations among concepts (hierarchical and lattice)

• Rules distinguishing concepts, refining definitions and relations (constraints, restrictions, regular expressions)

relevant to a particular domain or area of interest.

www.ksl.stanford.edu/people/dlm/papers/ontologies-come-of-age-abstract.html slide from Kendall/McGuinness SemTech Tutorial

• Ontologies provide a common vocabulary for use by independently developed resources, processes, services

• Agreements among organizations sharing common services can be made with regard to their usage; the meaning of relevant concepts can be expressed unambiguously

• By composing / mapping ontologies and mediating terminology across participating events, resources and services, independently-developed services can work together to share information and processes consistently, accurately, and completely

• Ontologies also ensure

– Valid conversations among agents to collect, process, fuse, and exchange information

– Accurate searching by ensuring context using concept definitions and relations instead of/in addition to statistical relevance of keywords

Ontology-based Technologies

slide from Kendall/McGuinness SemTech Tutorial 2008

Background Knowledge

• We need to provide machine understandable encodings of terms that are used in applications

• Approaches to drive ontology creation

– Bottom up (using data from databases or scraping)

– **Mid level (using use cases and knowledge of the subject area)

– Top down (using foundational or upper level ontologies and building “down”)

Reuse existing knowledge

• Standards exist in most domains, and many of them overlap

• Identify the set that is most relevant to the problem and business issue

• A component-based approach helps deal with overlapping standards; complex relationships can and must be defined such that term usage and overlap is unambiguous and machine interpretable

• Brainstorming with domain experts can be useful to start; then refine and iterate to the level required by the application

adapted from Kendall/McGuinness SemTech Tutorial 2009

Use Case Example

• We will look at one example use case from the SESDI project (Semantically-Enabled Scientific Data Integration) and the thought process involved in generating the plan for the ontology encoding

Selected VxyO Motivation: Mt. Spurr, AK. 8/18/1992 eruption, USGS

http://www.avo.alaska.edu/image.php?id=319

Eruption cloud movement from Mt. Spurr, AK, 1992

USGS

Tropopause

http://aerosols.larc.nasa.gov/volcano2.swf

Atmosphere Use Case

• Determine the statistical signatures of both volcanic and solar forcings on the height of the tropopause

From paleoclimate researcher Caspar Ammann, Climate and Global Dynamics Division of NCAR (CGD/NCAR)

Layperson perspective:

- look for indicators of acid rain in the part of the atmosphere we experience…

(look at measurements of sulfur dioxide in relation to sulfuric acid after volcanic eruptions at the boundary of the troposphere and the stratosphere)

NASA-funded effort with Fox (NCAR), Sinha (Va. Tech), Raskin (JPL), McGuinness

Gather the Thought Process

• Ask which questions are being focused on

• Ask for an answer to the questions

• Ask how the questions are answered

• Ask for criteria for a “good” answer

Use Case detail: A volcano erupts

• Preferentially it’s a tropical mountain (+/- 30 degrees of the equator) with ‘acidic’ magma (more SO2), and it erupts with great intensity so that material and large amounts of gas are injected into the stratosphere.

• The SO2 gas converts to H2SO4 (sulfuric acid) + H2O (75% H2SO4 + 25% H2O). The half-life of SO2 is about 30-40 days.

• The sulfuric acid condenses into small super-cooled liquid droplets. These are the volcanic aerosol that will linger for a year or two.

• The Brewer-Dobson circulation of the stratosphere will transport aerosol to higher latitudes. The particles generate great sunsets, most commonly first seen in fall of the respective hemisphere. The sunlight gets partially reflected; some part gets scattered in the forward direction.

• The result is that the direct solar beam is reduced, yet diffuse skylight increases. The scattering is responsible for the colorful sunsets as more and more of the blue wavelengths are scattered away. In mid-latitudes the volcanic aerosol starts to settle, but the most efficient removal from the stratosphere is through tropopause folds in the vicinity of the storm tracks.

• If particles get over the pole, which happens in spring of the respective hemisphere, then they will settle down and fall onto polar ice caps. It is from these ice caps that we recover annual records of sulfate flux or deposit.

• We get ice cores that show continuous deposition information. Nowadays we measure sulfate or SO4(2-). Earlier measurements were indirect, putting an electric current through the ice and measuring the delay. With acids present, the electric flow would be faster.

• What we are looking for are pulse-like events with a build-up over a few months (mostly in summer, when the vortex is gone), and then a decay of the peak of about 1/e in 12 months.

• The distribution of these pulses was found to follow an extreme value distribution (Fréchet) with a heavy tail.

Use Case detail: … climate

• So reflection reduces the total amount of energy, forward scattering just changes the beam, path length, but that's it.

• The dry fogs in the sky are still up there (even after a thunderstorm), so they are in the stratosphere, not the troposphere.

• The tropical reservoir will keep delivering aerosol for about two years after the eruption.

• The particles are excellent scatterers at short wavelengths. They do absorb in NIR and in IR. Because of absorption, there is a local temperature change in the lower stratosphere.

• This temperature change will cause some convective motion to further spread the aerosol, and second: It's good factual stuff. Once it warms up, it will generate a temperature gradient. Horizontal temperature gradients increase the baroclinicity and thus storms, and they speed up the local zonal winds. This change in zonal wind in high latitudes is particularly large in winter. This increased zonal wind (westerly) will remove all cold air that tries to build up over winter in the high Arctic.

• Therefore, the temperature anomaly in winter time is actually quite okay.

• Impact of volcanoes is to cool the surface through scattering of radiation.

• In winter time over the continents there might be some warming. In the stratosphere, the aerosol warms.

• The amount of GHG emitted is small compared to the reservoir in the air.

• The hydrologic cycle responds to a volcanic eruption.

Stepping back

• We have identified a number of noun phrases and verbs that will be needed if we are to answer the questions

• Noun phrases are typically modeled as classes

• Verbs are typically modeled as properties

• Constraints are typically modeled as value (and other) restrictions
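As a rough illustration of this mapping, the sketch below sorts a few phrases from the volcano use case into classes, properties, and a restriction. All identifiers (Volcano, injectsInto, TropicalVolcano, the latitude slot) are hypothetical names chosen for the example, not terms from the SESDI ontology or SWEET.

```python
# Hypothetical first-pass vocabulary pulled from the volcano use case text.
noun_phrases = ["Volcano", "Eruption", "Stratosphere", "SulfurDioxide", "VolcanicAerosol"]
verbs = ["erupts", "injectsInto", "convertsTo"]

# Constraint from the use case: a tropical volcano lies within +/- 30 degrees
# of the equator. The class and slot names are assumptions for illustration.
constraints = {"TropicalVolcano": {"property": "latitude", "min": -30.0, "max": 30.0}}

model = {
    "classes": sorted(noun_phrases),   # noun phrases -> classes
    "properties": sorted(verbs),       # verbs -> properties
    "restrictions": constraints,       # constraints -> value restrictions
}

def satisfies(class_name, individual):
    """Check an individual's slot value against the numeric restriction on class_name."""
    r = model["restrictions"][class_name]
    value = individual[r["property"]]
    return r["min"] <= value <= r["max"]

# Mt. Spurr, AK lies at roughly 61 degrees north (approximate, for illustration).
mt_spurr = {"latitude": 61.3}
print(satisfies("TropicalVolcano", mt_spurr))
```

A plain dictionary is enough at this stage; the point is only to check that each phrase from the use case has a home in the model before anything is encoded in RDF or OWL.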

Starting Points

• When building a background ontology for an application, we need to decide whether it is best to start from scratch or to reuse other ontologies.

• Look around for existing resources

• These can be:

– Existing ontologies

– Database schemas

– Controlled vocabularies

– Table of contents like material (on a web page, in a book, catalog, etc.)

How to find starting points

• Web searches for content area

• SWOOGLE

• Talk to experts

• Standards bodies (IEEE, OMG, etc.)

• In this case, SWEET (Semantic Web for Earth and Environmental Terminology) was a reasonable starting point

– Why? Because it was reasonably well used, it included terminology we needed, and it incorporated some standard terminologies we cared about

Atmosphere (portions from SWEET)

Atmosphere II

More on Scoping

• Focus initially on:

– Class hierarchy

– Important relationships (yielding properties and sometimes property hierarchies)

– Important restrictions (yielding classes to be used as value restrictions)

• Acknowledge other important issues such as:

– Required vs. optional (yielding cardinality restrictions)

– Disjointness

– Processes


Representing processes


Developing ontologies in VSTO

• Use cases and small team (7-8; 2-3 domain experts, 2 knowledge experts, 1 software engineer, 1 facilitator, 1 scribe)

• Identify classes and properties (leverage controlled vocab.)

– Start with narrower terms, generalize when needed or possible

– Adopt a suitable conceptual decomposition (e.g. SWEET)

– Import modules when concepts are orthogonal

• Review, vet, publish

• Only code them (in RDF or OWL) when needed (CMAP, …)

• Ontologies: small and modular


Use Case example

• Plot the neutral temperature from the Millstone Hill Fabry-Perot, operating in the vertical mode during January 2000, as a time series.

• Objects:

– Neutral temperature is a (temperature is a) parameter

– Millstone Hill is a (ground-based observatory is an) observatory

– Fabry-Perot is an interferometer is an optical instrument is an instrument

– Vertical mode is an instrument operating mode

– January 2000 is a date-time range

– Time is an independent variable/coordinate

– Time series is a data plot is a data product


Class and property example

• Parameter

– Has coordinates (independent variables)

• Observatory

– Operates instruments

• Instrument

– Has operating mode

• Instrument operating mode

– Has measured parameters

• Date-time interval

• Data product

Modeling Advice

• As we model, we want to think about how we will represent the information.

• When we clean things up, we will want to follow best practices:

– Consistent

– Understandable

– Extensible

– Longevity (e.g., prices on wines in wine agent may need to change frequently and may be best in a separate file)

Domain Modeling

• Next: simple domain modeling and evaluation, using a simplified example from the domain of wine and foods

General Nature of Descriptions

Class: a WINE

General Categories (Superclass): a LIQUID, a POTABLE

Structured Components (Roles / Properties, with Number / Card Restrictions and Value Restrictions):

grape: chardonnay, ... [>= 1]

sugar-content: dry, sweet, off-dry

color: red, white, rose

price: a PRICE

winery: a WINERY

Interconnections Between Parts:

grape dictates color (modulo skin)

harvest time and sugar are related

Ontology Development

• Define domain terms and inter-relationships

– Define concepts in the domain (classes, nouns)

– Identify subclass/superclass relationships

– Identify attributes/properties/slots (verbs)

– Identify any general properties (relations, functions, verbs)

– Restrict slot values

– Define individuals

– Define relationships between individuals (filling in slots)

More: http://www.bell-labs.com/project/classic/papers/sowabook.ps.gz

slide from Kendall/McGuinness SemTech Tutorial 2009

Classes & Class Hierarchy

• A class is a concept in the domain

– Vintage – a wine made from grapes grown in a specified year

– A class of properties (flavor, body, color, sugar…)

• A class is a collection of elements with similar properties

– White wine – wines made from white grapes

– White table wine – wines made from white grapes that are not appellations or regional (not “quality wine” in the EU)

• A class contains necessary conditions for membership (specific network broadcast properties, frequency, time & location)

• Instances of classes

– Marietta Old Vines Red -> Red Wine

– Forman Vineyards -> Winery

slide from Kendall/McGuinness SemTech Tutorial 2009

Class Inheritance

• Classes are organized into subclass-superclass (or generalization-specialization) hierarchies

• True subclass relationships are the basis of a formal is-a hierarchy

Classes are “is-a” related if every instance of the subclass is also an instance of the superclass

• Classes may be viewed as sets

• The instances of a subclass form a subset of the instances of its superclass

• Examples

– RedWine is a subclass of Wine

Every red wine is a wine or every instance of a red wine (e.g., Marietta Old Vines Red) is an instance of wine

– NapaValleyWine is a subclass of CaliforniaWine

Every wine from Napa Valley is a wine from California
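The set view of classes can be sketched directly with Python sets. This is only an extensional illustration: the membership lists are made up (FormanChardonnay is a hypothetical instance name), and an OWL reasoner treats subclass axioms as holding for all possible instances, not just the ones listed.

```python
# Classes viewed as sets of instances. Membership is illustrative only;
# "FormanChardonnay" is a made-up instance name.
wine = {"MariettaOldVinesRed", "FormanChardonnay"}
red_wine = {"MariettaOldVinesRed"}

def is_subclass(sub, sup):
    """In the set view, sub is a subclass of sup when every instance of sub is in sup."""
    return sub <= sup

print(is_subclass(red_wine, wine))  # every listed red wine is a wine
print(is_subclass(wine, red_wine))  # but not every listed wine is red
```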

Levels in the Class Hierarchy

• Class inheritance is transitive

– A is a subclass of B (viognier is a subclass of white wine; late harvest wine is a subclass of dessert wine)

– B is a subclass of C (white wine and dessert wine are subclasses of wine)

– Therefore A is a subclass of C (viognier and late harvest wine are subclasses of wine; late harvest viognier is a subclass of white wine, dessert wine, and wine)
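Transitivity means the full set of superclasses can be computed by following direct subclass links upward. The sketch below does this for the wine example; the class identifiers are spelled as assumptions, and giving LateHarvestViognier both Viognier and LateHarvestWine as direct parents is one reading of the slide rather than something it states outright.

```python
# Direct subclass links based on the slide's wine example (names are assumed spellings).
direct_superclasses = {
    "LateHarvestViognier": {"Viognier", "LateHarvestWine"},
    "Viognier": {"WhiteWine"},
    "LateHarvestWine": {"DessertWine"},
    "WhiteWine": {"Wine"},
    "DessertWine": {"Wine"},
    "Wine": set(),
}

def all_superclasses(cls):
    """Return every class reachable by following subclass links upward."""
    seen = set()
    stack = list(direct_superclasses.get(cls, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(direct_superclasses.get(parent, ()))
    return seen

print(sorted(all_superclasses("LateHarvestViognier")))
```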

Properties & Slots

• Slots in a class definition describe attributes of members of a class

each wine will have color, sugar content, flavor, body, etc.

• Types of properties

– “intrinsic” properties: flavor and color of wine

– “extrinsic” properties: name and price of wine

– parts: ingredients in a recipe

– relations to other objects: producer of wine (winery)

• Data and object properties

– simple (datatype) properties contain primitive values (strings, numbers)

– complex properties contain other objects (e.g., a winery instance)

Class & Slot Inheritance

• A subclass inherits all the slots from its superclass

If a wine has a name and flavor, a red wine also has a name and flavor

• If a class has multiple superclasses, it inherits slots and restrictions from all of them

Port is both a dessert wine and a red wine. It inherits “sugar content: sweet” from the dessert wine and “color: red” from red wine

slide from Kendall/McGuinness SemTech Tutorial 2009

Property or Slot Constraints

• Constraints on properties describe or limit the set of possible values

– A channel adapter in a message bus must be associated with at least one channel

– A policy applies for exactly one frequency range

• Slot cardinality – the number of values a slot can or must have

– Cardinality – cardinality N means that the slot must have exactly N values

– Minimum cardinality – 1 means that the slot must have a value (required), 0 means that the slot value is optional

– Maximum cardinality – 1 means that the slot can have at most one value (single-valued slot), N means that the slot can have up to N values (N > 1, multi-valued slot)
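The cardinality rules above can be sketched as a simple count check. Note the assumption: this is a closed-world validation in the style of a frame system, where a missing value counts as a violation. An OWL reasoner under the open world assumption would instead treat a missing value as not yet known. The function name and example values are hypothetical.

```python
def check_cardinality(values, min_card=0, max_card=None):
    """Return True when the number of slot values lies within [min_card, max_card]."""
    n = len(values)
    if n < min_card:
        return False
    if max_card is not None and n > max_card:
        return False
    return True

# "At least one channel": minimum cardinality 1, no maximum.
print(check_cardinality(["channel-1"], min_card=1))
# "Exactly one frequency range": minimum and maximum cardinality both 1.
print(check_cardinality(["range-a", "range-b"], min_card=1, max_card=1))
```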

Slot Value Constraints

• Slot value type – defines the set of possible values for the property

– String: a string of characters (“Château Lafite”)

– Number: an integer or a float (15, 4.5)

– Boolean: a true/false flag

– Enumerated type: a list of allowed values (red, white, rose)

– Filler: a single value (e.g., the color slot for a RedWine must be filled with the single value “red”)

– Object type – a class defined in an ontology (e.g., Winery is the value restriction on the hasMaker slot on the class Wine)

slide from Kendall/McGuinness SemTech Tutorial 2009
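A small sketch of how these value types might be checked, assuming a toy encoding in which a constraint is a Python type (primitive value), a set (enumerated type), or a class name (object type). The encoding and the `instance_classes` table are illustrative assumptions, not part of any ontology tool.

```python
WINE_COLORS = {"red", "white", "rose"}            # enumerated type from the slide
instance_classes = {"FormanVineyards": "Winery"}  # individual -> class (illustrative)

def valid_value(constraint, value):
    """constraint is a Python type, a set of allowed values, or a class name."""
    if isinstance(constraint, type):
        return isinstance(value, constraint)          # primitive value type
    if isinstance(constraint, (set, frozenset)):
        return value in constraint                    # enumerated type
    return instance_classes.get(value) == constraint  # object type

print(valid_value(str, "Lafite"))
print(valid_value(WINE_COLORS, "green"))
print(valid_value("Winery", "FormanVineyards"))
```

A filler constraint is the special case of an enumerated type with a single allowed value, for example `{"red"}` for the color slot of RedWine.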

Domain & Range Properties

• In OWL and many other KR languages, relations (properties, slots) are strictly binary

• The domain & range represent the source & target arguments, respectively, for the property

• Domain of a slot – the class (or classes) that may have the slot – Wine is the domain of the slot hasWineColor

• Range of a slot – the class (or classes) to which slot values belong – everything that fills the hasWineColor slot is an instance of the enumerated class {red, white, rose}

• Some KR languages that inherently support n-ary relations, such as CL, do not make this distinction

– More flexible, intuitively more like mathematics, where functions have ranges (or return types) but not all relations are functions

– Requires additional relations to specify argument order, which can be critical for ontology alignment

slide from Kendall/McGuinness SemTech Tutorial 2009
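In OWL, domain and range declarations work as inference rules rather than as input validation: asserting that something has a wine color lets a reasoner conclude that it is a wine. The sketch below imitates that behavior over plain triples. The class name WineColor (standing in for the enumerated class {red, white, rose}) and the function name are assumptions for the example.

```python
# Property declarations: hasWineColor has domain Wine and an enumerated range,
# here given the hypothetical name WineColor.
properties = {"hasWineColor": {"domain": "Wine", "range": "WineColor"}}

def infer_types(triples):
    """Infer class membership of subjects and objects from domain/range declarations."""
    inferred = {}
    for subject, prop, obj in triples:
        decl = properties.get(prop)
        if decl is None:
            continue  # undeclared property: nothing to infer
        inferred.setdefault(subject, set()).add(decl["domain"])
        inferred.setdefault(obj, set()).add(decl["range"])
    return inferred

facts = [("MariettaOldVinesRed", "hasWineColor", "red")]
print(infer_types(facts))
```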

Property Inheritance

• A subclass inherits all the slots of its superclass(es)

• A subclass can add constraints to “narrow” the set of allowed values

– Make the cardinality range smaller

– Replace a class in the range with a subclass

slide from Kendall/McGuinness SemTech Tutorial 2009

Individuals or Instances of Classes

• An Individual (instance, object in other paradigms)

– Any class that an individual is a member of, or is an individual of, is a type of the individual

– Any superclass of a class is an ancestor (or type) of the individual

• Specify slot values for the individual

– Slot values should conform to the constraints such as range, value type, cardinality restrictions, etc.

slide from Kendall/McGuinness SemTech Tutorial 2009

Vehicle Example: OWL Individuals

[Diagram: individuals Dupont, Boeing, BMW, Daimler-Chrysler, BASF]

* Adapted from Evan Wallace, NIST

slide from Kendall/McGuinness SemTech Tutorial 2009

OWL Statements

[Diagram: individuals Dupont, Boeing, BMW, Daimler-Chrysler, BASF, a Mini Cooper S, a Dakota; two builtBy links]

* Adapted from Evan Wallace, NIST

slide from Kendall/McGuinness SemTech Tutorial 2009

OWL ObjectProperty

[Diagram: individuals Dupont, Boeing, BMW, Daimler-Chrysler, BASF, a Mini Cooper S, a Dakota; label VIN; two builtBy links]

<owl:ObjectProperty rdf:ID="builtBy">
  <rdfs:range rdf:resource="#Enterprise"/>
  <rdfs:domain rdf:resource="#DurableGood"/>
  <owl:inverseOf rdf:resource="#hasBuilt"/>
</owl:ObjectProperty>

* Adapted from Evan Wallace, NIST

slide from Kendall/McGuinness SemTech Tutorial 2009


Inverse Properties

[Diagram: individuals Dupont, Boeing, BMW, Daimler-Chrysler, BASF, a Mini Cooper S, a Dakota; two hasBuilt links, with domain and range marked]

<owl:ObjectProperty rdf:ID="hasBuilt">
  <rdfs:range rdf:resource="#DurableGood"/>
  <rdfs:domain rdf:resource="#Enterprise"/>
  <owl:inverseOf rdf:resource="#builtBy"/>
</owl:ObjectProperty>

* Adapted from Evan Wallace, NIST

slide from Kendall/McGuinness SemTech Tutorial 2009

Inverse Properties

• Inverse slots contain redundant information, but

– Allow acquisition of the information in either direction

– Enable additional verification

– Allow presentation of information in both directions

• The actual implementation may vary from system to system

– Are both values stored?

– When are the inverse values filled in?

– What happens if we change the link to an inverse slot?

• Repository models often provide support for traversing relationships (domain, domainOf; range, rangeOf), allowing where-used kinds of searches

• One of the most common uses of owl:InverseFunctionalProperty is to conceptualize relational database keys

slide from Kendall/McGuinness SemTech Tutorial 2009
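One possible answer to "when are the inverse values filled in?" is to derive them on demand from the stored direction. The sketch below does that for builtBy and hasBuilt. The specific fact that the Mini Cooper S is built by BMW is an assumed reading of the diagram, and the function name is hypothetical.

```python
# Inverse declarations, mirroring owl:inverseOf between builtBy and hasBuilt.
inverse_of = {"builtBy": "hasBuilt", "hasBuilt": "builtBy"}

def with_inverses(triples):
    """Return the triples plus the statement implied by each declared inverse."""
    result = set(triples)
    for subject, prop, obj in triples:
        if prop in inverse_of:
            result.add((obj, inverse_of[prop], subject))
    return result

# Assumed example fact; only one direction is stored.
stored = {("MiniCooperS", "builtBy", "BMW")}
for triple in sorted(with_inverses(stored)):
    print(triple)
```

Storing one direction and deriving the other avoids the update problem raised above, at the cost of computing the inverse each time it is queried.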

More on Properties

• Symmetric

[Diagram: hasfriend links between Deborah and Peter]

• Transitive

[Diagram: hasPart links among RPI, SoScience, TWC]

* Adapted from Evan Wallace, NIST

slide from Kendall/McGuinness SemTech Tutorial 2009
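The effect of declaring a property symmetric or transitive can be sketched as closure operations over pairs. The starting facts below are an assumed reading of the diagram (Deborah hasfriend Peter; RPI hasPart SoScience; SoScience hasPart TWC), and the function names are hypothetical.

```python
def symmetric_closure(pairs):
    """For a symmetric property, (a, b) implies (b, a)."""
    return set(pairs) | {(b, a) for a, b in pairs}

def transitive_closure(pairs):
    """For a transitive property, (a, b) and (b, c) imply (a, c)."""
    closure = set(pairs)
    while True:
        new = {(a, d) for a, b in closure for c, d in closure if b == c} - closure
        if not new:
            return closure
        closure |= new

has_friend = {("Deborah", "Peter")}
has_part = {("RPI", "SoScience"), ("SoScience", "TWC")}

print(sorted(symmetric_closure(has_friend)))
print(sorted(transitive_closure(has_part)))
```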

Class Descriptions

1. class identifier

2. enumeration

3. property restriction

4. intersection

5. union

6. complement

slide from Kendall/McGuinness SemTech Tutorial 2009

Class Descriptions – Property Restriction

• quantified property restriction (type)

– Universally quantified – allValuesFrom

– Existentially quantified – someValuesFrom

• hasValue property restriction (value)

• property cardinality restriction (# of values)

[Diagram: property P; individual of Class C]

* Adapted from Evan Wallace, NIST

slide from Kendall/McGuinness SemTech Tutorial 2009
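The difference between the two quantified restrictions shows up most clearly on an individual with no values at all: allValuesFrom is satisfied vacuously, while someValuesFrom is not. The sketch below checks both over the known values of a property. It is a closed-world approximation with hypothetical function names; an OWL reasoner would also consider values that are not yet stated.

```python
def all_values_from(values, class_members):
    """Universal restriction: every known value belongs to the class."""
    return all(v in class_members for v in values)

def some_values_from(values, class_members):
    """Existential restriction: at least one known value belongs to the class."""
    return any(v in class_members for v in values)

wineries = {"FormanVineyards"}  # illustrative class extension

print(all_values_from(["FormanVineyards"], wineries))
print(all_values_from([], wineries))    # vacuously satisfied with no values
print(some_values_from([], wineries))   # needs at least one value
```

The vacuous case is one reason a universal restriction alone does not require that a value exist.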

Class Axioms

• subsumption (necessary)

– A ⊆ B where B is a class description

– partial or primitive class

• definition (necessary and sufficient)

– C ≡ D where D is a class description

– complete or defined class

[Diagram: classes A, B, C, D]

* courtesy of Evan Wallace, NIST

slide from Kendall/McGuinness SemTech Tutorial 2009

Disjoint Classes

[Diagram: classes A and B]

• Classes are disjoint if they cannot have common instances

• Disjoint classes cannot have any common subclasses either (any such subclass could have no instances)

• For example, if winery and wine are disjoint, then there is no instance that is both a winery and a wine. Similarly, there is no class that is both a subclass of winery and simultaneously a subclass of wine

• Disjointness is often used to aid consistency checking

• Disjointness is also helpful in teasing out subtle distinctions among classes across multiple ontologies

slide from Kendall/McGuinness SemTech Tutorial 2009
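A minimal sketch of how a disjointness axiom supports consistency checking: any individual typed with both classes of a disjoint pair is reported. The data layout, the function name, and the deliberately wrong individual MislabeledItem are assumptions for illustration.

```python
# Declared disjoint pairs and asserted types per individual (illustrative data).
disjoint_pairs = [("Winery", "Wine")]
types = {
    "FormanVineyards": {"Winery"},
    "MariettaOldVinesRed": {"Wine"},
    "MislabeledItem": {"Winery", "Wine"},  # hypothetical modeling error
}

def disjointness_violations(types, disjoint_pairs):
    """Return individuals asserted to belong to two classes declared disjoint."""
    return sorted(
        individual
        for individual, classes in types.items()
        if any(a in classes and b in classes for a, b in disjoint_pairs)
    )

print(disjointness_violations(types, disjoint_pairs))
```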

Siblings in the Class Hierarchy

• All siblings should be specified at roughly the same level of generality

• Compare to section and subsections in a book

slide from Kendall/McGuinness SemTech Tutorial 2009

Class Specification

• If a class has only one child, there may be a modeling problem – often a sign that a definition is incomplete

• If the only Red Burgundy we have is Côtes d’Or, why introduce the subclass?

Class Specification (2)

• Subclasses of a class usually have

– Additional properties

– Additional slot restrictions

– Participate in different relationships

• Compare to bullets in a bulleted list

slide from Kendall/McGuinness SemTech Tutorial 2009

Cyclic Definitions

• Cycles are common in many KR systems, though rarely “a good thing”

• Cycles are disallowed by some tools because they prohibit “code generation”, including RDF/OWL

• Classes A, B, and C have equivalent sets of instances

– By many definitions, A, B, and C are equivalent

– Use owl:equivalentClass instead of creating cycles

slide from Kendall/McGuinness SemTech Tutorial 2009

Creating Levels and Subclasses

• If a class has a large number of subclasses, it may be useful to define intermediate levels

• For example, in the domain of wines, there are natural groupings around wine color

• However, if no natural classification exists, the long list may be appropriate

slide from Kendall/McGuinness SemTech Tutorial 2009

• A “wine” is not a subclass of “wines”

• A particular vintage should be classified as an instance of the class Wines

• Class names should be either

– all singular

– all plural

• Synonym names for the same concept are not different classes

MariettaOldVinesRed

Class

Instance

instance-of

Inheritance, Naming, Synonyms

slide from Kendall/McGuinness SemTech Tutorial 2009

• Many systems, metadata standards support synonymous terms as part of a class definition

• OWL allows classes to be defined with necessary and sufficient conditions, thereby allowing synonym definitions to be “first class” terms

MariettaOldVinesRed

Class

Instance

instance-of

Inheritance, Naming, Synonyms (2)

slide from Kendall/McGuinness SemTech Tutorial 2009

• Do concepts with different slot values become restrictions for different slots?

• How important is the distinction for the domain?

• Class definitions for most domains should be fairly stable – i.e., they should not change frequently once the definitions are established and individuals created

Class vs. Property Value

slide from Kendall/McGuinness SemTech Tutorial 2009

Class vs. Individual

• Individual instances are the most specific objects in an ontology

• If concepts form a natural hierarchy, represent them as classes

• If they will have instances below them, represent them as classes

slide from Kendall/McGuinness SemTech Tutorial 2009

Group Exercise

• Domain Modeling Exercise

• Wine Agent Revisited

• VSTO revisited

• (or another topic of the class's choosing)

Initial Question

• Questions to answer:

– Which wine goes with meal x

– What is statistical signature (of x at height y)

– What is neutral temperature (and how should it be plotted)

– What components should I buy in my home theater system

– What components should customer x buy in their switching system

– …

Logistics

• Hand in assignment by 6 pm if you have not already done so

• Reading assignment for next week

– reading on use cases (note that there are mandatory and optional readings this week)

• Next week we will do an in-class group exercise on use cases and…