Foundations II: Ontology Engineering
Class Session 3
Deborah McGuinness and Joanne Luciano
with Peter Fox and Li Ding
CSCI-6962-01
September 20, 2010
Review of Reading Assignment
• Semantic Web for the Working Ontologist (Allemang and Hendler), first few chapters.
• Rector et al. OWL Pizzas: Practical Experience of Teaching OWL-DL: Common Errors & Common Patterns.
• Any comments, questions?
• Homework assignment due at 1200h today
Semantic Web Methodology and Technology Development Process
• Establish and improve a well-defined methodology vision for Semantic Technology-based application development
• Leverage controlled vocabularies, etc.
[Figure: development process stages: Use Case; Small Team, mixed skills; Analysis; Adopt Technology Approach; Leverage Technology Infrastructure; Rapid Prototype; Open World: Evolve, Iterate, Redesign, Redeploy; Use Tools; Science/Expert Review & Iteration; Develop model/ontology]
Semantic Web Layers
http://www.w3.org/2003/Talks/1023-iswc-tbl/slide26-0.html, http://flickr.com/photos/pshab/291147522/
Ontology Spectrum
An ontology specifies a rich description of the
• Terminology, concepts, nomenclature
• Properties explicitly defining concepts
• Relations among concepts (hierarchical and lattice)
• Rules distinguishing concepts, refining definitions and relations (constraints, restrictions, regular expressions)
relevant to a particular domain or area of interest.
www.ksl.stanford.edu/people/dlm/papers/ontologies-come-of-age-abstract.html slide from Kendall/McGuinness SemTech Tutorial
Ontology-based Technologies
• Ontologies provide a common vocabulary for use by independently developed resources, processes, and services
• Agreements among organizations sharing common services can be made with regard to their usage; the meaning of relevant concepts can be expressed unambiguously
• By composing/mapping ontologies and mediating terminology across participating events, resources, and services, independently developed services can work together to share information and processes consistently, accurately, and completely
• Ontologies also ensure
– Valid conversations among agents to collect, process, fuse, and exchange information
– Accurate searching, by establishing context through concept definitions and relations instead of, or in addition to, the statistical relevance of keywords
slide from Kendall/McGuinness SemTech Tutorial 2008
Background Knowledge
• We need to provide machine-understandable encodings of terms that are used in applications
• Approaches to drive ontology creation
– Bottom up (using data from databases or scraping)
– Mid level (using use cases and knowledge of the subject area)
– Top down (using foundational or upper-level ontologies and building “down”)
Reuse Existing Knowledge
• Standards exist in most domains, many of which overlap
• Identify the set that is most relevant to the problem and business issue
• A component-based approach helps deal with overlapping standards; complex relationships can and must be defined so that term usage and overlap are unambiguous and machine interpretable
• Brainstorming with domain experts can be useful to start; then refine and iterate to the level required by the application
adapted from Kendall/McGuinness SemTech Tutorial 2009
Use Case Example
• We will look at one example use case from the SESDI project (Semantically-Enabled Scientific Data Integration) and the thought process involved in generating the plan for the ontology encoding
Selected VxyO Motivation: Mt. Spurr, AK. 8/18/1992 eruption, USGS
http://www.avo.alaska.edu/image.php?id=319
Atmosphere Use Case
• Determine the statistical signatures of both volcanic and solar forcings on the height of the tropopause
From paleoclimate researcher Caspar Ammann, Climate and Global Dynamics Division of NCAR (CGD/NCAR)
Layperson perspective:
- look for indicators of acid rain in the part of the atmosphere we experience…
(look at measurements of sulfur dioxide in relation to sulfuric acid after volcanic eruptions at the boundary of the troposphere and the stratosphere)
NASA-funded effort with Fox (NCAR), Sinha (Va. Tech), Raskin (JPL), and McGuinness
Gather the Thought Process
• Ask which questions are being focused on
• Ask for an answer to the questions
• Ask how the questions are answered
• Ask for criteria for a “good” answer
Use Case detail: A volcano erupts
• Preferably it is a tropical mountain (within ±30 degrees of the equator) with ‘acidic’ magma, which means more SO2, and it erupts with great intensity so that material and large amounts of gas are injected into the stratosphere.
• The SO2 gas converts to sulfuric acid, H2SO4 + H2O (75% H2SO4 + 25% H2O). The half-life of SO2 is about 30–40 days.
• The sulfuric acid condenses into small super-cooled liquid droplets. These are the volcanic aerosol that will linger for a year or two.
• The Brewer-Dobson circulation of the stratosphere transports the aerosol to higher latitudes. The particles generate great sunsets, most commonly first seen in the fall of the respective hemisphere. Part of the sunlight is reflected, and part is scattered in the forward direction.
• As a result, the direct solar beam is reduced, yet diffuse skylight increases. The scattering is responsible for the colorful sunsets, as more and more of the blue wavelengths are scattered away. In mid-latitudes the volcanic aerosol starts to settle, but the most efficient removal from the stratosphere is through tropopause folds in the vicinity of the storm tracks.
• If particles get over the pole, which happens in the spring of the respective hemisphere, they settle and fall onto the polar ice caps. It is from these ice caps that we recover annual records of sulfate flux or deposition.
• We get ice cores that show continuous deposition information. Nowadays we measure sulfate (SO4²⁻). Earlier measurements were indirect: an electric current was put through the ice and the delay was measured. With acids present, the current flows faster.
• What we are looking for are pulse-like events with a build-up over a few months (mostly in summer, when the polar vortex is gone), followed by a decay of the peak by a factor of about 1/e in 12 months.
• The distribution of these pulses was found to follow an extreme value distribution (Fréchet) with a heavy tail.
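The two decay rates quoted above can be turned into simple arithmetic. This Python sketch assumes a 35-day SO2 half-life (the midpoint of the quoted 30–40 days) and a purely exponential fall-off of the ice-core sulfate peak with a 12-month e-folding time; both function names are hypothetical, and the build-up phase of the pulse is ignored.

```python
import math

def so2_fraction_remaining(days: float, half_life_days: float = 35.0) -> float:
    """Fraction of injected SO2 left after `days`, given its half-life.

    The 35-day default is an assumed midpoint of the 30-40 days quoted above.
    """
    return 0.5 ** (days / half_life_days)

def sulfate_peak_fraction(months_after_peak: float, e_folding_months: float = 12.0) -> float:
    """Relative height of a sulfate pulse after its peak, assuming pure
    exponential decay by a factor of 1/e every `e_folding_months`."""
    return math.exp(-months_after_peak / e_folding_months)

if __name__ == "__main__":
    for d in (30, 35, 40, 90):
        print(f"SO2 left after {d} days: {so2_fraction_remaining(d):.2f}")
    print(f"Sulfate peak after 12 months: {sulfate_peak_fraction(12):.3f}")
```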
Use Case detail: … climate
• So reflection reduces the total amount of energy, while forward scattering only changes the beam and its path length.
• The dry fogs in the sky persist even after thunderstorms, so they are in the stratosphere, not the troposphere.
• The tropical reservoir will keep delivering aerosol for about two years after the eruption.
• The particles are excellent scatterers at short wavelengths. They do absorb in the near infrared (NIR) and infrared (IR). Because of this absorption, there is a local temperature change in the lower stratosphere.
• This temperature change causes some convective motion that further spreads the aerosol. Second, once the lower stratosphere warms up, it generates a temperature gradient. Horizontal temperature gradients increase the baroclinicity, and thus storms, and they speed up the local zonal winds. This change in zonal wind at high latitudes is particularly large in winter. The increased (westerly) zonal wind removes the cold air that tries to build up over the high Arctic in winter.
• Therefore, the temperature anomaly in wintertime is actually relatively mild.
• The impact of volcanoes is to cool the surface through scattering of radiation.
• In wintertime over the continents there might be some warming. In the stratosphere, the aerosol causes warming.
• The amount of greenhouse gas (GHG) emitted is small compared with the reservoir already in the air.
• The hydrologic cycle responds to a volcanic eruption.
Stepping back
• We have identified a number of noun phrases and verbs that will be needed if we are to answer the questions
• Noun phrases are typically modeled as classes
• Verbs are typically modeled as properties
• Constraints are typically modeled as value (and other) restrictions
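The rule of thumb above can be sketched in a few lines of Python. This is a minimal illustration that assumes the phrases have already been tagged by hand; the tagged phrases are our own picks from the volcano use case, and `draft_ontology`, `PHRASES`, and the property names are hypothetical, not part of any tool.

```python
from collections import defaultdict

# Hand-tagged phrases from the use case (illustrative; the tagging is assumed, not automated)
PHRASES = [
    ("Volcano", "noun"),
    ("Eruption", "noun"),
    ("Aerosol", "noun"),
    ("Stratosphere", "noun"),
    ("injects", "verb"),
    ("convertsTo", "verb"),
    ("located within 30 degrees of the equator", "constraint"),
]

# The typical (not mandatory) mapping from phrase kind to ontology construct
CONSTRUCT_FOR = {
    "noun": "class",
    "verb": "property",
    "constraint": "restriction",
}

def draft_ontology(phrases):
    """Group tagged phrases into candidate classes, properties, and restrictions."""
    draft = defaultdict(list)
    for text, kind in phrases:
        draft[CONSTRUCT_FOR[kind]].append(text)
    return dict(draft)

if __name__ == "__main__":
    for construct, terms in draft_ontology(PHRASES).items():
        print(construct, terms)
```

The output is only a first draft: the modeler still decides which candidates survive review.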
Starting Points
• When building a background ontology for an application, we need to decide whether it is best to start from scratch or to reuse other ontologies.
• Look around for existing resources
• These can be:
– Existing ontologies
– Database schemas
– Controlled vocabularies
– Table-of-contents-like material (on a web page, in a book, catalog, etc.)
How to find starting points
• Web searches for content area
• SWOOGLE
• Talk to experts
• Standards bodies (IEEE, OMG, etc.)
• In this case, SWEET (Semantic Web for Earth and Environmental Terminology) was a reasonable starting point. Why? It was reasonably well used, it included terminology we needed, and it incorporated some standard terminologies we cared about
More on Scoping
• Focus initially on:
– Class hierarchy
– Important relationships (yielding properties and sometimes property hierarchies)
– Important restrictions (yielding classes to be used as value restrictions)
• Acknowledge other important issues such as:
– Required vs. optional (yielding cardinality restrictions)
– Disjointness
– Processes
Developing ontologies in VSTO
• Use cases and small team (7-8; 2-3 domain experts, 2 knowledge experts, 1 software engineer, 1 facilitator, 1 scribe)
• Identify classes and properties (leverage controlled vocab.)
– Start with narrower terms, generalize when needed or possible
– Adopt a suitable conceptual decomposition (e.g. SWEET)
– Import modules when concepts are orthogonal
• Review, vet, publish
• Only code them (in RDF or OWL) when needed (CMAP, …)
• Ontologies: small and modular
Use Case example
• Plot the neutral temperature from the Millstone-Hill Fabry Perot, operating in the vertical mode during January 2000, as a time series.
• Objects:
– Neutral temperature is a (temperature is a) parameter
– Millstone Hill is a (ground-based observatory is an) observatory
– Fabry-Perot is an interferometer is an optical instrument is an instrument
– Vertical mode is an instrument operating mode
– January 2000 is a date-time range
– Time is an independent variable/coordinate
– Time series is a data plot is a data product
Class and property example
• Parameter
– Has coordinates (independent variables)
• Observatory
– Operates instruments
• Instrument
– Has operating mode
• Instrument operating mode
– Has measured parameters
• Date-time interval
• Data product
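To see how the is-a chains and the properties above interact, here is a small Python sketch. It assumes single inheritance and treats each property's listed class as its domain; the camelCase names (`hasOperatingMode` and so on) and the helper functions are hypothetical. It shows that a Fabry-Perot picks up "has operating mode" simply because it is, transitively, an instrument.

```python
# Superclass links taken from the is-a chains in the use case (single inheritance)
PARENT = {
    "NeutralTemperature": "Temperature",
    "Temperature": "Parameter",
    "GroundBasedObservatory": "Observatory",
    "FabryPerot": "Interferometer",
    "Interferometer": "OpticalInstrument",
    "OpticalInstrument": "Instrument",
    "TimeSeries": "DataPlot",
    "DataPlot": "DataProduct",
}

# Properties from the class and property example, mapped to their domain class
# (the camelCase names are hypothetical)
PROPERTIES = {
    "hasCoordinate": "Parameter",
    "operatesInstrument": "Observatory",
    "hasOperatingMode": "Instrument",
    "hasMeasuredParameter": "InstrumentOperatingMode",
}

def ancestors(cls):
    """Return cls followed by every superclass up the single-parent chain."""
    chain = [cls]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def applicable_properties(cls):
    """Properties whose domain is cls or one of its superclasses."""
    lineage = set(ancestors(cls))
    return sorted(p for p, domain in PROPERTIES.items() if domain in lineage)

if __name__ == "__main__":
    print(ancestors("FabryPerot"))
    print(applicable_properties("FabryPerot"))
```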
Modeling Advice
• As we model, we want to think about how we will represent the information.
• When we clean things up, we will want to follow best practices:
– Consistent
– Understandable
– Extensible
– Longevity (e.g., prices on wines in a wine agent may need to change frequently and may be best kept in a separate file)
Domain Modeling
• Next: simple domain modeling and evaluation, using a simplified example from the domain of wines and foods
General Nature of Descriptions
a WINE
a LIQUID, a POTABLE
grape: chardonnay, ... [>= 1]
sugar-content: dry, sweet, off-dry
color: red, white, rose
price: a PRICE
winery: a WINERY
grape dictates color (modulo skin)
harvest time and sugar are related
[Figure annotations: General Categories; Structured Components; Interconnections Between Parts; Number / Cardinality Restrictions; Value Restrictions; Class; Superclass; Roles / Properties]
Ontology Development
• Define domain terms and inter-relationships
– Define concepts in the domain (classes, nouns)
– Identify subclass/superclass relationships
– Identify attributes/properties/slots (verbs)
– Identify any general properties (relations, functions, verbs)
– Restrict slot values
– Define individuals
– Define relationships between individuals (filling in slots)
More: http://www.bell-labs.com/project/classic/papers/sowabook.ps.gz
slide from Kendall/McGuinness SemTech Tutorial 2009
Classes & Class Hierarchy
• A class is a concept in the domain
– Vintage: a wine made from grapes grown in a specified year
– A class of properties (flavor, body, color, sugar…)
• A class is a collection of elements with similar properties
– White wine: wines made from white grapes
– White table wine: wines made from white grapes that are not appellations or regional (not “quality wine” in the EU)
• A class contains necessary conditions for membership (specific network broadcast properties, frequency, time & location)
• Instances of classes
– Marietta Old Vines Red -> Red Wine
– Forman Vineyards -> Winery
slide from Kendall/McGuinness SemTech Tutorial 2009
• Classes are organized into subclass-superclass (or generalization-specialization) hierarchies
• True subclass relationships are the basis of a formal is-a hierarchy
Classes are “is-a” related if every instance of the subclass is an instance of the superclass
• Classes may be viewed as sets
• The instances of a subclass form a subset of the instances of its superclass
• Examples
– RedWine is a subclass of Wine
Every red wine is a wine; that is, every instance of red wine (e.g., Marietta Old Vines Red) is an instance of wine
– NapaValleyWine is a subclass of CaliforniaWine
Every wine from Napa Valley is a wine from California
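Viewing classes as sets suggests a quick sanity check. The sketch below models each class as the set of its known instances; apart from MariettaOldVinesRed, the individuals are hypothetical. An OWL subclass axiom constrains every possible instance, so this is only a closed-world check over the data at hand, not a proof of the axiom.

```python
# Hypothetical individuals; each class is modeled as the set of its known instances
WINE = {"MariettaOldVinesRed", "napaWine1", "sonomaWine1", "loireWine1"}
RED_WINE = {"MariettaOldVinesRed"}
CALIFORNIA_WINE = {"napaWine1", "sonomaWine1"}
NAPA_VALLEY_WINE = {"napaWine1"}

def is_subclass_extensionally(sub: set, sup: set) -> bool:
    """True when every known instance of `sub` is also an instance of `sup`."""
    return sub <= sup

if __name__ == "__main__":
    print(is_subclass_extensionally(RED_WINE, WINE))
    print(is_subclass_extensionally(NAPA_VALLEY_WINE, CALIFORNIA_WINE))
```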
Class Inheritance
Levels in the Class Hierarchy
• Class inheritance is transitive
– A is a subclass of B (viognier is a subclass of white wine; late harvest wine is a subclass of dessert wine)
– B is a subclass of C (white wine and dessert wine are subclasses of wine)
– therefore A is a subclass of C (late harvest viognier is a subclass of white wine, dessert wine, and wine)
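Transitivity means an ancestor lookup must follow subclass links all the way up, possibly through several parents. This minimal Python sketch assumes that LateHarvestViognier's direct parents are Viognier and LateHarvestWine (our assumption; the slide states only the conclusion) and collects every ancestor with a depth-first walk.

```python
# Direct superclass links; LateHarvestViognier's two parents are an assumption
DIRECT_SUPERCLASSES = {
    "WhiteWine": {"Wine"},
    "DessertWine": {"Wine"},
    "Viognier": {"WhiteWine"},
    "LateHarvestWine": {"DessertWine"},
    "LateHarvestViognier": {"Viognier", "LateHarvestWine"},
}

def all_superclasses(cls: str) -> set:
    """Collect every ancestor of `cls` by following subclass links transitively."""
    seen = set()
    stack = list(DIRECT_SUPERCLASSES.get(cls, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(DIRECT_SUPERCLASSES.get(parent, ()))
    return seen

if __name__ == "__main__":
    print(sorted(all_superclasses("LateHarvestViognier")))
```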
Properties & Slots
• Slots in a class definition describe attributes of members of a class
each wine will have color, sugar content, flavor, body, etc.
• Types of properties
– “intrinsic” properties: flavor and color of wine
– “extrinsic” properties: name and price of wine
– parts: ingredients in a recipe
– relations to other objects: producer of wine (winery)
• Data and object properties
– simple (datatype) properties contain primitive values (strings, numbers)
– complex (object) properties contain other objects (e.g., a winery instance)
Class & Slot Inheritance
• A subclass inherits all the slots from its superclass
If a wine has a name and flavor, a red wine also has a name and flavor
• If a class has multiple superclasses, it inherits slots and restrictions from all of them
Port is both a dessert wine and a red wine. It inherits “sugar content: sweet” from dessert wine and “color: red” from red wine
slide from Kendall/McGuinness SemTech Tutorial 2009
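Slot inheritance from several superclasses can be sketched as a recursive merge. The frame-style dictionary below is a hypothetical encoding of the Port example, where `None` marks a slot that is present but unrestricted. If two parents restricted the same slot differently, this sketch would simply keep the last one; in OWL both restrictions would apply at once, which could make the class unsatisfiable.

```python
# Hypothetical frame-style definitions: parent classes plus the class's own slots
CLASSES = {
    "Wine": {"parents": [], "slots": {"name": None, "flavor": None}},
    "DessertWine": {"parents": ["Wine"], "slots": {"sugarContent": "sweet"}},
    "RedWine": {"parents": ["Wine"], "slots": {"color": "red"}},
    "Port": {"parents": ["DessertWine", "RedWine"], "slots": {}},
}

def inherited_slots(cls: str) -> dict:
    """Merge slots from all superclasses (recursively), then apply the class's own."""
    merged = {}
    for parent in CLASSES[cls]["parents"]:
        merged.update(inherited_slots(parent))
    merged.update(CLASSES[cls]["slots"])
    return merged

if __name__ == "__main__":
    print(inherited_slots("Port"))
```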
Property or Slot Constraints
• Constraints on properties describe or limit the set of possible values
– A channel adapter in a message bus must be associated with at least one channel
– A policy applies for exactly one frequency range
• Slot cardinality: the number of values a slot can or must have
– Cardinality: cardinality N means that the slot must have exactly N values
– Minimum cardinality: 1 means that the slot must have a value (required), 0 means that the slot value is optional
– Maximum cardinality: 1 means that the slot can have at most one value (single-valued slot), N means that the slot can have up to N values (N > 1, multi-valued slot)
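The cardinality vocabulary above maps naturally onto a min/max pair. This Python sketch is a closed-world, database-style check, and the names given to the two constraint examples are hypothetical. In OWL itself a missing value does not violate a minimum cardinality; the reasoner assumes the value exists but is unknown.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Cardinality:
    """Slot cardinality: between min_card and max_card values; max_card=None means unbounded."""
    min_card: int = 0
    max_card: Optional[int] = None

    @classmethod
    def exactly(cls, n: int) -> "Cardinality":
        return cls(n, n)

    def allows(self, values: list) -> bool:
        n = len(values)
        return n >= self.min_card and (self.max_card is None or n <= self.max_card)

# The two constraint examples above (hypothetical names)
CHANNEL_ADAPTER_CHANNELS = Cardinality(min_card=1)   # at least one channel
POLICY_FREQUENCY_RANGE = Cardinality.exactly(1)      # exactly one frequency range
OPTIONAL_SINGLE_VALUED = Cardinality(0, 1)           # optional, at most one value
```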
Slot Value Constraints
• Slot value type: defines the set of possible values for the property
– String: a string of characters (“Château Lafite”)
– Number: an integer or a float (15, 4.5)
– Boolean: a true/false flag
– Enumerated type: a list of allowed values (red, white, rose)
– Filler: a single value (e.g., the color slot for a RedWine must be filled with the single value “red”)
– Object type: a class defined in an ontology (e.g., Winery is the value restriction on the hasMaker slot on the class Wine)
slide from Kendall/McGuinness SemTech Tutorial 2009
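The value types listed above can be expressed as a small validator. In this illustrative sketch, not any tool's API, each type is a hypothetical `(kind, detail)` pair and object-typed slots look up a hypothetical `INSTANCE_OF` table; only the Forman Vineyards -> Winery fact comes from the slides. A real reasoner would also accept instances of subclasses of the object type.

```python
# Hypothetical direct-type table for object-typed slot values
INSTANCE_OF = {"FormanVineyards": "Winery"}

def check_value(value, value_type) -> bool:
    """Return True if `value` conforms to a (kind, detail) value-type spec."""
    kind, detail = value_type
    if kind == "string":
        return isinstance(value, str)
    if kind == "number":
        # bool is a subclass of int in Python, so exclude it explicitly
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if kind == "boolean":
        return isinstance(value, bool)
    if kind == "enumerated":
        return value in detail
    if kind == "filler":
        return value == detail
    if kind == "object":
        return INSTANCE_OF.get(value) == detail
    raise ValueError(f"unknown value type: {kind}")

if __name__ == "__main__":
    print(check_value("rose", ("enumerated", {"red", "white", "rose"})))
    print(check_value("FormanVineyards", ("object", "Winery")))
```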
Domain & Range Properties
• In OWL and many other KR languages, relations (properties, slots) are strictly binary
• The domain & range represent the source & target arguments, respectively, for the property
• Domain of a slot: the class (or classes) that may have the slot; Wine is the domain of the slot hasWineColor
• Range of a slot: the class (or classes) to which slot values belong; everything that fills the hasWineColor slot is an instance of the enumerated class {red, white, rose}
• Some KR languages that inherently support n-ary relations, such as CL (Common Logic), do not make this distinction
– More flexible, intuitively more like mathematics, where functions have ranges (or return types) but not all relations are functions
– Requires additional relations to specify argument order, which can be critical for ontology alignment
slide from Kendall/McGuinness SemTech Tutorial 2009
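One point worth making concrete: in RDFS and OWL, domain and range are not checked as constraints; they license type inferences. Using a property whose domain is Wine lets a reasoner conclude that the subject is a Wine. The Python sketch below applies just those two rules; `WineColor` (standing for the enumerated class {red, white, rose}) and the individual `wineX` are hypothetical names.

```python
# property -> (domain, range); WineColor is a hypothetical name for {red, white, rose}
SCHEMA = {
    "hasWineColor": ("Wine", "WineColor"),
    "hasMaker": ("Wine", "Winery"),
}

def infer_types(triples):
    """Apply the RDFS domain and range rules to (s, p, o) triples:
    p's domain D gives (s, D); p's range R gives (o, R)."""
    inferred = set()
    for s, p, o in triples:
        if p in SCHEMA:
            domain, rng = SCHEMA[p]
            inferred.add((s, domain))
            inferred.add((o, rng))
    return inferred

if __name__ == "__main__":
    data = [("MariettaOldVinesRed", "hasWineColor", "red"),
            ("wineX", "hasMaker", "FormanVineyards")]
    for individual, cls in sorted(infer_types(data)):
        print(individual, "is a", cls)
```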
Property Inheritance
• A subclass inherits all the slots of its superclass(es)
• A subclass can add constraints to “narrow” the set of allowed values
– Make the cardinality range smaller
– Replace a class in the range with a subclass
slide from Kendall/McGuinness SemTech Tutorial 2009
Individuals or Instances of Classes
• An individual (instance, object in other paradigms)
– Any class that an individual is a member of, or is an individual of, is a type of the individual
– Any superclass of a class is an ancestor (or type) of the individual
• Specify slot values for the individual
– Slot values should conform to constraints such as range, value type, cardinality restrictions, etc.
slide from Kendall/McGuinness SemTech Tutorial 2009
Vehicle Example: OWL Individuals
[Figure: individuals Dupont, Boeing, BMW, Daimler-Chrysler, BASF]
* Adapted from Evan Wallace, NIST
slide from Kendall/McGuinness SemTech Tutorial 2009
OWL Statements
[Figure: individuals Dupont, Boeing, BMW, Daimler-Chrysler, BASF, a Mini Cooper S, and a Dakota, with builtBy links]
* Adapted from Evan Wallace, NIST
slide from Kendall/McGuinness SemTech Tutorial 2009
OWL ObjectProperty
[Figure: individuals Dupont, Boeing, BMW, Daimler-Chrysler, BASF, a Mini Cooper S, and a Dakota, with builtBy links and a VIN; the range and domain of builtBy are highlighted]
<owl:ObjectProperty rdf:ID="builtBy">
  <rdfs:range rdf:resource="#Enterprise"/>
  <rdfs:domain rdf:resource="#DurableGood"/>
  <owl:inverseOf rdf:resource="#hasBuilt"/>
</owl:ObjectProperty>
* Adapted from Evan Wallace, NIST
slide from Kendall/McGuinness SemTech Tutorial 2009
Inverse Properties
[Figure: individuals Dupont, Boeing, BMW, Daimler-Chrysler, BASF, a Mini Cooper S, and a Dakota, with hasBuilt links; domain and range labeled]
<owl:ObjectProperty rdf:ID="hasBuilt">
  <rdfs:range rdf:resource="#DurableGood"/>
  <rdfs:domain rdf:resource="#Enterprise"/>
  <owl:inverseOf rdf:resource="#builtBy"/>
</owl:ObjectProperty>
* Adapted from Evan Wallace, NIST
slide from Kendall/McGuinness SemTech Tutorial 2009
Inverse Properties
• Inverse slots contain redundant information, but they
– Allow acquisition of the information in either direction
– Enable additional verification
– Allow presentation of information in both directions
• The actual implementation may vary from system to system
– Are both values stored?
– When are the inverse values filled in?
– What happens if we change the link to an inverse slot?
• Repository models often provide support for traversing relationships (domain, domainOf; range, rangeOf), allowing where-used kinds of searches
• One of the most common uses of owl:InverseFunctionalProperty is to conceptualize relational database keys
slide from Kendall/McGuinness SemTech Tutorial 2009
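The redundancy of inverse slots, and the key-like use of inverse-functional properties, can both be sketched in Python. The sketch assumes the builtBy/hasBuilt pair from the vehicle example and a hypothetical `hasVIN` property; `car1`, `car2`, and `car3` are made-up individuals. Because a VIN is naturally a literal, note that OWL DL only allows inverse-functional object properties; OWL 2 added owl:hasKey for data-valued keys.

```python
from collections import defaultdict

# Declared inverse pairs (from the builtBy/hasBuilt example)
INVERSE_OF = {"builtBy": "hasBuilt", "hasBuilt": "builtBy"}
# Hypothetical key-like property: each VIN identifies at most one vehicle
INVERSE_FUNCTIONAL = {"hasVIN"}

def add_inverses(triples: set) -> set:
    """Materialize (o, inverse(p), s) for every (s, p, o) whose p has a declared inverse."""
    result = set(triples)
    for s, p, o in triples:
        if p in INVERSE_OF:
            result.add((o, INVERSE_OF[p], s))
    return result

def same_individuals(triples: set) -> list:
    """Group subjects sharing a value of an inverse-functional property; an OWL
    reasoner would conclude that each group names a single individual (owl:sameAs)."""
    by_key = defaultdict(set)
    for s, p, o in triples:
        if p in INVERSE_FUNCTIONAL:
            by_key[(p, o)].add(s)
    return [subjects for subjects in by_key.values() if len(subjects) > 1]
```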
More on Properties
• Symmetric
• Transitive
[Figure: Deborah hasFriend Peter and Peter hasFriend Deborah (symmetric); RPI hasPart SoScience, SoScience hasPart TWC, and RPI hasPart TWC (transitive)]
* Adapted from Evan Wallace, NIST
slide from Kendall/McGuinness SemTech Tutorial 2009
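The symmetric and transitive behavior shown in the figure can be computed as closures over pairs. This minimal Python sketch uses the Deborah/Peter and RPI/SoScience/TWC examples; a reasoner would derive the same pairs, though usually far more efficiently than this naive fixpoint loop.

```python
def symmetric_closure(pairs):
    """For a symmetric property, (a, b) implies (b, a)."""
    return set(pairs) | {(b, a) for a, b in pairs}

def transitive_closure(pairs):
    """For a transitive property, (a, b) and (b, c) imply (a, c); repeat until nothing new appears."""
    closure = set(pairs)
    while True:
        new = {(a, d) for a, b in closure for c, d in closure if b == c}
        if new <= closure:
            return closure
        closure |= new

if __name__ == "__main__":
    print(sorted(symmetric_closure({("Deborah", "Peter")})))
    print(sorted(transitive_closure({("RPI", "SoScience"), ("SoScience", "TWC")})))
```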
Class Descriptions
1. class identifier
2. enumeration
3. property restriction
4. intersection
5. union
6. complement
slide from Kendall/McGuinness SemTech Tutorial 2009
Class Descriptions – Property Restriction
• quantified property restriction (type)
– Universally quantified: allValuesFrom
– Existentially quantified: someValuesFrom
• hasValue property restriction (value)
• property cardinality restriction (# of values)
[Figure: an individual of class C related through property P]
* Adapted from Evan Wallace, NIST
slide from Kendall/McGuinness SemTech Tutorial 2009
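The restriction types above can be illustrated by evaluating them against stored facts. This Python sketch is closed-world: it only looks at the values recorded for an individual, whereas an OWL reasoner is open-world and does not treat missing values as false. The individual `wine1`, its grapes, and the property names are hypothetical.

```python
# Hypothetical stored facts: individual -> property -> set of values
FACTS = {
    "wine1": {"hasWineColor": {"red"}, "madeFromGrape": {"grapeA", "grapeB"}},
}
# Hypothetical class membership of the property values
TYPES = {"grapeA": {"RedGrape"}, "grapeB": {"WhiteGrape"}}

def values(ind: str, prop: str) -> set:
    return FACTS.get(ind, {}).get(prop, set())

def all_values_from(ind: str, prop: str, cls: str) -> bool:
    """Universal restriction: every stored value of prop is in cls (vacuously true if none)."""
    return all(cls in TYPES.get(v, set()) for v in values(ind, prop))

def some_values_from(ind: str, prop: str, cls: str) -> bool:
    """Existential restriction: at least one stored value of prop is in cls."""
    return any(cls in TYPES.get(v, set()) for v in values(ind, prop))

def has_value(ind: str, prop: str, value: str) -> bool:
    """hasValue restriction: prop has this particular value."""
    return value in values(ind, prop)

def cardinality_ok(ind: str, prop: str, min_card: int = 0, max_card=None) -> bool:
    """Cardinality restriction on the number of stored values."""
    n = len(values(ind, prop))
    return n >= min_card and (max_card is None or n <= max_card)
```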
Class Axioms
• subsumption (necessary): A ⊆ B, where B is a class description
partial or primitive class
• definition (necessary and sufficient): C ≡ D, where D is a class description
complete or defined class
[Figure: classes A, B, C, D]
* courtesy of Evan Wallace, NIST
slide from Kendall/McGuinness SemTech Tutorial 2009
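The difference between necessary (primitive) and necessary-and-sufficient (defined) classes shows up in classification. This sketch assumes the hypothetical axioms Wine ⊆ PotableLiquid and RedWine ≡ Wine ⊓ (hasWineColor value red), with made-up individuals: an asserted wine that is red gets classified as RedWine, but a red potable liquid is never inferred to be a Wine.

```python
# Asserted facts about hypothetical individuals
ASSERTED_TYPES = {"wine1": {"Wine"}, "wine2": {"Wine"}, "juice1": {"PotableLiquid"}}
WINE_COLOR = {"wine1": "red", "wine2": "white", "juice1": "red"}

# Primitive (partial) class axiom Wine ⊆ PotableLiquid: necessary conditions only
NECESSARY = {"Wine": {"PotableLiquid"}}

def inferred_types(ind: str) -> set:
    types = set(ASSERTED_TYPES.get(ind, set()))
    # Necessary conditions: every member of Wine must also be a PotableLiquid
    for cls in list(types):
        types |= NECESSARY.get(cls, set())
    # Defined class RedWine ≡ Wine ⊓ (hasWineColor value red):
    # the conditions are also sufficient, so membership is inferred
    if "Wine" in types and WINE_COLOR.get(ind) == "red":
        types.add("RedWine")
    return types

if __name__ == "__main__":
    for ind in ASSERTED_TYPES:
        print(ind, sorted(inferred_types(ind)))
```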
Disjoint Classes
[Figure: disjoint classes A and B]
• Classes are disjoint if they cannot have common instances
• Disjoint classes cannot have any common subclasses either
• For example, if winery and wine are disjoint, then there is no instance that is both a winery and a wine. Similarly, there is no class that is both a subclass of winery and simultaneously a subclass of wine
• Disjointness is often used to aid consistency checking
• Disjointness is also helpful in teasing out subtle distinctions among classes across multiple ontologies
slide from Kendall/McGuinness SemTech Tutorial 2009
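Disjointness as a consistency check can be sketched as follows, assuming the Winery/Wine disjointness from the example and hypothetical individuals; `oddThing` is deliberately typed as both so that a violation is reported.

```python
# Declared disjointness axioms (Winery and Wine, as in the example above)
DISJOINT = [frozenset({"Winery", "Wine"})]

def disjointness_violations(types_of: dict) -> list:
    """Return (individual, classes) pairs where an individual belongs to two
    classes declared disjoint, which makes the ontology inconsistent."""
    violations = []
    for individual, types in types_of.items():
        for pair in DISJOINT:
            if pair <= types:
                violations.append((individual, tuple(sorted(pair))))
    return violations

if __name__ == "__main__":
    data = {"FormanVineyards": {"Winery"}, "oddThing": {"Winery", "Wine"}}
    print(disjointness_violations(data))
```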
Siblings in the Class Hierarchy
• All siblings should be specified at roughly the same level of generality
• Compare to section and subsections in a book
slide from Kendall/McGuinness SemTech Tutorial 2009
Class Specification
• If a class has only one child, there may be a modeling problem – often a sign that a definition is incomplete
• If the only Red Burgundy we have is Côtes d’Or, why introduce the subclass?
Class Specification (2)
• Subclasses of a class usually
– Have additional properties
– Have additional slot restrictions
– Participate in different relationships
• Compare to bullets in a bulleted list
slide from Kendall/McGuinness SemTech Tutorial 2009
Cyclic Definitions
• Cycles are common in many KR systems, though rarely “a good thing”
• Some tools disallow cycles because cycles prevent “code generation”, including generation of RDF/OWL
• If A is a subclass of B, B of C, and C of A, then classes A, B, and C have the same set of instances
– By many definitions, A, B, and C are equivalent
– Use owl:equivalentClass instead of creating cycles
slide from Kendall/McGuinness SemTech Tutorial 2009
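The cycle-to-equivalence advice can be automated with a reachability test on the subclass graph. This sketch uses the abstract classes A, B, and C from the slide plus a hypothetical D; any classes that can reach each other through subclass links are reported as one group that could be declared with owl:equivalentClass instead.

```python
# Subclass links forming a cycle A -> B -> C -> A, plus D under A (hypothetical)
SUBCLASS_OF = {"A": {"B"}, "B": {"C"}, "C": {"A"}, "D": {"A"}}

def reachable(start: str) -> set:
    """All classes reachable from `start` by following one or more subclass links."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for parent in SUBCLASS_OF.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def equivalence_groups() -> list:
    """Group classes lying on a common subclass cycle; every class in a group has
    the same instances, so the group can be stated with owl:equivalentClass."""
    groups = []
    for cls in SUBCLASS_OF:
        up = reachable(cls)
        if cls in up:  # cls reaches itself, so it lies on a cycle
            group = frozenset(c for c in up if cls in reachable(c))
            if group not in groups:
                groups.append(group)
    return groups

if __name__ == "__main__":
    print([sorted(g) for g in equivalence_groups()])
```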
Creating Levels and Subclasses
• If a class has a large number of subclasses, it may be useful to define intermediate levels
• For example, in the domain of wines, there are natural groupings around wine color
• However, if no natural classification exists, the long list may be appropriate
slide from Kendall/McGuinness SemTech Tutorial 2009
Inheritance, Naming, Synonyms
• A “wine” is not a subclass of “wines”
• A particular vintage should be classified as an instance of the class Wines
• Class names should be either all singular or all plural
• Synonym names for the same concept are not different classes
[Figure: MariettaOldVinesRed linked by instance-of to a class]
slide from Kendall/McGuinness SemTech Tutorial 2009
Inheritance, Naming, Synonyms (2)
• Many systems and metadata standards support synonymous terms as part of a class definition
• OWL allows definitions based on necessary and sufficient conditions, thereby allowing synonym definitions to be “first-class” terms
slide from Kendall/McGuinness SemTech Tutorial 2009
Class vs. Property Value
• Do concepts with different slot values become restrictions for different slots?
• How important is the distinction for the domain?
• Class definitions for most domains should be fairly stable, i.e., they should not change frequently once the definitions are established and individuals created
slide from Kendall/McGuinness SemTech Tutorial 2009
Class vs. Individual
• Individual instances are the most specific objects in an ontology
• If concepts form a natural hierarchy, represent them as classes
• If they will have instances below them, represent them as classes
slide from Kendall/McGuinness SemTech Tutorial 2009
Group Exercise
• Domain Modeling Exercise
• Wine Agent Revisited
• VSTO revisited
• (or another topic of class choosing)
Initial Question
• Questions to answer:
– Which wine goes with meal x
– What is the statistical signature (of x at height y)
– What is the neutral temperature (and how should it be plotted)
– What components should I buy for my home theater system
– What components should customer x buy for their switching system
– …..