Post on 21-Dec-2015
An Environment for Merging and Testing Large Ontologies
Deborah McGuinness, Richard Fikes, James Rice*, Steve Wilder
Associate Director and Senior Research Scientist
Knowledge Systems Laboratory
Stanford University
Stanford, CA 94305
650-723-9770 [email protected]
*CommerceOne, Mountain View, CA
Motivation: Ontology Integration Trends
Integrated in most search applications (Yahoo, Lycos, Xift, …)
Core component of E-Commerce applications (Amazon, eBay, Virtual Vineyards, REI, VerticalNet, CommerceOne, etc.)
Integrated in configuration applications (Dell, PROSE, etc.)
Motivation: Ontology Evolution
Controlled vocabularies abound (SIC-codes, UN/SPSC, RosettaNet, OpenDirectory,…)
Distributed ownership/maintenance
Larger scale (Open Directory >23.5K editors, ~250K categories, 1.65M sites)
Becoming more complicated - Moving to classes and slots (and value restrictions, enumerated sets, cardinality)
Chimaera – A Merging and Diagnostic Ontology Environment
Web-based tool utilizing the KSL Ontolingua platform that supports:
merging multiple ontologies found in distributed environments
analysis of single or multiple ontologies
attention focus in problematic areas
simple browsing and mixed initiative editing
The Need For KB Merging
Large-scale knowledge repositories will contain KBs produced by multiple authors in multiple settings
KBs for applications will be built by assembling and extending multiple modular KBs from repositories
KBs developed by multiple authors will frequently
  Express overlapping knowledge in a common domain
  Use differing representations and vocabularies
For such KBs to be used together as building blocks, their representational differences must be reconciled
The KB Merging Task
Combine KBs that:
Were developed independently (by multiple authors)
Express overlapping knowledge in a common domain
Use differing representations and vocabularies
Produce a merged KB with non-redundant, coherent, and unified vocabulary, content, and representation
How KB Merging Tools Can Help
Combine input KBs with name clashes
  Treat each input KB as a separate name space
Support merging of classes and relations
  Replace all occurrences by the merged class or relation
  Test for logical consistency of merge (e.g., instances/subclasses of multiple disjoint classes)
  Actively look for inconsistent extensions
Match vocabulary
  Find name clashes, subsumed names, synonyms, ...
Focus attention
  Portions of KB where new relationships are likely to be needed
  E.g., sibling subclasses from multiple input KBs
Derive relationships among classes and relations
  Disjointness, equivalence, subsumption, inconsistency, ...
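The name-space, merge, and consistency-test steps listed above can be pictured with a small Python sketch. It is an illustration under stated assumptions, not Chimaera's implementation: the `KB` class, the helpers `qualify`, `name_clashes`, `merge_classes`, `ancestors`, and `disjointness_violations`, and the sample class names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class KB:
    """Hypothetical toy KB: subclass links plus disjointness assertions."""
    name: str
    subclass_of: set = field(default_factory=set)   # (child, parent) pairs
    disjoint: set = field(default_factory=set)      # frozenset({a, b}) pairs

    def classes(self):
        names = {c for pair in self.subclass_of for c in pair}
        for pair in self.disjoint:
            names |= set(pair)
        return names


def qualify(kb):
    """Treat the KB as its own name space by prefixing every class name."""
    def q(c):
        return f"{kb.name}:{c}"
    return KB(kb.name,
              {(q(c), q(p)) for c, p in kb.subclass_of},
              {frozenset(q(c) for c in pair) for pair in kb.disjoint})


def name_clashes(kb1, kb2):
    """Unqualified names used in both KBs: candidates for merging."""
    return kb1.classes() & kb2.classes()


def merge_classes(kb, old, new):
    """Replace all occurrences of class `old` by the merged class `new`."""
    def r(c):
        return new if c == old else c
    return KB(kb.name,
              {(r(c), r(p)) for c, p in kb.subclass_of if r(c) != r(p)},
              {frozenset(r(c) for c in pair) for pair in kb.disjoint})


def ancestors(kb, cls):
    """All superclasses of `cls`, including `cls` itself."""
    seen, stack = {cls}, [cls]
    while stack:
        cur = stack.pop()
        for child, parent in kb.subclass_of:
            if child == cur and parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def disjointness_violations(kb):
    """Classes that fall under two classes asserted to be disjoint."""
    return {cls for cls in kb.classes()
            for pair in kb.disjoint
            if len(pair) == 2 and pair <= ancestors(kb, cls)}


# Two invented input KBs that share the names "Truck" and "Car".
a = KB("A", {("Truck", "Vehicle"), ("Car", "Vehicle")},
       {frozenset({"Truck", "Car"})})
b = KB("B", {("Pickup", "Truck"), ("Pickup", "Car")})

clashes = name_clashes(a, b)
qa, qb = qualify(a), qualify(b)
merged = KB("merged", qa.subclass_of | qb.subclass_of, qa.disjoint | qb.disjoint)
for name in sorted(clashes):
    merged = merge_classes(merged, f"B:{name}", f"A:{name}")
violations = disjointness_violations(merged)
print(sorted(clashes), sorted(violations))
```

In this toy example neither input is inconsistent on its own; the problem (a class under two disjoint classes) appears only after the clashing names are merged, which is the kind of case a post-merge consistency test is meant to surface.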
Merging Tools
Merging can be arbitrarily difficult
  KBs can differ in basic representational design
  May require extensive negotiation among authors
Tools can significantly accelerate major steps
KB merging using conventional editing tools is
  Difficult
  Labor intensive
  Error prone
Hypothesis: tools specifically designed to support KB merging can significantly
  Speed up the merging process
  Make broader user set productive
  Improve the quality of the resulting KB
Experiment 3: Chimæra vs. Ontolingua editor
[Figure: cumulative operations (0 to 100) plotted against time in seconds (0 to 3600) for Chimæra and the Ontolingua Editor]
Our KB Analysis Task
Review KBs that:
Were developed using differing standards
May be syntactically but not semantically validated
May use differing modeling representations
May have different purposes
Produce KB logs (in interactive environments)
Identify provable problems
Suggest possible problems in style and/or modeling
Are extensible by being user programmable
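One way to picture an analysis log that separates provable problems from suggested ones, and that users can extend with their own tests, is the Python sketch below. Everything in it is an assumption made for illustration: the `diagnostic` decorator, `run_diagnostics`, the two sample tests, and the dictionary layout of the KB are hypothetical and do not describe Chimaera's actual test suite or interface.

```python
from collections import namedtuple

Finding = namedtuple("Finding", "severity test subject message")
_TESTS = []   # registry of (severity, function) pairs


def diagnostic(severity):
    """Decorator that registers a user-written test under a severity level."""
    def register(fn):
        _TESTS.append((severity, fn))
        return fn
    return register


@diagnostic("provable")
def cycle_in_hierarchy(kb):
    """A class that is its own (transitive) superclass is a definite error."""
    parents = kb["subclass_of"]          # dict: class -> set of direct parents
    for start in parents:
        seen, stack = set(), list(parents[start])
        while stack:
            cur = stack.pop()
            if cur == start:
                yield start, "class is a subclass of itself"
                break
            if cur not in seen:
                seen.add(cur)
                stack.extend(parents.get(cur, ()))


@diagnostic("possible")
def undocumented_class(kb):
    """Missing documentation is a style issue, not a logical error."""
    for cls in kb["subclass_of"]:
        if not kb.get("doc", {}).get(cls):
            yield cls, "class has no documentation string"


def run_diagnostics(kb):
    """Run every registered test and return the log, sorted for stable output."""
    log = [Finding(sev, fn.__name__, subj, msg)
           for sev, fn in _TESTS
           for subj, msg in fn(kb)]
    return sorted(log)


# Invented sample KB with one cycle (Wine <-> Drink) and one undocumented class.
kb = {
    "subclass_of": {"Wine": {"Drink"}, "Drink": {"Wine"}, "Red": {"Wine"}},
    "doc": {"Wine": "Fermented grape juice", "Drink": "Potable liquid"},
}
log = run_diagnostics(kb)
for finding in log:
    print(finding)
```

A new check is added by writing one more generator function and decorating it, which is the sense in which such a log is user programmable.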
Chimaera Usage
HPKB program – analyze diverse KBs, support KR novices as well as experts
Cleaning semi-automatically generated KBs
Browsing and merging multiple controlled vocabularies (e.g., internal vocabularies and UN/SPSC (standard products and services codes))
Reviewing internal vocabularies
Discussion/Conclusion
• Ontologies are becoming more central to applications; they are larger, more distributed, and longer-lived
• Environmental support (in particular merging and diagnostic support) is more critical for the broader user base
• Chimaera provides merging and diagnostic support for ontologies in many formats
• It improves performance over existing tools
• It has been used by people of various training backgrounds in government and commercial applications and is available for use.
• http://www.ksl.Stanford.EDU/software/chimaera/ - movie, tutorial, papers, link to live system, etc.
Extras
The Need For KB Analysis
Large-scale knowledge repositories will contain KBs produced by multiple authors in multiple settings
KBs for applications will be built by assembling and extending multiple modular KBs from repositories that may not be consistent
KBs developed by multiple authors will frequently
  Express overlapping knowledge in different, possibly contradictory ways
  Use differing assumptions and styles
  Have different purposes
KBs must be reviewed for appropriateness and “correctness”
What is an Ontology?
[Figure: diagram labeled with: Catalog/ID; Terms/glossary; Thesauri “narrower term” relation; Informal is-a; Formal is-a; Formal instance; Frames (properties); Value Restrs.; Disjointness, Inverse, part-of…; General Logical constraints]
Ontologies and importance to E-Commerce
Simple ontologies provide:
Controlled shared vocabulary (search engines, authors, users, databases, programs all speak same language)
Organization (and navigation support)
Expectation setting (left side of many web pages)
Browsing support (tagged structures such as Yahoo!)
Search support (query expansion approaches such as FindUR, e-Cyc)
Sense disambiguation
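Query expansion, listed above as one form of search support, can be illustrated with a short Python sketch: a search term is expanded with its synonyms and with every narrower term found in a simple ontology. The `expand_query` function and the sample term tables are invented for this example; the sketch makes no claim about how FindUR or e-Cyc work internally.

```python
def expand_query(term, narrower, synonyms):
    """Return the term plus its synonyms and all transitively narrower terms."""
    seen, stack = set(), [term]
    while stack:
        cur = stack.pop()
        if cur in seen:
            continue
        seen.add(cur)
        stack.extend(synonyms.get(cur, []))   # alternative names for the term
        stack.extend(narrower.get(cur, []))   # more specific terms beneath it
    return sorted(seen)


# Invented controlled vocabulary: "narrower term" links and synonyms.
narrower = {"wine": ["red wine", "white wine"], "red wine": ["merlot"]}
synonyms = {"wine": ["vino"]}

expanded = expand_query("wine", narrower, synonyms)
print(expanded)
```

A search for "wine" then also matches pages that mention only "merlot" or "vino", which is where a shared controlled vocabulary helps retrieval.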
Ontologies and importance to E-Commerce II
Foundation for expansion and leverage
Conflict detection
Completion
Regression testing/validation/verification support foundation
Configuration support
Structured, comparative search
Generalization/Specialization
…
E-Commerce Search (starting point Forrester modified by McGuinness)
Ask Queries
- multiple search interfaces (surgical shoppers, advice seekers, window shoppers)
- set user expectations (interactive query refinement)
- anticipate anomalies
Get Answers
- basic information (multiple sorts, filtering, structuring)
- modify results (user defined parameters for refining, user profile info, narrow query, broaden query, disambiguate query)
- suggest alternatives (suggest other comparable products even from competitor’s sites)
Make Decisions
- manipulate results (enable side by side comparison)
- dive deeper (provide additional info, multimedia, other views)
- take action (buy)
A Few Observations about Ontologies
Simple ontologies can be built by non-experts
  Consider Verity’s Topic Editor, Collaborative Topic Builder, GFP interface, Chimaera, etc.
Ontologies can be semi-automatically generated
  from crawls of sites such as Yahoo!, Amazon, Excite, etc.
  Semi-structured sites can provide starting points
Ontologies are exploding (business pull instead of technology push)
  most e-commerce sites are using them - MySimon, Affinia, Amazon, Yahoo! Shopping, etc.
  Controlled vocabularies (for the web) abound - SIC codes, UMLS, UN/SPSC, Open Directory, Rosetta Net, …
  Business ontologies are including roles
  DTDs are making more ontology information available
  Businesses have ontology directors
“Real” ontologies are becoming more central to applications
Implications and Needs
Ontology Language Syntax and Semantics
Environments for Creation and Maintenance of Ontologies
Training (Conceptual Modeling, reasoning implications, …)
Issues:
  Collaboration among distributed teams
  Diverse training levels
  Interconnectivity with many systems/standards
  Analysis and Diagnosis
  Scale
Experiment 3: Maximum edits performed vs. time
[Figure: maximum edits performed (0 to 100) plotted against time in seconds (0 to 3500)]