March 2000 Gio XIT 1
Increasing the Precision of Semantic Interoperation
Gio Wiederhold
Stanford University
March 2000
report: www-db.stanford.edu/pub/gio/1999/miti.htm
Stanford Computer Forum
Supported by AFOSR - New World Vistas Program
Jan Jannink, Shrish Agarwal, Prasenjit Mitra, Stefan Decker.
March 2000 Gio XIT 2
Heterogeneity among Domains
If interoperation involves distinct domains, mismatch ensues
• Autonomy conflicts with consistency
– Local needs have priority
– Outside uses are a byproduct
Heterogeneity must be addressed
• Platform and operating systems
• Representation and access conventions
• Naming and ontology
March 2000 Gio XIT 3
Semantic Mismatches
Information comes from many autonomous sources
• Differing viewpoints (by source)
– differing terms for similar items: { lorry, truck }
– same terms for dissimilar items: trunk (luggage, car)
– differing coverage: vehicles (DMV, AIA)
– differing granularity: trucks (shipper, manuf.)
– different scope: student museum fee, Stanford
• Hinders use of information from disjoint sources
– missed linkages: loss of information, opportunities
– irrelevant linkages: overload on user or application program
• Poor precision when merged
– ok for web browsing, poor for business
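The two failure modes on this slide can be illustrated with a minimal sketch. The term tables and meaning labels below are hypothetical, chosen only to mirror the lorry/truck and trunk examples; linking terms by identical spelling is the naive merge being criticized, not a method from the talk.

```python
# Hypothetical term -> meaning tables for two autonomous sources.
source_a = {"lorry": "goods vehicle", "trunk": "luggage"}
source_b = {"truck": "goods vehicle", "trunk": "car storage compartment"}

def naive_links(a, b):
    """Link terms purely by identical spelling (the naive merge)."""
    return sorted(set(a) & set(b))

def missed_links(a, b):
    """Pairs with the same meaning but different names: missed linkages."""
    return sorted((ta, tb) for ta, ma in a.items()
                  for tb, mb in b.items() if ma == mb and ta != tb)

def irrelevant_links(a, b):
    """Terms with the same name but different meanings: irrelevant linkages."""
    return [t for t in naive_links(a, b) if a[t] != b[t]]

print(naive_links(source_a, source_b))
print(missed_links(source_a, source_b))
print(irrelevant_links(source_a, source_b))
```

In this toy case the only link the naive merge finds is the irrelevant one, while the one useful link is missed.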
March 2000 Gio XIT 4
Solutions
Specify and standardize terminology usage: ontology
• Globally: all interacting sources
– wonderful for users and their programs
– long time to achieve: 2 sources (UAL, BA), 3 (+ trucks), 4, … all?
– costly maintenance, since all sources evolve
– who has the authority to dictate conformance?
• Domain-specific: XML DTD assumption
– small, focused, cooperating groups
– high quality; some examples: genomics, arthritis, Shakespeare plays
– allows sharable, formal tools
– ongoing, local maintenance affecting users: annual updates
– poor interoperation; users still face inter-domain mismatches
• Solves only part of the problem
March 2000 Gio XIT 5
Domains and Consistency
• a domain will contain many objects
• the object configuration is consistent
• within a domain all terms are consistent &
• relationships among objects are consistent
• context is implicit
No committee is needed to forge compromises within a domain
Compromises hide valuable details
Domain Ontology
March 2000 Gio XIT 6
Objective: Scalable Knowledge Composition
Provide for Maintainable Ontologies
• devolve maintenance onto many domain-specific experts / authorities
• provide an algebra to compute composed ontologies that are limited to their articulation terms
• enable interpretation within the source contexts
SKC
March 2000 Gio XIT 7
Sample Operation: INTERSECTION
Source Domain 1: owned and maintained by Store
Result contains shared terms, useful for purchasing
Source Domain 2: owned and maintained by Factory
Articulation
March 2000 Gio XIT 8
Tools to create articulations
Graph matcher for articulation-creating expert
Vehicle ontology
Transport ontology
Suggestions for articulations
March 2000 Gio XIT 9
Continue from initial point
Also suggest similar terms for further articulation:
• by spelling similarity
• by graph position
• by term match repository
Expert response:
1. Okay
2. False
3. Irrelevant to this articulation
All results are recorded
Okays are converted into articulation rules
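The suggest-and-review loop on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's tool: it covers only the spelling-similarity suggestion, uses Python's standard `difflib.SequenceMatcher` as the similarity measure, and the term lists, the 0.8 threshold, and the `sample_expert` function are all hypothetical.

```python
from difflib import SequenceMatcher

def suggest_by_spelling(terms_a, terms_b, threshold=0.8):
    """Propose term pairs whose spelling similarity meets the (assumed) threshold."""
    suggestions = []
    for a in terms_a:
        for b in terms_b:
            score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if score >= threshold:
                suggestions.append((a, b))
    return suggestions

def review(suggestions, expert):
    """Record every expert response; only 'okay' pairs become articulation rules."""
    log, rules = [], []
    for a, b in suggestions:
        response = expert(a, b)  # one of "okay", "false", "irrelevant"
        log.append((a, b, response))
        if response == "okay":
            rules.append((a, b))
    return log, rules

# Hypothetical terms from a vehicle and a transport ontology.
vehicle_terms = ["Truck", "Car", "Price"]
transport_terms = ["Trucks", "Cargo", "Prices"]

def sample_expert(a, b):
    """Stand-in for the human expert: accepts the truck match, rejects the rest."""
    return "okay" if a == "Truck" else "false"

suggestions = suggest_by_spelling(vehicle_terms, transport_terms)
log, rules = review(suggestions, sample_expert)
print(log)
print(rules)
```

Keeping the rejected pairs in the log, rather than discarding them, matches the slide's point that all results are recorded.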
March 2000 Gio XIT 10
Candidate Match Repository
Term linkages automatically extracted from 1912 Webster’s dictionary *
* free; other sources are being processed
Based on processing headwords and definitions using algebra primitives
Notice presence of 2 domains: chemistry, transport
March 2000 Gio XIT 13
An Ontology Algebra
A knowledge-based algebra for ontologies
The Articulation Ontology (AO) consists of matching rules that link domain ontologies
Intersection: create a subset ontology, keep sharable entries
Union: create a joint ontology, merge entries
Difference: create a distinct ontology, remove shared entries
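The three operations can be sketched over plain term sets. This is a simplified illustration, assuming each ontology is just a set of source-qualified terms and each matching rule is a one-to-one ordered pair (term from the first ontology, term from the second); the talk's ontologies are graphs, and the store and factory terms below are hypothetical.

```python
def qualify(source, terms):
    """Prefix each term with its source name so equal spellings stay distinct."""
    return {f"{source}.{t}" for t in terms}

def intersection(onto1, onto2, rules):
    """Subset ontology: the term pairs linked by a matching rule."""
    return {(a, b) for a, b in rules if a in onto1 and b in onto2}

def union(onto1, onto2, rules):
    """Joint ontology: every term, with rule-linked terms merged into one entry."""
    linked = intersection(onto1, onto2, rules)
    merged = {frozenset(pair) for pair in linked}
    covered = {t for pair in linked for t in pair}
    singles = {frozenset([t]) for t in (onto1 | onto2) - covered}
    return merged | singles

def difference(onto1, onto2, rules):
    """Distinct ontology: terms of onto1 that no rule shares with onto2."""
    shared = {a for a, _ in intersection(onto1, onto2, rules)}
    return onto1 - shared

store = qualify("store", ["shoe", "price", "shelf"])
factory = qualify("factory", ["footwear", "cost", "machine"])
rules = {("store.shoe", "factory.footwear"), ("store.price", "factory.cost")}

print(sorted(intersection(store, factory, rules)))
print(sorted(difference(store, factory, rules)))
print(len(union(store, factory, rules)))
```

Note that every operation takes the rules as an argument: in this sketch nothing is shared unless a rule says so, which is why the algebra is knowledge-based rather than purely set-theoretic.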
March 2000 Gio XIT 14
INTERSECTION support
Store Ontology
Articulation ontology
Matching rules that use terms from the 2 source domains
Factory Ontology
Terms useful for purchasing
March 2000 Gio XIT 15
Other Basic Operations
typically prior intersections
UNION: merging entire ontologies
DIFFERENCE: material fully under local control
Articulation ontology
March 2000 Gio XIT 16
Features of an algebra
Operations can be composed
Operations can be rearranged
Alternate arrangements can be evaluated
Optimization is enabled
The record of past operations can be kept and reused when sources change
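A small sketch of why these features matter: once operations are recorded as an expression, the expression can be rearranged, compared, and replayed. The version below assumes ordinary set semantics over hypothetical term sets; the talk's operations depend on articulation rules, so whether the same rewrite laws carry over is an assumption here.

```python
def evaluate(expr, sources):
    """Evaluate a recorded expression: a source name or an (op, left, right) tuple."""
    if isinstance(expr, str):
        return sources[expr]
    op, left, right = expr
    lhs, rhs = evaluate(left, sources), evaluate(right, sources)
    if op == "union":
        return lhs | rhs
    if op == "intersect":
        return lhs & rhs
    if op == "difference":
        return lhs - rhs
    raise ValueError(f"unknown operation: {op}")

# Hypothetical term sets standing in for three sources.
sources = {"A": {"truck", "car"}, "B": {"truck", "cargo"}, "C": {"cargo", "ship"}}

# The recorded composition and an alternate arrangement of it.
recorded = ("union", ("intersect", "A", "B"), ("intersect", "B", "C"))
rearranged = ("intersect", "B", ("union", "A", "C"))

print(evaluate(recorded, sources) == evaluate(rearranged, sources))

# When a source changes, the kept record is simply replayed.
sources["C"] = {"cargo", "ship", "truck"}
print(sorted(evaluate(recorded, sources)))
```

The rearranged form needs one intersection instead of two, which is the kind of alternative an optimizer could weigh.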
March 2000 Gio XIT 17
Knowledge Composition
[Figure: knowledge resources A, B, C, D, E; articulation knowledge for (A ∩ B), (B ∩ C), (C ∩ E), (C ∩ D); composed knowledge for applications using A, B, C, E]
Legend: ∪ : union, ∩ : intersection
March 2000 Gio XIT 18
Primitive Operations
Unary
• Summarize - abstract
• Glossarize - list terms
• Filter - reduce instances
• Extract - move into context
Binary
• Match - data corroboration
• Difference - distance measure
• Intersect - use of articulation
• Union - search broadening
Constructors
• create object
• create set
Connectors
• match object
• match set
Editors
• insert value
• edit value
• move value
• delete value
Converters
• object - value
• object indirection
• reference indirection
Model and Instance
March 2000 Gio XIT 19
Exploiting the result
Processing & query evaluation is best performed within Source Domains & by their engines
Result has links to source
Avoid n² problem of interpreter mapping [Swartout, HPKB year 1]
March 2000 Gio XIT 20
Domain Specialization
• Knowledge Acquisition (20% effort) &
• Knowledge Maintenance (80% effort *)
to be performed by
• Domain specialists
• Professional organizations
• Field teams
of modest size
Empowerment: autonomously maintainable
* based on experience with software
March 2000 Gio XIT 21
Summary
To sustain the trend
1. The value of the results has to keep increasing: precision, relevance, not volume
2. Value is provided by experts, encoded as models of diverse resources, customers
Problems to be addressed
• mismatches
• quality
• temporal extensions
• maintenance
} Clear models
Thanks to Jan Jannink, Shrish Agarwal, Prasenjit Mitra, Stefan Decker.