{Ontology: Resource} x {Matching : Mapping} x {Schema : Instance} ::
Components of the same challenge?
Invited Talk, International Workshop on Ontology Matching, collocated with the 5th International Semantic Web Conference (ISWC-2006), November 5, 2006, Athens, GA
Professor Amit Sheth
Special Thanks: Meena Nagarajan
Acknowledgment: SemDis project, funded by NSF
Information System needs and Ontology Matching goals
[Figure: three generations of information integration systems]
Generation III (information brokering), 1997...: Semantics (Ontology, Context, Relationships, KB)
  – Semantic Web, some DL-II projects, Semagix SCORE, Applied Semantics, VideoAnywhere, InfoQuilt, OBSERVER, SemDis, ISIS
Generation II (mediators), 1990s: Metadata (Domain model)
  – InfoSleuth, KMed, DL-I projects, Infoscopes, HERMES, SIMS, Garlic, TSIMMIS, Harvest, RUFUS, ..., VisualHarness, InfoHarness
Generation I (federated DB / multidatabases), 1980s: Data (Schema, "semantic data modeling")
  – Mermaid, DDTS, Multibase, MRDSM, ADDS, IISS, Omnibase, ...
Information systems: From mediators to information brokering
• Mediators between heterogeneous information sources
  – InfoHarness, VisualHarness, InfoSleuth, SIMS, Garlic, etc.
[Figure: VisualHarness Architecture. Components: end-user Web browsers (IH clients), the Internet, the IH Server, IH administrative tools, a Metadata Database (Metabase, Oracle), and information resources (Repository 1 ... Repository m) holding raw data (image, text, video, audio).]
Circa 1992-1996.
[Figure: Information brokering. Information consumers and information providers (corporations, universities, people, government programs, newswires, research labs); user queries; information systems and data repositories; an information brokering layer using domain-specific ontologies.]
Information systems: From mediators to information brokers
• Information brokers
  – InfoQuilt, OBSERVER, etc.
Circa 1996-2000
Need for querying across multiple ontologies
[Figure: OBSERVER. Components: user query; several sites, each with a Query Processor, a Mappings/Ontology Server, ontologies and repositories; the IRM holding inter-ontology relationships.]
Circa 1994, 1996-2002
Ontology Matching – goals
• Goals of ontology matching (and mapping, or integration)
  – Shallow analysis to identify dependencies for integration
  – Deeper analysis to create mappings for query-based transformations / integration
  – Integrate schemas to create a global schema
  – Integrate instance bases
Sheth, Review of a real world experience in database schema integration (Bellcore, ca. 1993)
Ontology Matching – changing notions
• Given the distributed nature of modeling domains and metadata, the need for matching has expanded into information integration
• Now
  – Query processing is not limited to multiple databases or ontologies, but spans multiple domains and sources of information
  – Exploiting structured, semi-structured and unstructured data sources, and multi-model Web sources
The process of Ontology Matching
• Different for purposes of merging / aligning ontologies
  – The types of relationships that need to be discovered are limited to equivalence / inclusion / disjointness / overlap mappings
• Different for purposes ranging from information integration to analytics to discovery
  – Need to discover more complex mappings
    • Named relationships / associations
    • Graph-based / numerical mappings
Top down and bottom up view to ontology matching
• Top Down: schema + instance integration to provide information integration
[Figure panels: Ontologies; Semantic metadata; Heterogeneous data (e.g. the news text "Today, the Food and Drug Administration (FDA) is announcing that it has asked Pfizer, Inc. to voluntarily withdraw Bextra from the market. ..."); labels: Horizontal Semantic Integration, Vertical Semantic Integration, Integration Ontology, Complex Mapping, Relationship.]
Top down and bottom up view to ontology matching
• Bottom up: exploit external data sources to drive schema matching
[Figure panels: Ontologies; Semantic metadata; Heterogeneous data (a news item on the FDA asking Pfizer to withdraw Bextra from the market); labels: Horizontal Semantic Integration, Vertical Semantic Integration, Integration Ontology, Complex Mapping, Relationship.]
A step back
DB vs. Ontology: Fundamental differences
Schema integration goals – DB vs. Ontology
• DB schema integration goal
  – “Defining an integrated view of the data for all applications using the data.”
• Ontology schema integration goal
  – “Defining an agreement between multiple ontology schemas modeled for the same domain.”
Goals are different because of differences in:
• The modeling paradigms
  – A database schema is a model for the data that one or more applications intend to use.
  – An ontology is a model of knowledge for a bounded region of interest (also known as a domain).
• Data vs. Knowledge: a DB instance base is not the same as an ontology instance base
  – A database models data to be used by one or more applications
  – An ontology models knowledge about a domain, independent of the application
Modeling Database vs. Ontology schemas: Fundamental differences
Axis of comparison | Database schemas | Ontology schemas
Modeling perspective | Intended to model data being used by one or more applications | Intended to model a domain
Structure vs. Semantics | Emphasis while modeling is on the structure of the tables | Emphasis while modeling is on the semantics of the domain: relationships, and also facts/knowledge/ground truth
Agreement | Limited to a syntactic agreement between applications using the data | Symbolizes agreement on the modeling of a domain, possibly used by applications in varying contexts
Instance metadata modeling / expressiveness | Limited expressivity in capturing instance-level metadata due to static schemas | More expressive modeling paradigm
Context of modeling | Well defined by the applications using the data | Modeling of a domain irrespective of applications
The choice of modeling paradigm affects the possible space of heterogeneities and therefore the process of matching.
In both cases, however, the schema is only an abstraction of the real world; the real power (semantics) lies at the instance level.
The space of heterogeneities in DB schema integration
• Conflicts/Heterogeneities in DB schema integration
  – Model / representation: relational vs. network vs. hierarchical models
  – Structural / schematic:
    • Domain incompatibilities
    • Entity definition incompatibilities
    • Data value incompatibilities
    • Abstraction level incompatibilities
• Largely syntactic and structural; relatively few semantic conflicts
(Sheth/Kashyap 1992, Kim/Seo 1993, Kashyap/Sheth 1996)
The space of heterogeneities in ontology schema integration
• Conflicts/Heterogeneities in ontology schema integration
  – Significant conflicts in perception of a domain: semantic conflicts
  – Other heterogeneities are similar to those in the DB world
    • Model / representation: OWL/RDF, topic maps, etc.
    • Structural: modeling as an entity vs. an attribute/property; generalization vs. abstraction, etc.
• Largely semantic conflicts; comparable syntactic conflicts
Key Observations
• There are significant philosophical differences in how a DB schema and an Ontology schema are modeled
• In spite of these distinctions, many schema matching techniques overlap significantly.
• Have we advanced the state of the art in ontology schema matching?
Schema Integration – DB vs. Ontology
Have we advanced the state of the art?
Schema Integration – techniques used (schema level)
Schema matching techniques:
• Syntactic
  – Linguistic: matching names, descriptions, namespaces, etc.
  – Constraint-based: constraint matches on data types, value ranges, uniqueness, cardinalities, etc.
Information exploited:
• DB: matching table- and column-level names and constraints
• Ontology: matching class-, property/relationship- and attribute-level names and constraints
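The linguistic technique above can be illustrated with a minimal sketch. It assumes schema element names are plain strings, splits camelCase and snake_case names into tokens, and scores pairs with `difflib.SequenceMatcher` from the Python standard library; the 0.8 threshold and the sample names are illustrative choices, not values from the talk.

```python
import re
from difflib import SequenceMatcher


def normalize(name: str) -> list[str]:
    """Split a schema element name (camelCase, snake_case, kebab-case) into lowercase tokens."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name)
    return [tok.lower() for tok in re.split(r"[\s_\-]+", spaced) if tok]


def name_similarity(a: str, b: str) -> float:
    """Character-level similarity (0..1) of the normalized names."""
    return SequenceMatcher(None, " ".join(normalize(a)), " ".join(normalize(b))).ratio()


def match_names(source, target, threshold=0.8):
    """For each source name, keep its best-scoring target name if the score reaches threshold."""
    matches = []
    for s in source:
        best = max(target, key=lambda t: name_similarity(s, t))
        score = name_similarity(s, best)
        if score >= threshold:
            matches.append((s, best, round(score, 2)))
    return matches


print(match_names(["authorName", "paper_title", "isbn"],
                  ["AuthorName", "title", "PaperTitle", "pubDate"]))
# [('authorName', 'AuthorName', 1.0), ('paper_title', 'PaperTitle', 1.0)]
```

A production matcher would add synonym dictionaries and use descriptions and namespaces as well; pure string similarity cannot pair names such as `Publication` and `Document`.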
Schema Integration – techniques used (schema level)
Schema matching techniques:
• Structural
  – Constraint-based: tree / graph structure matching
Information exploited:
• DB: matching structures of relational tables
• Ontology: matching class hierarchies and structures
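Structural matching can be sketched in a similarly simplified way. The version below is an assumption for illustration, not any specific published matcher: it compares two classes only by the overlap (Jaccard coefficient) of their direct subclass names, so `Publication` and `Document` can be paired even though their own names share nothing. The hierarchies are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Overlap of two sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def structural_similarity(h1, h2, c1, c2):
    """Compare class c1 in hierarchy h1 with class c2 in h2 by the lowercased names of their direct subclasses."""
    kids1 = {k.lower() for k in h1.get(c1, [])}
    kids2 = {k.lower() for k in h2.get(c2, [])}
    return jaccard(kids1, kids2)


def best_structural_matches(h1, h2):
    """Map each parent class in h1 to its most structurally similar parent class in h2."""
    result = {}
    for c1 in h1:
        c2 = max(h2, key=lambda c: structural_similarity(h1, h2, c1, c))
        result[c1] = (c2, structural_similarity(h1, h2, c1, c2))
    return result


# Hypothetical hierarchies: parent class -> direct subclasses.
onto_a = {"Publication": ["Article", "Book", "Thesis"], "Agent": ["Person", "Organization"]}
onto_b = {"Document": ["article", "book", "Report"], "Actor": ["person", "organization"]}
print(best_structural_matches(onto_a, onto_b))
# {'Publication': ('Document', 0.5), 'Agent': ('Actor', 1.0)}
```

Note that the child names are themselves compared linguistically (after lowercasing), which is why hybrid approaches combine both kinds of evidence.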
Schema Integration – techniques used (instance level, DB and Ontology)
Schema matching techniques:
• Linguistic
  – IR techniques: word frequencies, key terms, combinations of key terms, etc.
• Constraint-based
  – Numerical value patterns and ranges, useful for recognizing phone numbers, etc.
• Hybrid approaches use a combination of all techniques
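At the instance level, constraint-based matching looks at value patterns instead of names. This sketch assumes each column is given as a list of string values and uses a few hypothetical regular expressions (a US-style phone number, a ZIP code, an e-mail address); two columns are paired when most of their values fit the same pattern, even if their names (`contact` vs. `tel_no`) differ.

```python
import re

# Hypothetical value patterns; a real matcher would configure or learn these.
PATTERNS = {
    "us_phone": re.compile(r"^\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}$"),
    "zip_code": re.compile(r"^\d{5}(-\d{4})?$"),
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[a-z]{2,}$", re.IGNORECASE),
}


def profile_column(values, min_fraction=0.8):
    """Return the first pattern label matched by at least min_fraction of the non-empty values, else None."""
    cleaned = [v.strip() for v in values if v and v.strip()]
    if not cleaned:
        return None
    for label, pattern in PATTERNS.items():
        hits = sum(1 for v in cleaned if pattern.match(v))
        if hits / len(cleaned) >= min_fraction:
            return label
    return None


def match_columns_by_values(src, tgt):
    """Pair source and target columns whose values fit the same pattern."""
    src_labels = {col: profile_column(vals) for col, vals in src.items()}
    tgt_labels = {col: profile_column(vals) for col, vals in tgt.items()}
    return sorted(
        (s, t)
        for s, ls in src_labels.items()
        for t, lt in tgt_labels.items()
        if ls is not None and ls == lt
    )


src = {"contact": ["(706) 542-1234", "706-555-0199", "706.555.0100"], "zip": ["30602", "30605-1234"]}
tgt = {"tel_no": ["404 555 0142", "4045550143"], "postal": ["30303"]}
print(match_columns_by_values(src, tgt))  # [('contact', 'tel_no'), ('zip', 'postal')]
```

The `min_fraction` tolerance lets a column match despite a few dirty values.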
Discovered semantic relationships
• State of the art, in DBs and Ontologies
  – Relationships with set semantics: overlap / disjointness / exclusion / equivalence / subsumption
  – Their logical encodings are what they mean
• Of more interest is discovering arbitrary named relationships
  – Relationships such as works_for or causes have “real-world” semantics. Their encoding in first-order logic lacks semantic grounding.
• Matching and mapping are closely tied. The ability to capture complex mappings (e.g., semantic proximity) places significantly different demands on matching.
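Relationships with set semantics can be computed directly once the extensions of two concepts are known. The sketch below assumes each concept is represented by a set of shared instance identifiers (e.g. after entity resolution) and uses exact set comparisons; real data would need a tolerance for noise. Nothing comparable can be computed this way for a named relationship such as works_for.

```python
def set_relationship(a: set, b: set) -> str:
    """Classify the extensional relationship between two concepts' instance sets."""
    if a == b:
        return "equivalence"
    if a < b:
        return "subsumed_by"  # every instance of A is also an instance of B
    if a > b:
        return "subsumes"     # every instance of B is also an instance of A
    if a & b:
        return "overlap"      # some, but not all, instances are shared
    return "disjoint"         # no shared instances


print(set_relationship({"Athens", "Atlanta"}, {"Athens", "Atlanta", "Macon"}))  # subsumed_by
print(set_relationship({"Athens", "Macon"}, {"Athens", "Atlanta"}))             # overlap
```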
Key Observation
• DB and Ontology schema matching techniques overlap significantly
  – Not much advancement since the DB schema integration efforts
• Ontologies formalize the semantics of a domain, but matching is still primarily syntactic / structural
  – The semantics of ‘named relationships’ is largely unexploited
• The real semantics lies in the relationships connecting entities
  – Modeled as first-class objects in Ontologies
  – In DBs, they are not explicit and have to be inferred
(Complex) named relationships and Ontology Matching
(Complex) named relationships – example
[Figure: entities VOLCANO, LOCATION, ASH RAIN, PYROCLASTIC FLOW, ENVIRON., PEOPLE, WEATHER, PLANT and BUILDING, linked by named relationships such as DESTROYS, COOLS TEMP and KILLS.]
Discovering such (complex) named relationships
• Matching techniques have exhausted schema + instance properties
• Ontology modeling decouples the schema and the instance base
  – Tremendous opportunity to exploit knowledge present outside the ontology knowledge base (external structured, semi-structured and unstructured data sources)
Knowledge discovery and validation
[Figure: external sources (PubMed etc., DBs) and relevant documents; query and update; prediction of pathways, symptoms of diseases and other complex relationships.]
A Vision for Ontology Matching: Discovering simple to complex matches, from schema, instances and corpus
[Figure: axis labeled "SIMPLE TO COMPLEX MATCHES"; panels: Ontologies; Semantic metadata; Heterogeneous data (a news item on the FDA asking Pfizer to withdraw Bextra from the market).]
Possible identifiable matches: equivalence / inclusion / overlap / disjointness
Possible to identify more complex relationships from the corpus.
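Corpus-based schema matching
The Intuition
[Figure: semantic types Disease or Syndrome, Biologically active substance and Lipid, connected by the relationships affects, causes and complicates; the instances Fish Oils and Raynaud’s Disease (instance_of) with an unknown relationship (???) between them; sources: UMLS, MeSH, PubMed; document counts: 9284 documents, 4733 documents, 5 documents.]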
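The intuition can be approximated in two steps: ask the schema which relationships are allowed between the types of two instances, then count documents that mention each term and both. In this sketch the type-level triples, the instance_of assignments and the four documents are hypothetical stand-ins for UMLS, MeSH and PubMed, and matching is plain case-insensitive substring search rather than concept recognition.

```python
# Hypothetical type-level relationships (subject type, relation, object type).
# Illustrative only: not copied from the UMLS Semantic Network.
SCHEMA = [
    ("Lipid", "affects", "Disease or Syndrome"),
    ("Biologically active substance", "causes", "Disease or Syndrome"),
    ("Biologically active substance", "complicates", "Disease or Syndrome"),
]
# Hypothetical instance_of assignments.
TYPE_OF = {"fish oils": "Lipid", "raynaud's disease": "Disease or Syndrome"}


def candidate_relations(x, y):
    """Relations the schema allows between the types of instances x and y."""
    tx, ty = TYPE_OF[x], TYPE_OF[y]
    return [rel for s, rel, o in SCHEMA if s == tx and o == ty]


def doc_counts(docs, term_a, term_b):
    """Count documents mentioning term_a, term_b, and both (case-insensitive substring match)."""
    n_a = n_b = n_both = 0
    for text in docs:
        t = text.lower()
        has_a, has_b = term_a.lower() in t, term_b.lower() in t
        n_a += has_a
        n_b += has_b
        n_both += has_a and has_b
    return n_a, n_b, n_both


# Toy corpus standing in for PubMed abstracts.
docs = [
    "A study of fish oils in the diet.",
    "Symptoms of Raynaud's disease in cold weather.",
    "Fish oils and Raynaud's disease: a pilot study.",
    "An overview of lipid metabolism.",
]
print(candidate_relations("fish oils", "raynaud's disease"))  # ['affects']
print(doc_counts(docs, "fish oils", "raynaud's disease"))     # (2, 2, 1)
```

The schema narrows which relationships are plausible; the small co-occurrence count flags a pair worth examining, but does not by itself say which relationship holds.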
The Method – Identify entities and relationships in the parse tree
[Figure: parse tree annotated with Modifiers, Modified entities and Composite Entities.]
Key Observation
• What is interesting is not the entity “estrogen” or “endometrium”
• The real knowledge lies in the complex and modified entities: “an excessive endogenous stimulation by estrogen”
Current KR frameworks do not model this. Capturing it might affect the way we think of matching and mapping.
Converting candidate relationships to ontology matches
• Linguistic and statistical challenges:
  – Variations of entities, relationships and associations
• Translating instance-level findings to the schema level
  – GOING FROM several discovered relationships like “Deficiency in magnesium causes Migraine” TO “substance X causes condition Y”
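Lifting instance-level findings to the schema level can be sketched as counting (type, relation, type) patterns over discovered triples. The triples, the type names (`SubstanceDeficiency`, `Condition`) and the `min_support` cut-off below are hypothetical; the sketch ignores the linguistic variation discussed above and assumes each entity already has a single known type.

```python
from collections import Counter


def generalize(triples, type_of, min_support=2):
    """Lift (subject, relation, object) instance triples to (type, relation, type) patterns.

    Triples whose subject or object has no known type are skipped; patterns seen
    fewer than min_support times are dropped.
    """
    counts = Counter(
        (type_of[s], rel, type_of[o])
        for s, rel, o in triples
        if s in type_of and o in type_of
    )
    return {pattern: n for pattern, n in counts.items() if n >= min_support}


# Hypothetical discovered triples and type assignments.
triples = [
    ("magnesium deficiency", "causes", "migraine"),
    ("iron deficiency", "causes", "anemia"),
    ("vitamin B12 deficiency", "causes", "anemia"),
    ("stress", "affects", "migraine"),
]
type_of = {
    "magnesium deficiency": "SubstanceDeficiency",
    "iron deficiency": "SubstanceDeficiency",
    "vitamin B12 deficiency": "SubstanceDeficiency",
    "migraine": "Condition",
    "anemia": "Condition",
    "stress": "Condition",
}
print(generalize(triples, type_of))  # {('SubstanceDeficiency', 'causes', 'Condition'): 3}
```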
Discovery vs. Validation of relationships – two sides of the coin
• Discovering complex relationships from text is a hard problem
  – Natural language challenges (not all sentences are well formed)
• Validating complex relationships / hypotheses is relatively simpler
Corpus-based hypothesis validation
Does magnesium alleviate effects of migraine in patients?
[Figure: one possible hypothesized connection between magnesium and migraine, involving Stress, Calcium Channel Blockers and Patient, with the relationships isa, affectedBy and inhibit; the hypothesis is posed to PubMed as a complex query, and supporting document sets are retrieved.]
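Validation of a hypothesized connection can be sketched as checking, edge by edge, whether any documents mention both endpoints. The path, the three documents and the query format below are hypothetical; `as_query` produces a generic boolean keyword string and is not the syntax of any particular search service such as PubMed. Co-mention is only weak evidence: it does not confirm that the named relationship itself holds.

```python
def supporting_docs(docs, a, b):
    """Indices of documents that mention both terms (case-insensitive substring match)."""
    return [i for i, d in enumerate(docs) if a.lower() in d.lower() and b.lower() in d.lower()]


def validate_path(docs, path, min_docs=1):
    """Collect supporting documents per edge; the path is supported if every edge has min_docs."""
    evidence = {edge: supporting_docs(docs, edge[0], edge[2]) for edge in path}
    supported = all(len(ids) >= min_docs for ids in evidence.values())
    return evidence, supported


def as_query(path):
    """Render the path as a generic boolean keyword query, one AND-group per edge."""
    return " AND ".join(f'("{a}" AND "{b}")' for a, _, b in path)


# Hypothetical hypothesis and toy corpus.
path = [
    ("magnesium", "isa", "calcium channel blocker"),
    ("calcium channel blocker", "inhibit", "migraine"),
]
docs = [
    "Magnesium acts as a natural calcium channel blocker.",
    "A calcium channel blocker was used for migraine prophylaxis.",
    "Stress is a common migraine trigger.",
]
evidence, supported = validate_path(docs, path)
print(supported)  # True
print(as_query(path))
```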
From matching to mappings – several challenges
• Mappings are not always simple mathematical / string transformations
• Examples of complex mappings
  – Associations / paths between classes
  – Graph-based / form-fitting functions
[Figure: entities E1:Reviewer, E2:Paper, E3:Person, E4:Paper, E5:Person, E6:Person and E7:Submission connected by author_of and knows relationships.]
The number of earthquakes with magnitude > 7 is almost constant. So, if nuclear tests cause earthquakes at all, they only cause earthquakes with magnitude < 7.
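Associations or paths between classes can be found by enumerating simple paths in an instance graph. The edge list below is hypothetical (the exact edges of the figure are not recoverable); it sketches how a reviewer could be connected to a submission through co-authorship. Edges may be traversed backwards, marked with a `^-1` suffix, and `max_len` bounds the search.

```python
from collections import defaultdict


def build_graph(edges):
    """Adjacency list traversable in both directions, keeping edge labels and direction."""
    graph = defaultdict(list)
    for s, label, o in edges:
        graph[s].append((label, o))
        graph[o].append((label + "^-1", s))  # inverse direction
    return graph


def association_paths(edges, start, end, max_len=4):
    """Enumerate simple paths (no repeated node) from start to end with at most max_len edges."""
    graph = build_graph(edges)
    results = []

    def dfs(node, path, visited):
        if node == end and path:
            results.append(path)
            return
        if len(path) == max_len:
            return
        for label, nxt in graph[node]:
            if nxt not in visited:
                dfs(nxt, path + [(node, label, nxt)], visited | {nxt})

    dfs(start, [], {start})
    return results


# Hypothetical edges using the entity and relationship names from the figure.
edges = [
    ("E1:Reviewer", "author_of", "E2:Paper"),
    ("E3:Person", "author_of", "E2:Paper"),
    ("E3:Person", "author_of", "E7:Submission"),
    ("E1:Reviewer", "knows", "E5:Person"),
    ("E5:Person", "knows", "E6:Person"),
]
for p in association_paths(edges, "E1:Reviewer", "E7:Submission", max_len=3):
    print(p)
```

The single path found (reviewer and E3 co-authored E2; E3 wrote E7) is a mapping that no equivalence or subsumption relation between classes can express.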
The take-home message
A world beyond simple matches and mappings
• The distinction between schema and instances is slowly disappearing
• Integrating new and external data sources, and mining and analyzing them, is gaining importance.
• Tremendous opportunities and challenges in using more information than what is modeled in a schema and captured in an instance base.
Need to go beyond well-mannered schemas and knowledge representations, and relatively simple mappings.
For more information
LSDIS Lab: http://lsdis.cs.uga.edu
Kno.e.sis Center: http://www.knoesis.org