INCMAP: A JOURNEY TOWARDS ONTOLOGY-BASED DATA INTEGRATION
CHRISTOPH PINKEL (MAIN AUTHOR), CARSTEN BINNIG,
ERNESTO JIMENEZ-RUIZ, EVGENY KHARLAMOV, ET AL.
EXPLORING DATABASES CAN BE TEDIOUS…
[Figure: the question "Author of paper with title ‘IncMap’?" has to be written three times, as SQL 1, SQL 2 and SQL 3, against the schemas of DBLP, CMT and EasyChair (Schema 1, Schema 2, Schema 3).]
PROBLEM 1: TOO MANY TABLES
[Figure: the question "Author of paper with title ‘IncMap’?" posed against a long list of tables, each with columns Id, Name, …]
A typical SAP schema has more than 10,000 tables
PROBLEM 2: LIMITED EXPRESSIVENESS
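Ontology: Person with sub-classes Author and Reviewer; properties such as name and area are attached to their classes via domain.

The same data in three relational layouts (shown on the slide as Relational Schema Options 1 to 3):

One table per sub-class (Author, Reviewer):
Author:   aid | name     | e-mail
          1   | Lennon   | a@b
Reviewer: rid | name     | area
          1   | Harrison | Onto

One table per class (Person, Author, Reviewer):
Person:   pid | name
          1   | Lennon
          2   | Harrison
Author:   pid | e-mail
          1   | a@b
Reviewer: pid | area
          2   | Onto

One single table (Person):
Person:   pid | name     | e-mail | area | type
          1   | Lennon   | a@b    | -    | author
          2   | Harrison | -      | Onto | reviewer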
Modeling generalization is “messy”
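The effect of these layouts on queries can be sketched with a small in-memory database. This is only an illustration: the layout prefixes `a_`, `b_`, `c_` and the column name `email` are naming assumptions made here, not identifiers from the slides. It shows that the same information need ("names of all authors") requires a different SQL query for each layout.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Layout A: one table per sub-class
CREATE TABLE a_author   (aid INTEGER PRIMARY KEY, name TEXT, email TEXT);
CREATE TABLE a_reviewer (rid INTEGER PRIMARY KEY, name TEXT, area TEXT);
INSERT INTO a_author   VALUES (1, 'Lennon', 'a@b');
INSERT INTO a_reviewer VALUES (1, 'Harrison', 'Onto');

-- Layout B: one table per class, sub-class tables reuse the person key
CREATE TABLE b_person   (pid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE b_author   (pid INTEGER PRIMARY KEY, email TEXT);
CREATE TABLE b_reviewer (pid INTEGER PRIMARY KEY, area TEXT);
INSERT INTO b_person   VALUES (1, 'Lennon');
INSERT INTO b_person   VALUES (2, 'Harrison');
INSERT INTO b_author   VALUES (1, 'a@b');
INSERT INTO b_reviewer VALUES (2, 'Onto');

-- Layout C: a single table with a type discriminator column
CREATE TABLE c_person (pid INTEGER PRIMARY KEY, name TEXT,
                       email TEXT, area TEXT, type TEXT);
INSERT INTO c_person VALUES (1, 'Lennon', 'a@b', NULL, 'author');
INSERT INTO c_person VALUES (2, 'Harrison', NULL, 'Onto', 'reviewer');
""")

# One information need, three different queries.
author_names = {
    "A": "SELECT name FROM a_author",
    "B": "SELECT p.name FROM b_person p JOIN b_author a ON a.pid = p.pid",
    "C": "SELECT name FROM c_person WHERE type = 'author'",
}
results = {layout: [row[0] for row in con.execute(sql)]
           for layout, sql in author_names.items()}
print(results)
```

All three queries return the same answer, but none of them can be reused on another layout without rewriting.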
PROBLEM 3: TECHNICAL DESIGN
Example table names: BDC_IXN_FACT_MA, BDC_IXN_FACT_WA, BDC_ACCOUNT_DIM, BDC_DEMOGRAPHICS_DIM
Other issues:
• De-normalization (i.e., merged tables)
• No foreign keys!
• Performance optimizations (horizontal and vertical fragmentation, …)
ONTOLOGY-BASED DATA ACCESS
[Figure: a HIGH-LEVEL QUERY ("Author of paper with title ‘IncMap’?") is posed once against the ontology-based data access layer, which sits on top of SQL 1, SQL 2 and SQL 3 for DBLP, CMT and EasyChair.]
[Figure: the ontology (Person with sub-classes Author and Reviewer) shown above the three relational schema options from Problem 2.]
Minimal Ontology (in OWL QL)
ONTOLOGY-BASED DATA ACCESS
[Figure: the same ontology and the three relational schema options, with the open question of how to map between Relational Schema and Ontology ("Mapping?").]
IncMap: A Mapping Tool for Relational-To-Ontology Data Integration
THE JOURNEY OF INCMAP
First version of IncMap
• Incremental mapping
• Leverage lexicographical and structural similarity
Christoph Pinkel, et al.: Pay as you go Matching of Relational Schemata to OWL Ontologies with IncMap. International Semantic Web Conference 2013
Second version of IncMap
• Consider typical design patterns
• Leverage reasoning (open vs. closed-world)
• Bootstrap mappings (fully automatic)
Christoph Pinkel, Carsten Binnig, Ernesto Jiménez-Ruiz, Evgeny Kharlamov, Andriy Nikolov, Andreas Schwarte, Christian Heupel, Tim Kraska: IncMap: A Journey towards Ontology-based Data Integration. BTW 2017
STEP 1: MAPPING TO INCGRAPHS
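[Figure: Relational Schema R and Ontology O are each translated into a graph, IncGraph(R) and IncGraph(O).
Relational Schema R: table Person (ID, ...) and table Paper (title, PersID (FK), ...).
IncGraph(R): Person and Paper are linked through PersID by ref edges; columns such as title are attached by val edges and carry a type (varchar).
Ontology O: classes Person, Author (subClassOf Person) and Paper; object property writes (domain Author, range Paper); datatype property hasTitle.
IncGraph(O): Author and Paper are linked through writes by ref edges; hasTitle is attached by a val edge and carries a type (string); Author is subClassOf Person.]
Main Reason: Mitigate structural differences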
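A rough sketch of how a relational schema could be turned into such a graph is given below. It is not IncMap's actual construction: the `Table` class, the `schema_to_graph` function, the edge directions, the `Table.column` node naming and the `int` column types are all assumptions made for illustration. Only the edge labels `ref`, `val` and `type` are taken from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    name: str
    columns: dict                                      # column name -> SQL type
    foreign_keys: dict = field(default_factory=dict)   # column -> referenced table


def schema_to_graph(tables):
    """Return a set of labeled edges (source, label, target)."""
    edges = set()
    for table in tables:
        for column, sql_type in table.columns.items():
            node = f"{table.name}.{column}"
            if column in table.foreign_keys:
                # A foreign key becomes a node that links both tables via 'ref'.
                edges.add((table.name, "ref", node))
                edges.add((node, "ref", table.foreign_keys[column]))
            else:
                # A plain column is a value node with a datatype.
                edges.add((table.name, "val", node))
                edges.add((node, "type", sql_type))
    return edges


schema = [
    Table("Person", {"ID": "int"}),
    Table("Paper", {"title": "varchar", "PersID": "int"}, {"PersID": "Person"}),
]
graph = schema_to_graph(schema)
for edge in sorted(graph):
    print(edge)
```

Representing the foreign key as its own node gives it the same shape as an object property between two classes, which is the kind of structural difference the graph representation is meant to mitigate.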
STEP 2: REASONING AND PATTERNS
[Figure: IncGraph(R) and IncGraph(O) from Step 1 are extended to IncGraph+(R) and IncGraph+(O). On the relational side, design patterns are applied (Pattern: Inheritance, covering the three relational schema options of Problem 2). On the ontology side, reasoning is applied (e.g., Author subClassOf Person).]
REASONING: TWO OPTIONS
Option 1: Full reasoning
1. Reasoning on the base ontology using OWL QL
2. Add all derivable elements to IncGraph(O)
Option 2: Custom reasoning (to close “modeling gaps”)
1. Reasoning on the IncGraph(O)
• Generalization hierarchies
• Additional domain and range information
• …
2. Add selected elements to IncGraph(O) and set weights (see next slides)
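The custom reasoning option can be illustrated with a small sketch that closes one "modeling gap": a property declared on a class is also made available on its sub-classes, with a lower weight for the derived entries. The function name `inherit_domains`, the dictionary-based input format and the weight 0.5 are hypothetical choices for this example, not values from IncMap.

```python
def inherit_domains(sub_class_of, domains, inherited_weight=0.5):
    """sub_class_of maps child class -> parent class.
    domains maps property -> class it is declared on.
    Returns {(class, property): weight}; declared entries get weight 1.0."""
    weighted = {(cls, prop): 1.0 for prop, cls in domains.items()}
    for child in sub_class_of:
        seen = {child}                       # guards against cyclic hierarchies
        ancestor = sub_class_of.get(child)
        while ancestor is not None and ancestor not in seen:
            seen.add(ancestor)
            for prop, cls in domains.items():
                if cls == ancestor:
                    # Derived element: added with a reduced weight.
                    weighted.setdefault((child, prop), inherited_weight)
            ancestor = sub_class_of.get(ancestor)
    return weighted


hierarchy = {"Author": "Person", "Reviewer": "Person"}
declared = {"name": "Person", "area": "Reviewer"}
closed = inherit_domains(hierarchy, declared)
print(closed)
```

Here `name` is declared on Person only, so Author and Reviewer each receive a derived `name` entry, while `area` stays on Reviewer.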
STEP 3: PAIRWISE MATCHING
[Figure: the source graph (Person -ref- PersID -ref- Paper, …) and the target graph (Author -ref- writes -ref- Paper, …) are combined into possible matches. Each node pairs one target element with one source element, e.g. (Author, Person), (writes, PersID), (Paper, Paper), but also (Author, Paper) and (Paper, Person). The nodes carry initial similarity scores between 0.1 and 1.0.]
Pairwise Connectivity Graph
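The construction of a pairwise connectivity graph, as defined for Similarity Flooding, can be sketched as follows: whenever an edge in the source graph and an edge in the target graph share a label, the pair of their start nodes is connected to the pair of their end nodes. The function name and the tuple-based edge format are assumptions for this example, and edges are treated as directed here, which yields fewer candidate pairs than the slide shows.

```python
from collections import defaultdict


def pairwise_connectivity_graph(source_edges, target_edges):
    """Edges are (from, label, to) triples.
    Returns edges between node pairs: ((a, x), label, (b, y))."""
    by_label = defaultdict(list)
    for x, label, y in target_edges:
        by_label[label].append((x, y))
    pcg = set()
    for a, label, b in source_edges:
        for x, y in by_label[label]:
            pcg.add(((a, x), label, (b, y)))
    return pcg


source = [("Person", "ref", "PersID"), ("PersID", "ref", "Paper")]
target = [("Author", "ref", "writes"), ("writes", "ref", "Paper")]
pcg = pairwise_connectivity_graph(source, target)
for edge in sorted(pcg):
    print(edge)
```

Each node of the result, such as `("Person", "Author")`, is one candidate match whose score is refined in the next step.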
STEP 4: FIXPOINT COMPUTATION
• Human Input (Acceptance and Rejection of Mappings)
• Weights for Patterns (Probability of Pattern)
• Deactivation of Edges (based on Patterns)
[Figure: the Pairwise Connectivity Graph of possible matches from Step 3 (with its initial scores) is fed into the Fixpoint Computation (Ext. Similarity Flooding), shown next to the ontology graph with Author subClassOf Person.]
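A simplified fixpoint computation in the style of basic Similarity Flooding is sketched below. It is not IncMap's extended algorithm: pattern weights and edge deactivation are left out, propagation uses uniform coefficients over undirected edges, and pinning accepted pairs to 1.0 and rejected pairs to 0.0 is only one assumed way to model human input. All names are hypothetical.

```python
def similarity_flooding(pcg_edges, initial, accepted=(), rejected=(),
                        max_iter=100, eps=1e-6):
    """pcg_edges: ((a, x), label, (b, y)) triples; initial: {pair: score}."""
    nodes = set(initial) | set(accepted) | set(rejected)
    for u, _, v in pcg_edges:
        nodes.update((u, v))
    neighbors = {n: [] for n in nodes}
    for u, _, v in pcg_edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    def pin(scores):
        # Human input overrides computed scores.
        for n in accepted:
            scores[n] = 1.0
        for n in rejected:
            scores[n] = 0.0
        return scores

    sigma = pin({n: initial.get(n, 0.0) for n in nodes})
    for _ in range(max_iter):
        nxt = {}
        for n in nodes:
            # Each neighbor spreads its score evenly over its own neighbors.
            flow = sum(sigma[m] / len(neighbors[m]) for m in neighbors[n])
            nxt[n] = sigma[n] + flow
        top = max(nxt.values(), default=0.0)
        if top > 0:
            nxt = {n: score / top for n, score in nxt.items()}
        nxt = pin(nxt)
        delta = max((abs(nxt[n] - sigma[n]) for n in nodes), default=0.0)
        sigma = nxt
        if delta < eps:
            break
    return sigma


pcg = [
    (("Person", "Author"), "ref", ("PersID", "writes")),
    (("PersID", "writes"), "ref", ("Paper", "Paper")),
]
initial = {("Person", "Author"): 0.2, ("PersID", "writes"): 0.1,
           ("Paper", "Paper"): 1.0}
scores = similarity_flooding(pcg, initial, accepted=[("Paper", "Paper")])
rejected_scores = similarity_flooding(pcg, initial,
                                      rejected=[("Person", "Author")])
print(scores)
```

The iteration stops when scores change by less than `eps` or after `max_iter` rounds, so it terminates even when pinning prevents exact convergence.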
EVALUATION: RODI BENCHMARK
[Figure: benchmark scenarios. For each pair of source database and target ontology, the task is to find the mapping rules ("Mapping Rules?").
Target Ontologies (Schema): Conference ontology 1, Conference ontology 2, …, Geodata ontology, Oil & gas ontology
Source Databases (Schema + Data): CMT Variant, CMT Canon., …; Conf. Variant, Conf. Canon., …; Mond. Variant, Mond. Rel., …; Single, large real-world schema]
Variants:
1. Adjusted Naming
2. Structural Adjustments (e.g., hierarchies)
3. Removed foreign keys
4. Merging / Splitting of tables
5. Combined cases
Conference ontologies: SIGKDD, Conference, CMT
Christoph Pinkel, Carsten Binnig, Ernesto Jiménez-Ruiz, Wolfgang May, Dominique Ritze, Martin G. Skjæveland, Alessandro Solimando, Evgeny Kharlamov: RODI: A Benchmark for Automatic Mapping Generation in Relational-to-Ontology Data Integration. ESWC 2015
https://github.com/chrpin/rodi
EVALUATION: RODI BENCHMARK
Evaluation queries:
• Queries simulate information need
• Can be additional input for mapping
• 56 queries from simple to complex
Metric: per-query F-measure
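A per-query F-measure can be computed as in the following sketch. It assumes the balanced F1 score over sets of result tuples, comparing the results expected for a query with those obtained through the generated mapping. The function names are hypothetical, and the benchmark's exact rules for comparing tuples are not reproduced here.

```python
def f_measure(expected, actual):
    """Balanced F-measure (F1) between expected and actual result sets."""
    expected, actual = set(expected), set(actual)
    if not expected and not actual:
        return 1.0                      # nothing expected, nothing returned
    true_positives = len(expected & actual)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(actual)
    recall = true_positives / len(expected)
    return 2 * precision * recall / (precision + recall)


def mean_f_measure(per_query_results):
    """Average the per-query scores over (expected, actual) pairs."""
    scores = [f_measure(e, a) for e, a in per_query_results]
    return sum(scores) / len(scores) if scores else 0.0


# One query returns only one of two expected names; another is fully correct.
partial = f_measure({"Lennon", "Harrison"}, {"Lennon"})
overall = mean_f_measure([
    ({"Lennon", "Harrison"}, {"Lennon"}),
    ({"IncMap"}, {"IncMap"}),
])
print(partial, overall)
```

In the first query, precision is 1.0 and recall is 0.5, so the F-measure is 2/3. Averaging per query keeps one large result set from dominating the overall score.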
EVALUATION: COMPETITORS
Relational-to-Ontology Mapping Systems
• Ontop: http://ontop.inf.unibz.it (Free University of Bozen-Bolzano)
• BootOX: https://www.cs.ox.ac.uk/isg/tools/BootOX/ (University of Oxford)
General Mapping Systems (Baseline)
• COMA++: http://dbs.uni-leipzig.de/de/Research/coma.html (University of Leipzig)