July 23, 2007, Semantic e-Science Workshop @ AAAI 2007, Vancouver, Canada
Iowa State University, Department of Computer Science, Artificial Intelligence Research Laboratory
Query Translation for Ontology-extended Data Sources
Jie Bao (1), Doina Caragea (2), Vasant Honavar (1)
(1) Artificial Intelligence Research Laboratory, Department of Computer Science, Iowa State University, Ames, IA 50011-1040, USA
{baojie, honavar}@cs.iastate.edu
(2) Department of Computing and Information Sciences, Kansas State University, Manhattan, KS 66506, USA
{dcaragea}@ksu.edu
INDUS Group
Vasant Honavar, Jie Bao, Doina Caragea
Jyotishman Pathak, Neeraj Koul
Outline
• Ontology-Extended Data Source
  – Schema, Data, and Ontology
• Query Translation for OEDS
  – Ontology mapping, query translation / soundness / completeness
• Implementation and Optimization
  – The INDUS system
• Conclusion
Background
Data revolution
• Bioinformatics
  – Over 200 data repositories of interest to molecular biologists alone
• Environmental Informatics
• Enterprise Informatics
• Medical Informatics
• Social Informatics
• ...
Connectivity revolution (Internet and the web)
Integration revolution
• Need to understand the elephant as opposed to examining the trunk, the tail, etc.
Needed: infrastructure to support collaborative, integrative analysis of data
Solution: INDUS for Learning from Semantically Heterogeneous Distributed Autonomous Data Sources
(Relational) Data Source
D: Data Set, Extensional Definition (Facts)

Student
  name    status
  Alice   First-year
  Bob     MSc

Classes
  code    name
  CS103   data structure
  CS511   algorithm

Registers
  instructor  class
  Alice       CS103
  Bob         CS511

S: Schema, Intensional Definition
Entities:
  Classes (code:String, name:String)
  Faculty (name:String, rank:String)
  Student (name:String, status:String)
Relationships:
  Teaches (Faculty, Classes)
  Registers (Student, Classes)
Semantic Extensions of Data Sources
• Return classes that graduate students are registered in: ?
• Return all people in the database: ?
[Figure: schema S and data set D (the Student, Classes, and Registers tables) as on the previous slide]
Ontology-Extended Data Source
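Schema:
  Classes (code:String, name:String)
  Instructor (name:String, rank:String)
  Student (name:String, status:String)
  Relationships: Teaches (Instructor, Classes), registers (Student, Classes)
Schema ontology:
  People
    Student
    Instructor
Data (Student table):
  name    status
  Alice   First-year
  Bob     MSc
Data content ontology (values of status):
  student
    Undergrad
      First-year
      …
      Fourth-year
    Graduate
      MSc
      MA
      PhD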
Ontology-Extended Data Source
[Figure: the schema S is linked to a Schema Ontology O_S and the data set D to a Data Content Ontology O_D; further ontologies O′_S and O′_D are drawn alongside them]
Ontology-Extended Data Source
• Relational Model (Reiter, 1982)
  – Schema S: a first order language with predicate symbols R_S, each for a relational table (e.g. Classes, Faculty)
  – Data Set D: a first order interpretation of S with domain Δ
• Ontology-Extended (Relational) Data Source (Caragea et al. 2004)
  – Extends the relational model with:
  – Schema Ontology: a first order language L_OS with predicate symbols R_OS, and R_S ⊆ R_OS
  – Data Content Ontology: O_D = (L_OD, D_OD)
    • L_OD: a first order language with predicate symbols R_OD, R_OD ∩ R_S = ∅
    • D_OD: a first order interpretation of L_OD with domain Δ′ [the relation between Δ′ and Δ is not legible in the source]
OEDS: Example
S: Instructor(x,y); Classes(x,y); Student(x,y); …
D: Student table
  name    status
  Alice   First-year
  Bob     MSc
[Figure: schema diagram with Classes (code:String, name:String), Instructor (name:String, rank:String), Student (name:String, status:String), and the relationships Teaches and registers]
L_OS: ∀x,y, Student(x,y) ∨ Instructor(x,y) → People(x)
O_D:
  L_OD: isa(x,y) ∧ isa(y,z) → isa(x,z)
  D_OD: isa(First-year, Undergraduate), isa(Undergraduate, Student), isa(MSc, Graduate), …
see survey [Shvaiko & Euzenat 2005]
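The example above can be mirrored in a few lines of Python. This is only a sketch under stated assumptions, not the INDUS data model: the class name `OEDS`, its fields, and the method `is_a` are hypothetical, the edge from Graduate to Student is taken from the hierarchy drawn on the earlier slide, and treating every term as isa itself is a convenience assumption.

```python
from dataclasses import dataclass, field


@dataclass
class OEDS:
    """Toy ontology-extended data source: tables (D) plus isa facts (D_OD)."""
    tables: dict
    isa_edges: set = field(default_factory=set)

    def is_a(self, term: str, ancestor: str) -> bool:
        """True if isa(term, ancestor) follows by transitivity.

        A term is also treated as isa itself (an assumption of this sketch).
        """
        seen, frontier = set(), {term}
        while frontier:
            if ancestor in frontier:
                return True
            seen |= frontier
            # Step one level up the hierarchy, skipping terms already visited.
            frontier = {p for (c, p) in self.isa_edges if c in frontier} - seen
        return False


source = OEDS(
    tables={"Student": [{"name": "Alice", "status": "First-year"},
                        {"name": "Bob", "status": "MSc"}]},
    isa_edges={("First-year", "Undergraduate"),
               ("Undergraduate", "Student"),
               ("MSc", "Graduate"),
               ("Graduate", "Student")},  # from the hierarchy on the earlier slide
)

print(source.is_a("First-year", "Student"))  # True
print(source.is_a("MSc", "Undergraduate"))   # False
```

The breadth-first walk applies the transitivity axiom of L_OD on demand instead of storing every derived isa fact.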
Query
• Tuple Relational Calculus (TRC)
  – Tuple: a multiset of attributes
  – TRC ≡ Relational Algebra
  – q(t) := Student(t) ∧ (t.status = "Graduate")
• Ontology-Extended Tuple Relational Calculus
  – q(t) := Student(t) ∧ isa(t.status, Graduate)
We focus on data content ontologies in this talk
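To make the difference between the two queries concrete, the following sketch evaluates both over the Student table from the running example. It is an illustration only: the `parent` dictionary and the `isa` helper are hypothetical names, the Graduate-to-Student edge comes from the hierarchy drawn on the earlier slide, and a term is assumed to be isa itself.

```python
# Student table from the running example.
student = [
    {"name": "Alice", "status": "First-year"},
    {"name": "Bob", "status": "MSc"},
]

# Direct parent of each status term in the data content ontology.
parent = {
    "First-year": "Undergraduate",
    "MSc": "Graduate",
    "Undergraduate": "Student",
    "Graduate": "Student",
}


def isa(term: str, concept: str) -> bool:
    """Walk up the hierarchy; a term is taken to be isa itself (assumption)."""
    current = term
    while current is not None:
        if current == concept:
            return True
        current = parent.get(current)
    return False


# Plain TRC: q(t) := Student(t) AND t.status = "Graduate"
plain = [t["name"] for t in student if t["status"] == "Graduate"]

# Ontology-extended TRC: q(t) := Student(t) AND isa(t.status, Graduate)
extended = [t["name"] for t in student if isa(t["status"], "Graduate")]

print(plain)     # []
print(extended)  # ['Bob']
```

The plain query finds nothing because no row stores the literal value "Graduate"; the ontology-extended query finds Bob because MSc is a kind of Graduate.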
Query Translation
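[Figure: a query q expressed in the User Ontology O1 is translated, through the Ontology Mapping M, into a query q′ expressed in the Data Source Ontology O2; both are posed over the data source (S, D)]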
Ontology Mapping
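Mapping rules:
  isa1(x, c1) ∧ into(c1, c2) → isa2(x, c2)
  isa1(c1, x) ∧ onto(c1, c2) → isa2(c2, x)
  ……
[Figure: two Student hierarchies linked by mapping edges labelled into, onto, and equ]
  O1 (isa1): Student → Undergrad (Freshman, …), Postgraduate (Master, Doctoral)
  O2 (isa2): Student → Undergrad (First-year, …, Fourth-year), Graduate (MSc, MA, PhD)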
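The two mapping rules can be applied mechanically to derive isa2 facts from isa1 facts. The sketch below does exactly that; the specific edges into(Postgraduate, Graduate) and onto(Master, MSc) are hypothetical, since the transcript does not record which terms each mapping edge connects, and the function name `apply_bridge_rules` is invented for illustration.

```python
# isa1 facts of the user ontology O1, already transitively closed
# for this tiny example.
isa1 = {
    ("Master", "Postgraduate"), ("Doctoral", "Postgraduate"),
    ("Postgraduate", "Student"), ("Master", "Student"), ("Doctoral", "Student"),
}

# Hypothetical mapping M (assumed edges, not taken from the slide).
into = {("Postgraduate", "Graduate")}
onto = {("Master", "MSc")}


def apply_bridge_rules(isa1, into, onto):
    """Derive isa2 facts using the two rules from the slide."""
    derived = set()
    # isa1(x, c1) AND into(c1, c2) -> isa2(x, c2)
    for (x, c1) in isa1:
        for (m1, c2) in into:
            if m1 == c1:
                derived.add((x, c2))
    # isa1(c1, x) AND onto(c1, c2) -> isa2(c2, x)
    for (c1, x) in isa1:
        for (m1, c2) in onto:
            if m1 == c1:
                derived.add((c2, x))
    return derived


isa2_derived = apply_bridge_rules(isa1, into, onto)
for fact in sorted(isa2_derived):
    print("isa2(%s, %s)" % fact)
```

With these assumed edges the first rule pushes Master and Doctoral under Graduate, and the second places MSc under everything that Master is under.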
Query Translation
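[Figure: query q over ontology O1 is translated through mapping M into query q′ over ontology O2, both posed over the data source (S, D)]
q: Student(t) ∧ isa1(t.status, Master)
Candidate translations q′:
  Student(t) ∧ isa2(t.status, Graduate)
  Student(t) ∧ isa2(t.status, MSc)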
Soundness, Completeness and Exactness
Here {q} denotes the set of answers to q, with q := Student(t) ∧ isa1(t.status, Master).

  Translation            Condition      Example
  Sound Translation      {q′} ⊆ {q}     q′ := Student(t) ∧ isa2(t.status, MSc)
  Complete Translation   {q} ⊆ {q′}     q′ := Student(t) ∧ isa2(t.status, Graduate)
  Exact Translation      {q} = {q′}     Non-existent
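The three notions reduce to set comparisons between answer sets, as the following sketch shows. The data is hypothetical: Carol and Dave are invented rows, and the assumption that the user's "Master" covers exactly MSc and MA is made only so that the example has a ground truth to compare against.

```python
def classify(answers_q: set, answers_q_prime: set) -> str:
    """Compare the answer set of a translation q' with that of the original q."""
    sound = answers_q_prime <= answers_q      # every returned answer is correct
    complete = answers_q <= answers_q_prime   # no correct answer is missed
    if sound and complete:
        return "exact"
    if sound:
        return "sound"
    if complete:
        return "complete"
    return "neither"


# Hypothetical data: statuses as stored under the data source ontology O2.
status = {"Alice": "First-year", "Bob": "MSc", "Carol": "MA", "Dave": "PhD"}
graduate_terms = {"MSc", "MA", "PhD"}

# Assumption for illustration only: "Master" in O1 covers MSc and MA.
answers_q = {n for n, s in status.items() if s in {"MSc", "MA"}}

via_msc = {n for n, s in status.items() if s == "MSc"}
via_graduate = {n for n, s in status.items() if s in graduate_terms}

print(classify(answers_q, via_msc))       # sound
print(classify(answers_q, via_graduate))  # complete
```

Under this assumption neither single-term translation is exact, which matches the "Non-existent" entry for the exact translation.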
Most Informative Translation
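q := isa1(x, c1), with c1 in O1
Mapping: onto(c1, d1), onto(c1, d2), with d1 and d2 in O2
Given q, find its sound translation(s):
  isa2(x, d1)
  isa2(x, d2)
Most informative sound translation: isa2(x, d1) ∨ isa2(x, d2), the LUB (least upper bound) of the sound translations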
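For an atomic query, collecting every term that c1 maps onto and joining them with a disjunction gives the translation named on this slide. The sketch below assumes the mapping is available as a set of (source, target) pairs; the function names are hypothetical, and only direct onto edges are considered.

```python
def most_informative_sound(concept: str, onto: set) -> list:
    """Return the O2 terms whose disjunction translates isa1(x, concept).

    Each onto(concept, d) edge yields a sound translation isa2(x, d);
    their disjunction is the weakest sound translation built from them.
    """
    return sorted(d for (c, d) in onto if c == concept)


def render(terms: list) -> str:
    """Write the disjunction; an empty disjunction is the constant false."""
    return " ∨ ".join(f"isa2(x, {d})" for d in terms) or "false"


# Mapping from the slide: onto(c1, d1) and onto(c1, d2).
onto = {("c1", "d1"), ("c1", "d2")}
print(render(most_informative_sound("c1", onto)))
# isa2(x, d1) ∨ isa2(x, d2)
```

Returning "false" when no onto edge exists keeps the result sound, although it is then uninformative.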
Query Translation Rules
For hierarchical ontologies:
• Atomic conditions [rules shown as a figure]
• Complex conditions [rules shown as a figure] (similarly for complete translation of complex queries)
GLB = greatest lower bound, LUB = least upper bound
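The rules themselves did not survive in this transcript, so the following is one plausible reading rather than the authors' formulation. It assumes that a sound translation of an atomic condition is the disjunction of the terms it maps onto, that a complete translation is the conjunction of the terms it maps into, that conjunction and disjunction translate component-wise, and that negation swaps the two modes. The tuple encoding, the function name `translate`, and the mapping edges are all hypothetical.

```python
def translate(cond, mode, into, onto):
    """Translate an O1 condition into O2; mode is "sound" or "complete"."""
    op = cond[0]
    if op == "isa":
        concept = cond[1]
        if mode == "sound":
            # Disjunction over the terms the concept maps onto.
            terms = sorted(d for (s, d) in onto if s == concept)
            return ("or_all", [("isa2", d) for d in terms])
        # Conjunction over the terms the concept maps into.
        terms = sorted(d for (s, d) in into if s == concept)
        return ("and_all", [("isa2", d) for d in terms])
    if op in ("and", "or"):
        return (op,
                translate(cond[1], mode, into, onto),
                translate(cond[2], mode, into, onto))
    if op == "not":
        # If {p} is contained in {p_c}, then {not p_c} is contained in {not p}.
        flipped = "complete" if mode == "sound" else "sound"
        return ("not", translate(cond[1], flipped, into, onto))
    raise ValueError(f"unknown operator: {op}")


# Hypothetical mapping edges, for illustration only.
into = {("Postgraduate", "Graduate")}
onto = {("Master", "MSc"), ("Doctoral", "PhD")}

q_or = ("or", ("isa", "Master"), ("isa", "Doctoral"))
q_not = ("not", ("isa", "Postgraduate"))

print(translate(q_or, "sound", into, onto))
print(translate(q_not, "sound", into, onto))
```

An empty `or_all` stands for false and an empty `and_all` for true, so a condition with no applicable mapping still yields a translation that is sound or complete, respectively.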
INDUS Tools
Ontology Editor
Schema Editor
Mapping Editor
Data Editor
Query Engine and Interface
…
INDUS – Mapping Editor
http://sourceforge.net/projects/indus-project/
INDUS – Data Editor
http://sourceforge.net/projects/indus-project/
INDUS – Query Editor
http://sourceforge.net/projects/indus-project/
Optimization for Scalability
• Database storage for ontologies
• Using transitive closure for fast inference with hierarchies
• Server-side caching
  – Using temporary tables on the data source server
• Client-side caching
  – Of remote ontologies and ontology mappings
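As a rough illustration of the first, second, and fourth points, the sketch below stores a small hierarchy in a database, materializes its transitive closure once, and caches lookups on the client. It uses an in-memory SQLite database purely as a stand-in; the table names, the `descendants` helper, and the use of `functools.lru_cache` are assumptions of this sketch and say nothing about how INDUS itself is implemented.

```python
import sqlite3
from functools import lru_cache

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE isa (child TEXT, parent TEXT);
    INSERT INTO isa VALUES
        ('First-year', 'Undergraduate'), ('Undergraduate', 'Student'),
        ('MSc', 'Graduate'), ('Graduate', 'Student');

    -- Materialize the transitive closure once, so each isa test is a lookup.
    CREATE TABLE isa_closure AS
    WITH RECURSIVE closure(child, ancestor) AS (
        SELECT child, parent FROM isa
        UNION
        SELECT closure.child, isa.parent
        FROM closure JOIN isa ON closure.ancestor = isa.child
    )
    SELECT child, ancestor FROM closure;

    CREATE INDEX idx_closure ON isa_closure (ancestor, child);
""")


@lru_cache(maxsize=None)
def descendants(concept: str) -> frozenset:
    """Client-side cache: repeated lookups for a concept skip the database."""
    rows = conn.execute(
        "SELECT child FROM isa_closure WHERE ancestor = ?", (concept,))
    return frozenset(r[0] for r in rows)


print(sorted(descendants("Student")))
# ['First-year', 'Graduate', 'MSc', 'Undergraduate']
```

With the closure table in place, a condition such as isa(t.status, Student) becomes a single indexed lookup or join instead of a recursive walk per tuple.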
Performance
• O1: Enzyme Classification (EC) hierarchy (4,564 terms)
• M: SCOP to EC mapping [Richard George et al.] with 15,765 rules
• O2: SCOP (Structural Classification of Proteins) hierarchy (86,766 terms)
[Figure: experimental setup with a client, a server, and the data set D, connected over the Internet]
Performance
Conclusion
• We have studied the query translation process for relational data sources extended with context-specific data content ontologies.
  – how to exploit ontologies and mappings for flexibly querying semantic-rich data sources.
  – query translation strategy that works for hierarchical ontologies.
  – the conditions under which the soundness and completeness of such a procedure can be guaranteed.
• Ongoing Work
  – More expressive ontologies, e.g., Description Logics
  – Schema ontology + data content ontology
  – Statistical learning from OEDS
Thank You!
Semantics Preserving Translation
• Conservative Extension