Query Planning for Semantic Information Integration José Mora, Óscar Corcho {jmora, ocorcho}@fi.upm.es Facultad de Informática Universidad Politécnica de Madrid Campus de Montegancedo s/n 28660 Boadilla del Monte, Madrid, Spain

Upload: jose-mora

Post on 19-Jan-2015


Page 1: Jmora.di.oeg.3x1e

Query Planning for Semantic Information Integration

José Mora, Óscar Corcho

{jmora, ocorcho}@fi.upm.es

Facultad de Informática, Universidad Politécnica de Madrid

Campus de Montegancedo s/n, 28660 Boadilla del Monte, Madrid, Spain

Page 2: Jmora.di.oeg.3x1e

General Scenario – Semantic Information Integration

There is no integration when we have a single database: we can access all the information in it just by querying it.

When the information is distributed across several databases, retrieving it from all of them automatically is desirable, but not trivial.

We need a schema against which the user writes the queries. This global schema will usually differ from the local schemas of the individual databases, so the local schemas need to be mapped to it.

When the global schema is an ontology, it brings additional advantages: a richer query language, explicit semantics, inference, and easier integration with other sources (the “semantic upgrade”) [Wache01 – a].

Local sources may have explicit semantics of their own, in the form of local ontologies. Integration then happens at the semantic level: mapping creation is split (divide and conquer) and the propagation of changes is limited [Wache01 – c].

Local ontologies ease integration so much that some authors have proposed models with no global ontology, in which integration and conversion between schemas would be automatic [Wache01 – b]. E.g.: PayGo from Google.

Let’s consider this model for now. The semantic upgrade happens first; integration then occurs at the semantic level [Wache01 – c]. Separation eases comprehension ;) and integration…

Integration is eased because it happens at the semantic level, where the details of the sources are abstracted away. This allows greater heterogeneity among the sources to be supported, and thus a more powerful integration.

BTW: H. Wache et al., “Ontology-based integration of information - a survey of existing approaches,” in: Ontologies and Information Sharing, vol. 2001, 108-117.

An ontology is an explicit, formal specification of a shared conceptualization. It provides a shared vocabulary that can be used to model a domain, and here it serves as the global schema.

Ontologies can be defined in different languages, which differ in expressiveness and therefore in what can be done with them: the complexity of reasoning tasks, and even decidability.

The DL-Lite family was born as a group of DLs with reduced expressiveness for efficient query answering. Together with the EL family, this line of work led to the OWL 2 profiles QL and EL.

Page 3: Jmora.di.oeg.3x1e

Scenario - Subproblems

Schema definition: GAV, LAV, GLAV, simple mappings

• GAV: straightforward reformulation; source changes affect the system
• LAV: easy to add and remove sources; the global schema has to be stable
• GLAV: pros of both, cons of none; harder to manage
• Simple mappings: “simple” to generate automatically; non-constructive for integration

Query distribution

• Ad-hoc approaches: PayGo (large-scale, mapping based); OBSERVER (semantic mapping based); Battré, Quilitz (semantic, SPARQL based); Siberski (semantic, SPARQL, preferences); Networked Graphs (semantic, ad-hoc)
• Rewriting: Bucket, Inverse rules, PICSEL
• Path search: Bleiholder, Wang
• Planning: SIMS, Planning-by-rewriting, HTN
• Reasoning: Calvanese, Pérez-Urbina, SoftFacts

Disparities: lexical, syntax, paradigm, terms, concepts, pragmatics

Yes/No options: materialization, update information, semantic description, quality description, many others
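To make the GAV trade-off concrete: under GAV each global relation is defined as a view over the sources, so reformulating a query amounts to substituting those definitions. The following is a minimal Python sketch of that idea, not code from any of the systems listed; the relation and source names (`River`, `src_a.rivers`, and so on) are invented for illustration.

```python
# Hypothetical GAV mapping: each global relation is defined by the list of
# source relations whose union populates it. All names are illustrative.
GAV_MAPPING = {
    "River": ["src_a.rivers", "src_b.watercourses"],
    "Lake": ["src_b.lakes"],
}


def reformulate(global_relation, mapping):
    """Replace a global relation by the source relations that define it."""
    if global_relation not in mapping:
        raise KeyError(f"no GAV definition for {global_relation!r}")
    return list(mapping[global_relation])


def remove_source(mapping, source_prefix):
    """Dropping a source means editing every definition that mentions it."""
    return {
        relation: [s for s in sources if not s.startswith(source_prefix)]
        for relation, sources in mapping.items()
    }


plan = reformulate("River", GAV_MAPPING)
without_b = remove_source(GAV_MAPPING, "src_b.")
```

`reformulate` is a plain lookup, which is the "straightforward reformulation" side of GAV. `remove_source` shows the other side: removing one source touches the definitions of several global relations.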

Page 4: Jmora.di.oeg.3x1e

State of the Art - Solutions

Web services (planning)

ISI

SIMS

Planning-by-rewriting

HTN

Databases

Rewriting

Bucket

Inverse Rules

Ontology based

PICSEL

OBSERVER

Path oriented

Bleiholder

Wang

Semantic

Distribute queries

DARQ

Battré

Siberski (preferences)

Reasoning

Calvanese

Pérez-Urbina

SoftFacts (fuzzy)


Search for sources

Search for concepts

Search for concepts and sources

Search for sources

Physical vs Logical search

Page 5: Jmora.di.oeg.3x1e

Work – Base: REQUIEM

• Base: REQUIEM by Pérez-Urbina
• Ontology as the global schema (DL ELHIO¬)
• Rewrites to datalog queries by saturation
• Logical search but not physical search (there is no local schema)

The name ELHIO¬ reads as follows:

• EL: description logic similar to DL-Lite (retains someValuesFrom)
• H: role inclusions
• I: inverse roles
• O: basic concepts like {a}
• ¬: allows negative inclusions

[Figure: REQUIEM pipeline. Artifacts: Query, Clauses, Clause tree, Datalog program, Set of queries, Mediator. Steps: clausification, prune, saturation, unfolding.]

Page 6: Jmora.di.oeg.3x1e

Clausification [Pérez-Urbina2010]

Asunción Gómez Pérez

Page 7: Jmora.di.oeg.3x1e

Work – previous work

• My previous work: modification of REQUIEM
• When the ontology is only partially covered by the information source, the uncovered part can be pruned
• This prune makes the process more efficient
• Futile queries are not generated, so the result contains fewer queries

[Figure: pipeline diagram. Artifacts: Query, Clauses, Clause tree, Datalog program, Set of queries, Mediator. Steps: clausification, prune, saturation, unfolding.]

Page 8: Jmora.di.oeg.3x1e

Results - Efficiency

• Checked time for naïve and greedy modes
• Global and first modes for ontology pruning
• Only one ontology, several mapping files

[Figure: bar chart of time in ms (axis from 0 to 3000) for the configurations PU-N, PU-G, R2OO-Atlas-NG, R2OO-Atlas-GF, R2OO-EGM-NG, R2OO-EGM-GF, R2OO-BCN-NG and R2OO-BCN-GF.]

Page 9: Jmora.di.oeg.3x1e

Results – Effectiveness – # of Clauses (~queries) (1/2)

• Checked the number of clauses at several stages of the algorithm
    • After parsing the initial ontology
    • Pruning the clauses with the information relevant for the query
    • Saturating the clauses
    • Unfolding the clauses
    • Pruning again (only performed in greedy mode)
• Checked naïve and greedy modes for inference
• Checked global and first modes for ontology pruning
• Only one ontology, several mapping files providing different coverages

Page 10: Jmora.di.oeg.3x1e

Results – Effectiveness – # of Clauses (~queries) (2/2)

[Figure: bar chart of the number of clauses (axis from 0 to 2500) for the configurations PU-N, PU-G, R2OO-Atlas-NG, R2OO-Atlas-GF, R2OO-EGM-NG, R2OO-EGM-GF, R2OO-BCN-NG and R2OO-BCN-GF. Series: after parsing, after pruning (i), after saturation, after unfolding, after pruning (ii).]

Page 11: Jmora.di.oeg.3x1e

Example

Hydrographic phenomenon

Water

Freshwater

Seawater

Continental Water

Groundwater

Ground Stream

Aquifer

Surfacewater

Running Water

Transition Water

Upwelling

Still Water

Punctual Hydronym

Water Collector

Junction

Mouth

Continental_Water(x) :- Groundwater(x)
Groundwater(x) :- Ground_Stream(x)

Continental_Water(x) :- Ground_Stream(x)

Bold: mapped predicates

Query: Q(x) :- Water(x)
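The inference in this example is a single resolution step between two rules: the body of one rule matches the head of the other. A minimal Python sketch follows, assuming every rule has one unary atom in the head and one in the body (enough for this taxonomy, not for REQUIEM in general). The tuple representation and the function names are assumptions of the sketch.

```python
# A rule "Head(x) :- Body(x)" is stored as the pair (head, body).
# This only covers unary, single-atom rules such as the ones in the example.


def resolve(outer, inner):
    """Combine Head(x) :- Mid(x) with Mid(x) :- Body(x) into Head(x) :- Body(x).

    Returns None when the body of `outer` is not the head of `inner`.
    """
    outer_head, outer_body = outer
    inner_head, inner_body = inner
    if outer_body != inner_head:
        return None
    return (outer_head, inner_body)


def show(rule):
    """Render a (head, body) pair in the datalog notation of the slides."""
    head, body = rule
    return f"{head}(x) :- {body}(x)"


r1 = ("Continental_Water", "Groundwater")
r2 = ("Groundwater", "Ground_Stream")
derived = resolve(r1, r2)
print(show(derived))  # Continental_Water(x) :- Ground_Stream(x)
```

Swapping the arguments, `resolve(r2, r1)`, returns `None`, because `Ground_Stream` is not the head of `r1`.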

Page 12: Jmora.di.oeg.3x1e

After Pruning

Algorithm in REQUIEM:

• Q(x) :- Water(x)
• Water(x) :- Freshwater(x)
• Water(x) :- Seawater(x)
• Water(x) :- Continental_Water(x)
• Continental_Water(x) :- Groundwater(x)
• Continental_Water(x) :- Surfacewater(x)
• Groundwater(x) :- Ground_Stream(x)
• Groundwater(x) :- Aquifer(x)
• Surfacewater(x) :- Running_Water(x)
• Surfacewater(x) :- Transition_Water(x)
• Surfacewater(x) :- Upwelling(x)
• Surfacewater(x) :- Still_Water(x)

New algorithm (presenting now):

• Q(x) :- Water(x)
• Water(x) :- Freshwater(x)
• Water(x) :- Continental_Water(x)
• Continental_Water(x) :- Groundwater(x)
• Continental_Water(x) :- Surfacewater(x)
• Groundwater(x) :- Ground_Stream(x)
• Groundwater(x) :- Aquifer(x)
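The difference between the two prunes can be reproduced with two small reachability computations. The sketch below is an illustration, not the REQUIEM code. It assumes unary single-atom rules, and it assumes that the mapped predicates are Freshwater, Surfacewater, Ground_Stream and Aquifer: the bold marking of the original slide is lost in this transcript, so that set is inferred from the shorter clause list. One extra rule is added as a hypothetical example of a clause unrelated to the query.

```python
# Rules "Head(x) :- Body(x)" stored as (head, body); unary atoms only.
RULES = [
    ("Q", "Water"),
    ("Water", "Freshwater"),
    ("Water", "Seawater"),
    ("Water", "Continental_Water"),
    ("Continental_Water", "Groundwater"),
    ("Continental_Water", "Surfacewater"),
    ("Groundwater", "Ground_Stream"),
    ("Groundwater", "Aquifer"),
    ("Surfacewater", "Running_Water"),
    ("Surfacewater", "Transition_Water"),
    ("Surfacewater", "Upwelling"),
    ("Surfacewater", "Still_Water"),
    ("Punctual_Hydronym", "Mouth"),  # hypothetical edge, unrelated to Q
]
# Assumed set of mapped (retrievable) predicates, inferred from the slide.
MAPPED = {"Freshwater", "Surfacewater", "Ground_Stream", "Aquifer"}


def prune_by_query(rules, query="Q"):
    """Keep the rules whose head is reachable from the query predicate."""
    reachable, frontier = {query}, [query]
    while frontier:
        current = frontier.pop()
        for head, body in rules:
            if head == current and body not in reachable:
                reachable.add(body)
                frontier.append(body)
    return [(h, b) for h, b in rules if h in reachable]


def prune_by_mapping(rules, mapped):
    """Keep the rules whose body can eventually be answered by a source."""
    retrievable = set(mapped)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if body in retrievable and head not in retrievable:
                retrievable.add(head)
                changed = True
    return [(h, b) for h, b in rules if b in retrievable]


by_query = prune_by_query(RULES)
by_mapping = prune_by_mapping(by_query, MAPPED)
for head, body in by_mapping:
    print(f"{head}(x) :- {body}(x)")
```

Under these assumptions `by_query` holds the twelve clauses of the longer list and `by_mapping` the seven clauses of the shorter one: `Seawater` and the children of `Surfacewater` are dropped because no source can answer them.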

Page 13: Jmora.di.oeg.3x1e

After saturating

Algorithm in REQUIEM:

• Q(x) :- Water(x)
• Water(x) :- Freshwater(x)
• Water(x) :- Seawater(x)
• Water(x) :- Continental_Water(x)
• Continental_Water(x) :- Groundwater(x)
• Continental_Water(x) :- Surfacewater(x)
• Groundwater(x) :- Ground_Stream(x)
• Groundwater(x) :- Aquifer(x)
• Surfacewater(x) :- Running_Water(x)
• Surfacewater(x) :- Transition_Water(x)
• Surfacewater(x) :- Upwelling(x)
• Surfacewater(x) :- Still_Water(x)

New algorithm (presenting now); non-retrievable predicates have been removed through inference:

• Q(x) :- Freshwater(x)
• Q(x) :- Freshwater(x)
• Continental_Water(x) :- Ground_Stream(x)
• Continental_Water(x) :- Aquifer(x)
• Continental_Water(x) :- Surfacewater(x)
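One way to read the removal of non-retrievable predicates: rules are resolved against each other until the query is expressed only over predicates that a source can answer. The sketch below carries this to the end for the running example and returns the final set of queries, which goes further than the intermediate clause list shown for this step. It is an illustration under assumptions, not the REQUIEM implementation: rules are unary and single-atom, and the mapped set (Freshwater, Surfacewater, Ground_Stream, Aquifer) is inferred, since the bold marking is lost in this transcript.

```python
# Pruned rules of the running example, stored as (head, body) pairs.
RULES = [
    ("Q", "Water"),
    ("Water", "Freshwater"),
    ("Water", "Continental_Water"),
    ("Continental_Water", "Groundwater"),
    ("Continental_Water", "Surfacewater"),
    ("Groundwater", "Ground_Stream"),
    ("Groundwater", "Aquifer"),
]
# Assumed set of mapped (retrievable) predicates.
MAPPED = {"Freshwater", "Surfacewater", "Ground_Stream", "Aquifer"}


def unfold(rules, mapped, query="Q"):
    """Return the mapped predicates P for which Q(x) :- P(x) is derivable."""
    answers, seen, stack = set(), set(), [query]
    while stack:
        predicate = stack.pop()
        if predicate in seen:
            continue
        seen.add(predicate)
        if predicate in mapped:
            answers.add(predicate)
        # Keep expanding: a mapped predicate may have mapped specialisations.
        for head, body in rules:
            if head == predicate:
                stack.append(body)
    return sorted(answers)


queries = [f"Q(x) :- {p}(x)" for p in unfold(RULES, MAPPED)]
for q in queries:
    print(q)
```

Each resulting query mentions a single mapped predicate, so a mediator could send it to the corresponding source without further reasoning.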

Page 14: Jmora.di.oeg.3x1e

Work – current work

• @ISI: integration with a GAV mediator, DQP, OGSA-DAI
• Other mediators should be straightforward
• Real tests (with schemas and data): not done (yet)
• Always open to suggestions for future (remote) collaboration

[Figure: pipeline diagram. Artifacts: Query, Clauses, Clause tree, Datalog program, Set of queries, Mediator. Steps: clausification, prune, saturation, unfolding.]

Page 15: Jmora.di.oeg.3x1e

End

Questions, comments, proposals, suggestions, … all feedback is welcome.


Page 16: Jmora.di.oeg.3x1e

Data Integration Working Group in the

Ontology Engineering Group

OEG

Facultad de Informática

Universidad Politécnica de Madrid

Campus de Montegancedo sn

28660 Boadilla del Monte, Madrid

http://www.oeg-upm.net

Phone: 34.91.3367439, 34.91.3366605

Fax: 34.91.3524819

Page 17: Jmora.di.oeg.3x1e

Semantic e-Science

• Data Integration
• Ontology-based DB access: R2O and ODEMapster

• Semantic Grid
• S-OGSA Architecture
• WS-DAIOnt-RDF(S) OGF standard
• RDF(S) Grid Access Bridge

[Figure: S-OGSA Model. Class diagram relating Grid Entity, Grid Resource, Grid Service, Semantic aware Grid Service, Semantic Binding, Semantic Provisioning Service, Knowledge Entity, Knowledge Resource and Knowledge Service through is-a, produce and consume relations, with examples such as Ontology, Rule set, Ontology Service, Reasoning Service, Annotation Tool/Service and Metadata Store/Service.]

[Figure: RDF(S) Grid Access Bridge architecture. A Web Service Tier with upper, intermediate and lower service layers (RepositorySelectorService, RepositoryService, ResourceService, ListService, ContainerService, StatementService, PropertyService, ClassService, AltService), an RDFSConnector, and an RDF(S) Storage Layer with Sesame, Jena and Atlas RDF storage and their connectors.]

Page 18: Jmora.di.oeg.3x1e

General scenario

Several PhD students working in a shared general scenario at UPM:

• Jose Mora – Query plans
• Freddy Priyatna – Multi-RDB2RDF
• Carlos Buil – Distributed SPARQL queries
• Jean-Paul Calbimonte – Multi-SensorNetwork2RDF
• Victor Saquicela – Automatic WS semantic annotation

Page 19: Jmora.di.oeg.3x1e

R2O++ - Freddy Priyatna

[Figure: R2O++ architecture. Components: R2O Parser, R2O Unfolder, Query evaluator, R2O Postprocessor, Model Writer. Artifacts: R2O Mapping Document, R2O Properties, mapping objects, SQL, result set, triples, Jena model, RDF document. Data source: DB.]

Page 20: Jmora.di.oeg.3x1e

Semantic Streaming Data Access – Jean Paul Calbimonte

[Figure: Semantic Integrator. Between the client and distributed query processing, queries pass through query reconciliation (q, qr) and query canonisation (Qc); data return through data decanonisation (Dc) and data reconciliation (dr, d). O-O mappings and R2O mappings relate SPARQLSTR (Og), SPARQLSTR (O1 O2 On), SNEEql (S1 S2 Sn) and SNEEql’ (S1 S2 Sn); results are labelled [tuple l1 l2 l3], [triple O1 O2 On] and [triple Og].]

Page 21: Jmora.di.oeg.3x1e


Semantic Annotation of RESTful Services – Victor Saquicela

[Figure: annotation workflow. Labels: Internet, User, Repository, Web applications & API, input, syntactic description, semantic annotation, SpellingSuggestions.]

Page 22: Jmora.di.oeg.3x1e

SparqlDQP – Carlos Buil

Page 23: Jmora.di.oeg.3x1e

Ontology Engineering Group

Prof. Dr. Asunción Gómez-Pérez, Dr. Oscar Corcho

Facultad de Informática

Universidad Politécnica de Madrid

Campus de Montegancedo sn

28660 Boadilla del Monte, Madrid

http://www.oeg-upm.net

{asun,ocorcho}@fi.upm.es

Phone: 34.91.3367439, 34.91.3366605

Fax: 34.91.3524819

Presenter: Jose Mora

Page 24: Jmora.di.oeg.3x1e

People

• Director: A. Gómez-Pérez
• Research Group (37 people)
    • 2 Full Professors
    • 4 Associate Professors
    • 1 Assistant Professor
    • 3 Postdocs
    • 17 PhD Students
    • 8 MSc Students
    • 2 Software Engineers
• Management (4)
    • 2 Project Managers
    • 1 System Administrator
    • 1 Secretary
• 50+ Past Collaborators
• 10+ visitors

Page 25: Jmora.di.oeg.3x1e

Research Areas

• Semantic e-Science (Data Integration, Semantic Grid)
• Internet of Things
• (Social) Semantic Web
• Natural Language Processing
• Ontological Engineering

[Figure: timeline with the years 1995, 1997, 2000, 2004 and 2008.]

Page 26: Jmora.di.oeg.3x1e

Research projects

[Figure: project timeline from 1999 to 2013. Legend: Company, Spanish Projects, EU Project Coordinators, EU Project Participation, Group.]

Projects shown: HA98-0002, Katalyx, MKBEEM, OntoWeb, Esperonto, PIKON, HF02-0013, Knowledge Web, OntoGrid, ContentWeb, 12 Ac. Especiales/Complementarias (special/complementary actions), Servicios Semánticos, REIMDOC (FIT), IGN/RAE/AMPER/XMEDIA, Red/Gis4Gov/11811/UPnP/UpGrid/Autores3.0/WEBn+1, SEEMP, NeOn, Marie Curie, GeoBuddies, ADMIRE, SemSorGrid4Env, DynaLearn, España Virtual/mIO!/Buscamedia, PLATA, SEALS, MONNET, WHO/IGN.

Page 27: Jmora.di.oeg.3x1e

Ontological Engineering

• METHONTOLOGY & WebODE
• NeOn Methodology for building Networks of Ontologies
    • Ontology Scheduling
    • Ontology Requirement Specification
    • Ontology Reuse
    • Non Ontological Resource Reuse and Reengineering
    • Ontology Localization
    • Ontology Mapping
    • Ontology Design Patterns
    • Ontology Change Propagation

[Figure: NeOn Methodology scenarios, numbered 1 to 9. Knowledge resources: ontological resources, non-ontological resources (thesauri, dictionaries, glossaries, lexicons, taxonomies, classification schemas), ontology design patterns, alignments, repositories and registries (Flogic, RDF(S), OWL). Activities: O. Specification, O. Conceptualization, O. Formalization, O. Implementation (RDF(S), OWL, Flogic), Ontological Resource Reuse, Ontological Resource Reengineering, Non Ontological Resource Reuse, Non Ontological Resource Reengineering, Ontology Design Pattern Reuse, O. Aligning, O. Merging, Ontology Restructuring (pruning, extension, specialization, modularization), O. Localization. Ontology support activities: knowledge acquisition (elicitation), documentation, configuration management, evaluation (V&V), assessment.]

Page 28: Jmora.di.oeg.3x1e

Ontologies and Natural Language Processing (NLP)

• LIR – Linguistic Information Repository
• Multilingual ontologies & Label Translator
• Lexico-Syntactic Patterns for automatic ontology building (Spanish, English, German)

[Figure: LIR entity properties view for the lexical entries “river”, “rivière” and “fleuve”. Panels: lexicalization information (main entry, grammatical number: singular, term type, part of speech: noun, source, notes), lexical entry information, sense, definitions with their sources (IATE, http://iate.europa.eu/iatediff/Search...; Britannica Online, http://www.britannica.com/...; http://www.cnrtl.fr/), synonyms, translations and scientific name. Definition shown: “stream of water of considerable volume and length that flows into the sea”. Note shown: fleuve and rivière are usually considered synonyms; however, the use of fleuve should be avoided when the stream does not flow into the sea.]

Page 29: Jmora.di.oeg.3x1e

(Social) Semantic Web

• Semantic Web Framework
• Semantic Portals
• Semantic Wikis
• Annotation and Browsing Tools
    • Web content
    • Multimedia content in home environments
• NeOn Methodology for building Large Scale Semantic Web Applications
• Benchmarking Semantic Web Technologies
• Evolution of folksonomies and ontologies

Page 30: Jmora.di.oeg.3x1e

Internet of Things

• Topics
    • Mobile devices
    • Sensor networks
    • Ubiquitous computing
    • Large-scale data integration for mobile applications exploiting user-generated content
• Large-scale data integration
    • Legacy DB
    • Sensor networks
    • User generated content

Page 31: Jmora.di.oeg.3x1e

Semantic e-Science

• Data Integration
• Ontology-based DB access: R2O and ODEMapster

• Semantic Grid
• S-OGSA Architecture
• WS-DAIOnt-RDF(S) OGF standard
• RDF(S) Grid Access Bridge

[Figures: S-OGSA Model and RDF(S) Grid Access Bridge architecture.]

Page 32: Jmora.di.oeg.3x1e


Univ. of Amsterdam

Free Univ. of Amsterdam

DFKI

Univ. of Augsburg

Univ. of Karlsruhe

Univ. of Koblenz

Univ. of Hannover

Univ. of Mannheim

Univ. of Bielefeld

Forschungszentrum Informatik

Open University

Oxford University

Univ. of Manchester

Univ. of Liverpool

Univ. of Sheffield

Univ. of Aberdeen

Univ. of Edinburgh

Univ. of Southampton

Univ. of Hull

CNR

Univ. of Trento

Univ. of Bolzano

KSL. Stanford Univ.

Univ. of Galway (DERI)

INRIA

Univ. of Athens

TUC

Free Univ. of Brussels

Collaboration with other research groups

Univ. of Wien

Univ. of NR & ALS

Univ. of Innsbruck

Ústav Informatiky

Academy of Sciences

Univ. of Tel Aviv

Univ. of Brasilia

Univ. of Zurich