Searching for data and services

F. Guerra¹, A. Maurino², M. Palmonari², G. Pasi², A. Sala³

¹ DEA - Università di Modena e Reggio Emilia, v.le Sarca 336, Milano, Italy
² DISCO - Università di Milano Bicocca, v.le Risorgimento 2, Bologna, Italy
³ DII - Università di Modena e Reggio Emilia, via Vignolese 905, Modena, Italy

1st International Workshop on Interoperability through Semantic Data and Service Integration
25 June 2009, Camogli, Italy

ISDSI 2009, Francesco Guerra, Università di Modena e Reggio Emilia, DB Group @ unimo



Page 2: Searching for data and services

Outline

1. Motivation
2. Building the Global Data and Service View at Set-up Time
3. Data and eService Retrieval
4. Conclusion and future work

Page 3: Searching for data and services

Motivation

• Research on data integration and on service discovery has, from the beginning, involved different (not always overlapping) communities.
– As a result, data and services are described with different models, and different techniques have been developed to retrieve them.
• From a user perspective, the border between data and services is often blurred, since data and services provide complementary views of the available resources.
• Users need new techniques to manage data and services in a unified way.
• Integration of data and services can be tackled from different perspectives:
– access to data is provided through Service Oriented Architectures (SOA), and Web services are exploited to build information integration platforms;
– a global view of the data sources and of the eServices available in the peer is provided, supporting access to the two complementary kinds of resources at the same time.

Page 4: Searching for data and services

Motivation (2)

The problem we address is to retrieve, among the many services available, those that are related to a query, according to the semantics of the terms involved in the query. For example:

select Name, Country
from Accommodation
where City = 'Modena'

Page 5: Searching for data and services

The approach (overview)

• We assume a mediator-based data integration system that provides a global virtual view of the data: the Semantic Peer Data Ontology (SPDO).
• We assume a set of semantically annotated service descriptions.
– The ontologies used in the service descriptions can be developed outside the peer and are not known in advance in the integration process.
• We propose a semantic approach to data and service integration:
– given an SQL-like query expressed in the terminology of the SPDO, retrieve all the services that can be considered “related” to the query on the data sources.
• The approach is based on:
– a mediator-based data integration system, the MOMIS system (Mediator envirOnment for Multiple Information Sources);
– a service retrieval engine based on IR techniques, which performs semantic indexing of service descriptions and keyword-based semantic search.

Page 6: Searching for data and services

The approach (overview)

• The integration of data and services is achieved by:
1. building the SPDO (a functionality already provided by MOMIS);
2. building a Global Service Ontology (GSO) consisting of the ontologies used in the service semantic descriptions;
3. defining a set of mappings between the SPDO and the GSO;
4. exploiting, at query time, query rewriting techniques based on these mappings to build a keyword-based query for service retrieval, expressed in the GSO terminology, starting from an SQL-like query on the data sources.

Page 7: Searching for data and services

Building the Global Data and Service View

The Global Light Service Ontology (GLSO) is built through the following steps: service indexing, Global Service Ontology (GSO) construction, GLSO construction, and Semantic Similarity Matrix (SSM) definition.

The SPDO is built by exploiting the MOMIS integration system.

Page 8: Searching for data and services

MOMIS

Page 9: Searching for data and services

Service Indexing

• Our approach requires a formal representation of the service descriptions. It is based on full-text indexing, which extracts terms from six specific sections of a service description:
– service name,
– service description,
– input,
– output,
– pre-condition,
– post-condition.
• A set of index terms I, which will form the dictionary, is extracted. It comprises:
– IO: the index terms that are ontology terms (i.e. terms belonging to some service ontology);
– IT: the index terms extracted from textual descriptions.
• The indexing structure follows a “structured document” approach, where the inverted file structure consists of:
– a dictionary file based on I;
– a posting file, with a list of references to the service sections in which each term occurs.
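The inverted file structure described above can be sketched as follows. This is a minimal illustration, not the engine's actual implementation: the section identifiers, the whitespace tokenizer and the two sample services are assumptions made for the example.

```python
from collections import defaultdict

# The six sections named on the slide; the identifiers are illustrative.
SECTIONS = ("name", "description", "input", "output",
            "precondition", "postcondition")


def tokenize(text):
    """Lower-case and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_index(services):
    """Build (dictionary, postings) from {service_id: {section: text}}.

    Each posting is a (service_id, section) pair, so a term is located
    at the level of a service section rather than of a whole document.
    """
    postings = defaultdict(set)
    for service_id, sections in services.items():
        for section in SECTIONS:
            for term in tokenize(sections.get(section, "")):
                postings[term].add((service_id, section))
    dictionary = sorted(postings)
    return dictionary, dict(postings)


# Hypothetical service descriptions, for illustration only.
services = {
    "s1": {"name": "HotelBooking", "description": "book a hotel room",
           "output": "Hotel"},
    "s2": {"name": "HotelSearch", "input": "Location", "output": "Hotel"},
}
dictionary, postings = build_index(services)
print(sorted(postings["hotel"]))
```

Keeping the section in each posting is what makes the “structured document” approach possible: a match in the output section can later be weighted differently from a match in the free-text description.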

Page 10: Searching for data and services

GSO construction

• The GSO is built by:
– loosely merging each service ontology O such that i belongs to O for some i in IO;
– associating a concept Ci with each i in IT, introducing a class Terms as a subclass of Thing in the GSO, and stating that, for every i in IT, Ci is a subclass of Terms.
• “Loosely merging” means that the service ontologies (SOs) are merged without attempting to integrate similar concepts across the different ontologies.
– If the source SOs are consistent, the GSO can be assumed to be consistent.
– Loose merging is clearly not the optimal choice with respect to ontology integration.
– However, since the XIRE component is based on approximate IR techniques and semantic similarity, approximate solutions to the ontology integration problem can be considered acceptable; what matters instead is that the whole GSO building process is fully automated.
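A rough sketch of loose merging, under simplifying assumptions: each service ontology is reduced to a set of (subclass, superclass) pairs, index terms in IO are written as "ontology:Class", and the "term:" prefix for textual terms is a hypothetical naming scheme. None of these representations comes from the slides.

```python
def loose_merge(service_ontologies, ontology_terms, text_terms):
    """Build a GSO as a set of (subclass, superclass) axioms.

    service_ontologies: {ontology_name: set of (sub, sup) pairs}
    ontology_terms:     I_O, index terms written as "ontology_name:Class"
    text_terms:         I_T, index terms extracted from free text
    """
    # Keep only the ontologies that contain at least one index term of I_O.
    used = {term.split(":", 1)[0] for term in ontology_terms}
    gso = set()
    for name, axioms in service_ontologies.items():
        if name in used:
            # Prefixing each class with its ontology name keeps similar
            # concepts from different ontologies distinct: no alignment
            # between ontologies is attempted.
            gso |= {(f"{name}:{sub}", f"{name}:{sup}") for sub, sup in axioms}
    # Terms is a subclass of Thing; every textual term is a subclass of Terms.
    gso.add(("Terms", "Thing"))
    gso |= {(f"term:{t}", "Terms") for t in text_terms}
    return gso


# Hypothetical ontologies, for illustration only.
ontologies = {
    "travel": {("Hotel", "Accommodation")},
    "lodging": {("Hotel", "Building")},
    "unused": {("Car", "Vehicle")},
}
gso = loose_merge(ontologies, {"travel:Hotel", "lodging:Hotel"}, {"booking"})
for axiom in sorted(gso):
    print(axiom)
```

Note that the two Hotel classes remain separate in the result. This is the price of a fully automatic merge, and it is tolerable here because retrieval is approximate anyway.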

Page 11: Searching for data and services

GLSO construction and Semantic Similarity Matrix

• The GSO may turn out to be extremely large: only a subset of the terms of the ontologies is relevant to the Semantic Web Service (SWS) descriptions.
– A technique to reduce the ontology size is applied, yielding the GLSO (Global Light Service Ontology).
– We extract from the GSO the sub-ontology that preserves the meaning of the terms explicitly used in the service descriptions, namely the set of index terms I.
• The Semantic Similarity Matrix (SSM), which is exploited later for query expansion at query time, is then computed.
– The SSM is defined by analyzing the GLSO structure according to a semantic similarity measure from the literature; it takes into account subclass paths, domain and range restrictions on properties, membership of instances, and so on.
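The two steps above can be illustrated with a deliberately simplified sketch. The slides do not name the similarity measure, so the one used here (1 / (1 + length of the shortest subclass path)) is an assumption chosen for brevity; it ignores property restrictions and instance membership. Keeping an index term together with all its ancestors is likewise only one possible reading of “preserving the meaning” of a term.

```python
from collections import deque


def extract_light(axioms, index_terms):
    """Keep the index terms and all their ancestors in the subclass hierarchy."""
    parents = {}
    for sub, sup in axioms:
        parents.setdefault(sub, set()).add(sup)
    keep, todo = set(), list(index_terms)
    while todo:
        cls = todo.pop()
        if cls not in keep:
            keep.add(cls)
            todo.extend(parents.get(cls, ()))
    return {(sub, sup) for sub, sup in axioms if sub in keep and sup in keep}


def similarity_matrix(axioms):
    """sim(a, b) = 1 / (1 + shortest subclass-path length), 0.0 if unconnected."""
    graph = {}
    for sub, sup in axioms:
        graph.setdefault(sub, set()).add(sup)
        graph.setdefault(sup, set()).add(sub)
    ssm = {}
    for start in graph:
        # Breadth-first search gives shortest path lengths from `start`.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in graph[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        for other in graph:
            ssm[start, other] = 1 / (1 + dist[other]) if other in dist else 0.0
    return ssm


# Hypothetical GSO fragment, for illustration only.
gso = {
    ("Hotel", "Accommodation"), ("Hostel", "Accommodation"),
    ("Accommodation", "Thing"), ("Car", "Vehicle"), ("Vehicle", "Thing"),
}
glso = extract_light(gso, {"Hotel", "Hostel"})
ssm = similarity_matrix(glso)
print(ssm["Hotel", "Accommodation"], ssm["Hotel", "Hostel"])
```

In this example the Car and Vehicle classes are dropped from the GLSO because no index term depends on them, which is the size reduction the slide refers to.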

Page 12: Searching for data and services

Mapping of Data and Service Ontologies

• Mappings between the elements of the SPDO and the GLSO are generated by exploiting a suitably modified version of the MOMIS clustering algorithm.
• The clustering algorithm takes as input the SPDO and the GLSO with their associated metadata, and generates a set of clusters of classes belonging to the SPDO and the GLSO.
• Mappings are automatically generated from the clustering result. Three cases arise:
– a cluster contains only SPDO classes: it is not exploited for mapping generation; such a cluster is caused by a clustering threshold less selective than the one chosen in the SPDO creation process;
– a cluster contains only GLSO classes: it is not exploited for mapping generation; it indicates that there are descriptions of Web services which are strongly related;
– a cluster contains classes belonging to both the SPDO and the GLSO: it produces, for each SPDO class, a mapping to each GLSO class in the cluster.
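The three cases can be expressed compactly. In the sketch below the clusters are taken as given (the MOMIS clustering algorithm itself is not reproduced), and all class names other than Accommodation and Hotel are hypothetical.

```python
def mappings_from_clusters(clusters, spdo_classes, glso_classes):
    """Return the set of (spdo_class, glso_class) mappings induced by clusters."""
    mappings = set()
    for cluster in clusters:
        spdo_part = cluster & spdo_classes
        glso_part = cluster & glso_classes
        # A cluster with classes from one side only yields an empty product,
        # so the first two cases produce no mapping without special handling.
        for s in spdo_part:
            for g in glso_part:
                mappings.add((s, g))
    return mappings


spdo = {"Accommodation", "Person", "Customer"}
glso = {"Hotel", "Flight", "AirTrip"}
clusters = [
    {"Accommodation", "Hotel"},   # mixed cluster: produces a mapping
    {"Person", "Customer"},       # SPDO only: ignored
    {"Flight", "AirTrip"},        # GLSO only: ignored
]
mappings = mappings_from_clusters(clusters, spdo, glso)
print(mappings)
```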

Page 13: Searching for data and services

Example

The following mappings are generated by applying our technique:

Accommodation --> Hotel

Accommodation.Name --> Hotel.Denomination

Accommodation.City --> Hotel.Location

Accommodation.Country --> Hotel.Country

[Figure: SPDO fragment and GLSO fragment; the GLSO fragment shows the class Hotel with Hotel.Denomination, Hotel.Location and Hotel.Country.]

Page 14: Searching for data and services

Data and eService Retrieval

select <select_attribute_list>

from <from_class_list>

where <condition>

• The answer to this query is a data set from the data sources, together with a set of services that are potentially useful because they are related to the concepts appearing in the query, and hence to the retrieved data.
• Query processing is divided into two steps that are executed simultaneously:
– a data set is obtained from the data sources by processing the query on the integrated view. The results are obtained by the MOMIS Query Manager, which rewrites the global query as an equivalent set of queries expressed on the local schemata (local queries) by means of an unfolding process;
– a set of services related to the query is obtained by exploiting the mappings between the SPDO and the GLSO and the concept of relevant service mapping. Services are retrieved by the XIRE (eXtended Information Retrieval Engine) component, a service search engine based on the vector space model.

Page 15: Searching for data and services

Data and eService Retrieval (overview)

Page 16: Searching for data and services

Managing keywords

• Given a query in an SQL-like notation expressed in the SPDO terminology, the set of extracted keywords consists of:
– all the classes given in the “FROM” clause,
– all the attributes and values used in the “SELECT” and “WHERE” clauses,
– all their ranges defined by ontology classes.
• The set of keywords is rewritten into GLSO terms by exploiting the mappings between the SPDO and the GLSO.
• The semantic similarity between GLSO terms defined in the SSM is exploited to expand the keyword set into a weighted term vector.
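The three stages (extraction, rewriting, expansion) can be sketched on the Accommodation query from the Motivation slide, using the mappings from the Example slide. Several points are assumptions made for the illustration: the query is given as an already parsed dictionary with class-qualified attribute names, attribute ranges are omitted, unmapped keywords such as the value Modena are kept unchanged, original keywords receive weight 1.0, and the similarity values and the 0.5 threshold are invented.

```python
def extract_keywords(query):
    """Classes from FROM, attributes and values from SELECT and WHERE."""
    keywords = list(query["from"]) + list(query["select"])
    for attribute, value in query["where"]:
        keywords += [attribute, value]
    return list(dict.fromkeys(keywords))  # drop duplicates, keep order


def rewrite(keywords, mappings):
    """Replace each SPDO term by its mapped GLSO terms; keep unmapped keywords."""
    rewritten = []
    for keyword in keywords:
        rewritten.extend(mappings.get(keyword, [keyword]))
    return list(dict.fromkeys(rewritten))


def expand(keywords, ssm, threshold=0.5):
    """Weight 1.0 for the rewritten keywords, the SSM value for similar terms."""
    weights = {k: 1.0 for k in keywords}
    for (a, b), sim in ssm.items():
        if a in keywords and sim >= threshold:
            weights[b] = max(weights.get(b, 0.0), sim)
    return weights


query = {
    "select": ["Accommodation.Name", "Accommodation.Country"],
    "from": ["Accommodation"],
    "where": [("Accommodation.City", "Modena")],
}
mappings = {
    "Accommodation": ["Hotel"],
    "Accommodation.Name": ["Hotel.Denomination"],
    "Accommodation.City": ["Hotel.Location"],
    "Accommodation.Country": ["Hotel.Country"],
}
# Hypothetical similarity values between GLSO terms.
ssm = {("Hotel", "Hostel"): 0.6, ("Hotel", "Building"): 0.3}

rewritten = rewrite(extract_keywords(query), mappings)
weights = expand(rewritten, ssm)
print(rewritten)
print(weights)
```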

Page 17: Searching for data and services

eServices retrieval

• Query evaluation is based on the vector space model:
– in this model, both documents (that is, Web service descriptions) and queries (the extracted keywords) are represented as vectors in an n-dimensional space;
– each document vector has non-zero weights for those keywords that are index terms of the corresponding description;
– relevance weights are used to modify the weights in the list resulting from the keyword evaluation process.
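A minimal vector space ranking can be sketched as follows. The slides do not state which matching function XIRE uses, so cosine similarity is an assumption here (it is the usual choice for this model); the service vectors and their weights are hypothetical.

```python
import math


def cosine(query, doc):
    """Cosine similarity between two sparse vectors stored as {term: weight}."""
    dot = sum(w * doc.get(term, 0.0) for term, w in query.items())
    norm_q = math.sqrt(sum(w * w for w in query.values()))
    norm_d = math.sqrt(sum(w * w for w in doc.values()))
    return dot / (norm_q * norm_d) if norm_q and norm_d else 0.0


def rank(query, documents):
    """Return the ids of matching services, best match first."""
    scored = [(cosine(query, vec), sid) for sid, vec in documents.items()]
    return [sid for score, sid in sorted(scored, reverse=True) if score > 0]


# Weighted query vector, e.g. the result of keyword expansion.
query = {"Hotel": 1.0, "Hostel": 0.6}
# One vector per service description; non-zero entries are its index terms.
documents = {
    "s1": {"Hotel": 1.0, "Booking": 1.0},
    "s2": {"Hostel": 1.0},
    "s3": {"Flight": 1.0},
}
print(rank(query, documents))
```

Because the expanded term Hostel carries a weight below 1.0, a service indexed only by Hostel still matches but ranks below one indexed by the original keyword Hotel.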

Page 18: Searching for data and services

Conclusion and future work

• In this paper we introduced a technique for publishing and retrieving a unified view of data and services.

• Such a unified view may be exploited to improve the user's knowledge of a set of sources and to retrieve a list of Web services related to a data set.

• The approach is semi-automatic, and works jointly with the tools which are typically provided for searching for data and services separately.

• Future work will focus on evaluating the effectiveness of the approach on the real cases provided within the NeP4B project, and against the OWLS-TC benchmark.