Overcoming semantic heterogeneity in spatial data infrastructures

M. Lutz a,*, J. Sprado b, E. Klien c, C. Schubert d, I. Christ d

a European Commission, Joint Research Centre (JRC), Via E. Fermi 1, 21027 Ispra, Italy
b Center for Computing Technologies (TZI), Am Fallturm 1, 28359 Bremen, Germany
c Institute for Geoinformatics (IfGI), Weseler Straße 253, 48151 Münster, Germany
d Delphi InformationsMusterManagement (DELPHI IMM), Friedrich-Ebert-Straße 8, 14467 Potsdam, Germany

* Corresponding author. Tel.: +39 0332 786759; fax: +39 0332 786325.

Computers & Geosciences 35 (2009) 739–752
journal homepage: www.elsevier.com/locate/cageo
0098-3004/$ - see front matter © 2008 Elsevier Ltd. All rights reserved.
doi:10.1016/j.cageo.2007.09.017

Article history: Received 20 December 2005; received in revised form 29 May 2007; accepted 21 September 2007

Keywords: Semantic heterogeneity; Interoperability; Spatial data infrastructures; Ontologies

Abstract

In current spatial data infrastructures (SDIs), it is still often difficult to effectively exchange or re-use geographic data sets. A main reason for this is semantic heterogeneity, which occurs at different levels: at the metadata, the schema and the data content level. It is the goal of the work presented in this paper to overcome the problems caused by semantic heterogeneity on all three levels. We present a method based on ontologies and logical reasoning, which enhances the discovery, retrieval, interpretation and integration of geographic data in SDIs. Its benefits and practical use are illustrated with examples from the domains of geology and hydrology.

© 2008 Elsevier Ltd. All rights reserved.

1. Introduction

Spatial data infrastructures (SDIs) play a major role in searching, accessing and integrating heterogeneous geographic data sets and geographic information (GI) services. The standards of the Open Geospatial Consortium (OGC) provide a syntactical basis for data interchange between different user communities. But this is only the first step, as semantic heterogeneity (Bishr, 1998) still presents an obstacle on the way towards full interoperability (Egenhofer, 2002; Sheth, 1999; Sondheim et al., 1999). In contrast to syntax, which only defines the structure, semantics refers to the meaning of elements. In SDIs, existing standards fail to address semantic problems that occur due to heterogeneous data content and heterogeneous user communities (using different languages, terminologies and perspectives). Semantic heterogeneity occurs at different levels. At each of these levels, it can inhibit tasks that are essential to the success of SDIs.

• At the metadata level, semantic heterogeneity impedes the discovery of geographic information;
• at the schema level, semantic heterogeneity impedes the retrieval of geographic information; and
• at the data content level, semantic heterogeneity impedes the interpretation, integration and exchange of geographic information.

It is the goal of the work presented in this paper to enhance SDIs by overcoming these problems. We present an ontology-based method for enhancing GI discovery, retrieval, interpretation and integration in SDIs, which has been developed in the meanInGs project (see http://www.meanings.de/). To illustrate its benefits and practical use, we introduce two examples:

• an example from the geology domain that illustrates the benefits for interpretation and integration of GI described in different geological classification systems, and
• an example from the hydrology domain that demonstrates the benefits for GI discovery, retrieval and exchange in a service composition application.

The remainder of the paper is structured as follows. Section 2 elaborates on the problems caused by semantic heterogeneity at the metadata, schema and data content levels. In Section 3, we explain the building blocks employed in the proposed approach for overcoming these problems. The method for dealing with semantic heterogeneity at the data level is described in Section 4. Section 5 shows how the building blocks can be used for overcoming semantic heterogeneity at the metadata and schema levels. In both Sections 4 and 5, we show how the presented method can be encapsulated in services and client applications and how to combine these with existing SDI components. The practical use is illustrated within the scope of the geology and hydrology examples. In Section 6, we discuss the presented approach in the context of related work. Section 7 concludes with an outlook to future work.

Table 1. Examples for different nomenclatures used in geological maps of the Lower Buntsandstein in Saxony-Anhalt

| Author(s) of classification | Fulda and Huelsemann | Dockter and Puff | Jung | Radzinski (a) | Rock description |
|---|---|---|---|---|---|
| Date | 1930 | 1959 | 1968 | 1997 | |
| Survey map | Eisleben | Erdeborn | Hettstedt | | |
| Stratigraphic short terms | su2o | su3d | su5 | suBDS | Dolomitic sandstone |
| | su2o | su3’st | su5 | suBOW | Interlaminated mixed layers |
| | su2u | su3’k | su4 | suBRG | Rogenstein-Zone (Oolithic limestone) |
| | su2u | su2 | su3 | suCST | Red-brown schistous clay, mudstone |
| | su2u | su2 | su2 | suCUW | Fine-grained carbonate sandstone |
| | su1 | su1 | su1 | zB | Bröckelschiefer (crumbly shales) |

(a) Official classification currently used in the geological information system of Saxony-Anhalt.

2 Available at http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=39890. ISO 19119:2005 is based on Percivall (2002), which is publicly available.

2. Problems caused by semantic heterogeneity

In this section, we illustrate the problems caused by semantic heterogeneity in two different geospatial applications. We also use these examples throughout the paper to illustrate our proposed solution.

2.1. Interpretation and integration

In our first scenario, Hannah, a geologist, has to answer questions concerning the stratigraphy within Saxony-Anhalt. Stratigraphy describes the layering and the corresponding age of the rocks. Hannah’s questions might include the following: “Where are the geological conditions suitable for hosting bodies of groundwater?” or “Where is the geological rock suitable for a dump site?” For this task, Hannah has to analyse and visualise several geological data sets that are available at the Geological Survey of Saxony-Anhalt.

The main challenge for answering her questions lies in the interpretation of the data, i.e. at the data content level. Different authors of geological maps have used different stratigraphic classifications at different times in history, leading to several synonymous and homonymous stratigraphic terms within the geological database. Often, even on adjacent maps, different classification systems and nomenclatures are used. Table 1 gives a few examples for the geological period of the Lower Buntsandstein (Triassic, about 250 million years ago). It illustrates the use of different terms (synonyms) for the same rock formation, e.g. su2u, su3’k and su4 for Oolithic limestone, as well as the use of the same term (a homonym) for different rock formations, e.g. su2 for fine-grained carbonate sandstone as well as red-brown schistous clay.

In a current SDI, Hannah can use a Web Map Service (WMS, de la Beaujardiere, 2006) to represent or highlight data based on different classification systems in a common map. In order to provide an integrated view using a common classification system and symbology, Hannah has to formulate a Styled Layer Descriptor (SLD, Lalonde, 2002). For this task, she has to understand each of the specific classification systems used for the data. Only if Hannah can interpret and compare the terms will she be able to manually translate (or re-interpret) the data in order to formulate the SLD. The metadata for data sources available in SDIs today often do not provide sufficient information on the classification systems used, which makes this task very difficult.

2.2. Discovery, retrieval and exchange in service composition

In our second scenario, Max, a service developer, wants to implement a web service chain (as defined in ISO 19119:2005; see footnote 2) that provides fast and up-to-date access to water level measurements in a river, interpolates these measurements along the river course, and visualises the interpolation results. Such a service chain could, for example, enable the detection of hazard areas during flood events. In order to execute the interpolation service in an open and distributed environment, the following steps are necessary: (1) appropriate input data have to be discovered, (2) the input data have to be retrieved using a given query filter and (3) the retrieved data have to be transformed to fit the requirements of the interpolation service.

WFS 1:

<Filter>
  <PropertyIsGreaterThan>
    <PropertyName>STAV</PropertyName>
    <Literal>200</Literal>
  </PropertyIsGreaterThan>
</Filter>

WFS 2:

<Filter>
  <PropertyIsGreaterThan>
    <PropertyName>HEIGHT</PropertyName>
    <Literal>2</Literal>
  </PropertyIsGreaterThan>
</Filter>

Fig. 1. Two filter expressions to retrieve all measurements with a water level greater than 2 m.
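The two filters in Fig. 1 express the same request but differ in property name and unit. The following minimal sketch shows how such filters could be generated from a threshold given in metres. The table `SERVICE_SCHEMAS` and the function `build_filter` are hypothetical names introduced here for illustration; they only mirror the two cases of Fig. 1, and XML namespaces are omitted as in the figure.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-service knowledge that a requester would otherwise have to
# find out manually: the property holding the water level and its unit,
# expressed as "units per metre".
SERVICE_SCHEMAS = {
    "WFS 1": {"property": "STAV", "units_per_metre": 100},  # centimetres
    "WFS 2": {"property": "HEIGHT", "units_per_metre": 1},  # metres
}


def build_filter(service: str, min_level_m: float) -> str:
    """Return a filter selecting water levels greater than min_level_m metres."""
    schema = SERVICE_SCHEMAS[service]
    root = ET.Element("Filter")
    comparison = ET.SubElement(root, "PropertyIsGreaterThan")
    ET.SubElement(comparison, "PropertyName").text = schema["property"]
    # Convert the threshold into the unit used by the service.
    literal = min_level_m * schema["units_per_metre"]
    ET.SubElement(comparison, "Literal").text = f"{literal:g}"
    return ET.tostring(root, encoding="unicode")


for name in SERVICE_SCHEMAS:
    print(name, build_filter(name, 2))
```

Without a table of this kind, which is exactly the knowledge that is missing from the schema alone, the requester has to guess both the property and the unit.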

In current SDI architectures, Max will face problems during the discovery, retrieval and exchange of geospatial data, i.e. on the metadata, the schema and the data content level. For discovery, he will use a catalogue (Nebert et al., 2007) to do a keyword-based search, possibly in combination with a spatial filter. Even though natural language-processing techniques can increase the semantic relevance of search results with respect to the search request (e.g. Richardson and Smeaton, 1995), keyword-based techniques are inherently restricted by the ambiguities of natural language. If Max’s terminology differs from the terminology used by data providers, keyword-based search can have low recall, i.e. not all relevant information sources are discovered. Moreover, precision can also be low, i.e. some of the discovered services are not relevant (Bernstein and Klein, 2002). This can be the case if requesters and providers use homonymous terms or if the catalogue does not allow the requester to express complex queries.

Once Max has discovered an appropriate data source, he can access it through a standardized interface like a Web Feature Service (WFS, Vretanos, 2005), which supports retrieval of features encoded using the Geography Markup Language (GML, Portele, 2007). To formulate a retrieval request to the WFS (including filter conditions), Max has to know and understand the schema of the data source. While the service can return the structure of the schema, the meaning of some of the property names might not be intuitively interpretable for Max. For example, if he wants to retrieve all measurements with a water level greater than 2 m, he has to know the property containing the water level (which might e.g. be called height, level or stav; see footnote 3) and the unit of measure the data is given in (which might e.g. be centimetre, metre or feet). Depending on the feature type schema, the same request would have to be stated quite differently for different WFS, as shown with the two possible filter statements in Fig. 1.

Furthermore, when the retrieved data are to be consumed by another service (e.g. in a composite service chain) they might have to be mapped from the providing service’s (source) schema into the consuming service’s (target) schema. In Max’s service chain, the results returned by the WFS may be incorrectly interpreted by the consuming interpolation service. If, e.g., the interpolation service expects water level measurements in metres and the WFS provides water level measurements in centimetres, this will lead to wrong interpolation results. Therefore, both data processing and data integration within a composite service chain require the detection and elimination of heterogeneity at the data content level, e.g. by transforming values between different units of measure.

3 “stav vody” is the Czech term for “water level”.

3. Building blocks for overcoming semantic heterogeneity

In this section, we introduce the building blocks needed in our approach to overcome semantic heterogeneity in geospatial applications. We describe the ontology architecture (Section 3.1), ontology language (Section 3.2) and reasoning procedures (Section 3.3) that are employed in the proposed method. The notion of registration mappings (Section 3.4) is used to establish a link between a data schema and its semantic description, which is crucial for the tasks of data retrieval and schema transformation. The rule-based method for semantic data integration (Section 3.5) is employed in our approach for detecting and eliminating semantic heterogeneity for the task of data exchange.

3.1. Ontology architecture

Ontologies can be employed for making the semantics of the information content of geospatial web services explicit. They are constituted by a specific vocabulary used to describe a certain reality, plus a set of explicit assumptions regarding the intended meaning of the vocabulary words (Guarino, 1998).

The backbone of our method is an infrastructure based on a hybrid ontology approach (Wache et al., 1999), which is a combination of two existing ontology approaches. The main idea is to describe each information system with its own application ontology, as is also done in multiple ontology approaches (Mena et al., 2000). However, in contrast to these, in the hybrid ontology approach, the concepts of each application ontology do not stand on their own, but are instead based on primitive concepts from a common shared vocabulary of the domain (see footnote 4). Thus, comparability between the different application ontologies is achieved on the semantic level. A user searching for data or a category with certain properties can also use the concepts and relations from the shared vocabulary to specify a query. As both application ontologies and queries are based on the same concepts, they become comparable, and thus the commitment of providers and requesters to a common shared vocabulary ensures semantic interoperability. Thus, our hybrid ontology approach offers the comparability of single ontology approaches (Arens et al., 1996) as well as the flexibility of multiple ontology approaches (Fig. 2).

Fig. 2. Hybrid ontology approach, figure adapted from Wache et al. (2001). [Figure labels: within an information community, the shared vocabulary (a collection of domain ontologies) provides basic concepts and relations for specifying application ontologies; application ontologies are used for semantic annotations of data sources and classification systems; a user specifies a query.]

4 The term shared vocabulary should not be confused with thesauri or lexical structures that offer simple term collections. In our approach, we use the term to comprise the collection of domain ontologies used in an information community (Fig. 2).

3.2. Description logics

The ontologies shown in this paper are expressed using a Description Logic (DL) (Baader and Nutt, 2003) notation used in the RACER system (Haarslev and Möller, 2004). DLs are a family of knowledge representation languages that are subsets of first-order logic (FOL; for a mapping from DL to FOL, see e.g. Sattler et al., 2003). DLs provide the basis for the Web Ontology Language (OWL), the proposed standard language for the Semantic Web (Antoniou and Van Harmelen, 2003).

The basic syntactic building blocks of a DL are atomic concepts (unary predicates), atomic roles (binary predicates) and individuals (constants). The expressive power of DL languages is restricted to a small set of constructors for building complex concepts and roles. Implicit knowledge about concepts and individuals can be inferred automatically using inference procedures (Baader and Nutt, 2003).

A DL knowledge base consists of a TBox containing intensional knowledge (declarations that describe general properties of concepts) and an ABox containing extensional knowledge that is specific to the individuals of the universe of discourse. In our work, we only use TBox language features, namely

• concept definition: (define-concept C D),
• concept inclusion: (implies C D), and
• role definition: (define-primitive-role R :parent P :domain C :range D).

The domain of a role is a concept describing the set of all individuals from which this role can originate. This notion of the term should not be confused with the notion “domain of interest” (as in shared domain vocabularies). The range of a role is a concept describing the set of all things the role can lead to. Concepts can be defined using the following constructors:

D →  *top*                            (universal concept)
     *bottom*                         (bottom concept)
     (and E F)                        (intersection)
     (or E F)                         (union)
     (all R C)                        (value restriction)
     (some R C)                       (existential quantification)
     (at-least|at-most|exactly n R)   (number restrictions)

The universal concept describes the set of all individuals in the universe of discourse. The bottom concept describes the empty set.
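The set-based reading of these constructors can be illustrated with a small evaluator over one finite interpretation. This is a minimal sketch for building intuition, not how RACER works internally: the class `Interpretation`, the individuals (`rock1`, `c1`, ...) and the example data are all hypothetical, and concept expressions are written as nested Python tuples that mirror the notation above.

```python
from functools import reduce


class Interpretation:
    """A finite interpretation: a universe plus extensions for atomic names."""

    def __init__(self, universe, concepts, roles):
        self.universe = set(universe)
        self.concepts = concepts  # atomic concept name -> set of individuals
        self.roles = roles        # atomic role name -> set of (subject, object)

    def fillers(self, role, individual):
        """All individuals reachable from `individual` via `role`."""
        return {o for (s, o) in self.roles.get(role, set()) if s == individual}

    def extension(self, concept):
        """Return the set of individuals denoted by a concept expression."""
        if concept == "*top*":
            return set(self.universe)
        if concept == "*bottom*":
            return set()
        if isinstance(concept, str):
            return set(self.concepts.get(concept, set()))
        op, *args = concept
        if op == "and":
            return reduce(set.intersection, (self.extension(a) for a in args))
        if op == "or":
            return reduce(set.union, (self.extension(a) for a in args))
        if op == "all":
            role, target = args[0], self.extension(args[1])
            # Individuals without any filler satisfy the restriction vacuously.
            return {x for x in self.universe if self.fillers(role, x) <= target}
        if op == "some":
            role, target = args[0], self.extension(args[1])
            return {x for x in self.universe if self.fillers(role, x) & target}
        if op in ("at-least", "at-most", "exactly"):
            n, role = args
            counts = {x: len(self.fillers(role, x)) for x in self.universe}
            if op == "at-least":
                return {x for x, k in counts.items() if k >= n}
            if op == "at-most":
                return {x for x, k in counts.items() if k <= n}
            return {x for x, k in counts.items() if k == n}
        raise ValueError(f"unsupported constructor: {op!r}")


# Hypothetical toy data: two rocks, each with one major component.
model = Interpretation(
    universe={"rock1", "rock2", "c1", "c2", "silt", "sand"},
    concepts={"ClasticSediment": {"rock1", "rock2"}, "SiltSize": {"silt"}},
    roles={
        "hasMajorComponent": {("rock1", "c1"), ("rock2", "c2")},
        "hasGrainSize": {("c1", "silt"), ("c2", "sand")},
    },
)
query = ("and", "ClasticSediment",
         ("all", "hasMajorComponent", ("all", "hasGrainSize", "SiltSize")))
print(sorted(model.extension(query)))
```

Note that the evaluator only computes extensions in a single given interpretation; reasoning tasks such as subsumption quantify over all models of a terminology.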

3.3. Subsumption reasoning

Determining whether one description subsumes another one, i.e. whether the first is more general than the second, is one important reasoning task of DL systems. Formally, subsumption can be defined as follows: In a terminology T containing concepts C and D, C is subsumed by D if in every model of T the set denoted by C is a subset of the set denoted by D (Donini, 2003).

With subsumption tests, the concepts of a terminology can be organised into a hierarchy according to their generality. A concept description can also be conceived as a query, describing a set of objects one is interested in (Donini, 2003). Thus, all concepts that are subsumed by the query concept can be considered to also satisfy the query. Users can apply this functionality for matchmaking, i.e. for discovering concepts that match their query. The query concept used in matchmaking can either be an existing concept from a domain or application ontology (simple query) or a concept defined by the user based on the concepts and relations in the shared vocabulary (defined concept query).

For example, the concept su4 from an existing application ontology (representing a stratigraphic term in some geological classification system) could be used as a query concept in a simple query. Instead, to express a query such as “areas suitable for hosting bodies of groundwater” a query concept might have to be defined (if no such concept already exists in an application ontology). This new query concept (for a defined concept query) should be based on domain concepts such as consistency or layering.
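Matchmaking by subsumption can be sketched for a deliberately restricted fragment: concepts built only from atomic names with `and` and `or`, without roles or further axioms. For this fragment, C is subsumed by D exactly if every truth assignment that satisfies C also satisfies D, so a truth table suffices. Full DL reasoners use dedicated inference procedures instead; the concept names and the functions `is_subsumed_by` and `matchmaking` below are hypothetical.

```python
from itertools import product


def atoms(concept):
    """Collect the atomic concept names used in an expression."""
    if isinstance(concept, str):
        return {concept}
    return set().union(*(atoms(arg) for arg in concept[1:]))


def holds(concept, world):
    """Evaluate an expression for one individual described by `world`."""
    if isinstance(concept, str):
        return world[concept]
    op, *args = concept
    if op == "and":
        return all(holds(a, world) for a in args)
    if op == "or":
        return any(holds(a, world) for a in args)
    raise ValueError(f"unsupported constructor: {op!r}")


def is_subsumed_by(c, d):
    """True if every individual satisfying c must also satisfy d."""
    names = sorted(atoms(c) | atoms(d))
    for values in product([False, True], repeat=len(names)):
        world = dict(zip(names, values))
        if holds(c, world) and not holds(d, world):
            return False  # counterexample: in c but not in d
    return True


def matchmaking(query, concepts):
    """Names of all concepts that are subsumed by the query concept."""
    return sorted(name for name, definition in concepts.items()
                  if is_subsumed_by(definition, query))


# Hypothetical application concepts over a hypothetical shared vocabulary.
concepts = {
    "SolidSiltRock": ("and", "Solid", "Silt"),
    "SolidClayRock": ("and", "Solid", "Clay"),
    "GranularSand": ("and", "Granular", "Sand"),
}
query = ("and", "Solid", ("or", "Silt", "Clay"))  # a defined concept query
print(matchmaking(query, concepts))
```

A simple query corresponds to passing an existing definition, e.g. `concepts["SolidSiltRock"]`, as the query concept.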

3.4. Registration mappings for GI retrieval

Subsumption reasoning allows expressive query processing that helps to improve data discovery. To support users in formulating filters for data retrieval, however, an explicit link is required between the data source’s schema and its application ontology. This link will ensure that the schema elements can be interpreted in terms of the shared vocabulary (see Section 3.1). In order to establish this link, we use registration mappings as introduced in Bowers et al. (2004). The main idea of registration mappings is to have separate descriptions of the application concept C and of the structural details of the feature type it describes. This has the advantage that the semantics of the feature type can be specified more accurately in application concepts because the specification does not try to mirror the feature type’s structure. This is especially true for feature types that do not reflect the conceptual model of the domain well.

<?xml version="1.0" encoding="UTF-8"?>
<StavVody xmlns="http://www.chmi.cz" (…) >
  <gml:position>
    <gml:Point>
      <gml:coordinates>859015.7375721685,5624676.8195826</gml:coordinates>
    </gml:Point>
  </gml:position>
  <tok>LABE</tok>
  <stav>151</stav>
  <datum>2003-11-02T07:00:00</datum>
</StavVody>

Structural Path ↔ Conceptual Path
/StavVody ↔ wfs1_Measurement
/StavVody/gml:position/gml:Point ↔ wfs1_Measurement.location
/StavVody/tok ↔ wfs1_Measurement.quantityResult.observedWaterBody.name
/StavVody/stav ↔ wfs1_Measurement.chmi_qRWaterLevel.value
/StavVody/datum ↔ wfs1_Measurement.timeStamp

Fig. 3. Example registration mapping for the GML document shown on top.

An example registration mapping (see footnote 5) for a feature type representing water level measurements in rivers is shown in Fig. 3. The feature type is mapped to an application concept (wfs1_Measurement) and each of its properties is mapped to a contextual path in the (application) ontology. A contextual path denotes a concept, possibly within the context of other concepts. It takes the form C.r1.r2. … .rn for n ≥ 0, where C is a DL concept and r1 to rn are DL properties. For example, the contextual path wfs1_Measurement.quantityResult.observedWaterBody refers to the water body in which a (wfs1) measurement was taken. Note that, when using registration mappings, we always assume that every property of a feature type can be mapped to a contextual path in the ontology.
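How a registration mapping ties feature properties to the ontology can be sketched as follows. The snippet parses a GML feature like the one in Fig. 3 and re-labels each property value with its contextual path. It is an illustration under assumptions: the `gml` namespace URI is assumed because the namespace declarations are elided in the figure, and `REGISTRATION` and `annotate` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes used in the structural paths below. The gml URI is an
# assumption; the figure elides the namespace declarations.
NS = {"chmi": "http://www.chmi.cz", "gml": "http://www.opengis.net/gml"}

GML_DOC = """
<StavVody xmlns="http://www.chmi.cz" xmlns:gml="http://www.opengis.net/gml">
  <gml:position><gml:Point>
    <gml:coordinates>859015.7375721685,5624676.8195826</gml:coordinates>
  </gml:Point></gml:position>
  <tok>LABE</tok>
  <stav>151</stav>
  <datum>2003-11-02T07:00:00</datum>
</StavVody>
"""

# The feature type itself is registered to an application concept ...
FEATURE_CONCEPT = "wfs1_Measurement"

# ... and each structural path (relative to the feature element) is
# registered to a contextual path, as in Fig. 3.
REGISTRATION = {
    "gml:position/gml:Point": "wfs1_Measurement.location",
    "chmi:tok": "wfs1_Measurement.quantityResult.observedWaterBody.name",
    "chmi:stav": "wfs1_Measurement.chmi_qRWaterLevel.value",
    "chmi:datum": "wfs1_Measurement.timeStamp",
}


def annotate(xml_text: str) -> dict:
    """Map contextual paths to the values found at the registered elements."""
    feature = ET.fromstring(xml_text)
    annotated = {}
    for structural_path, contextual_path in REGISTRATION.items():
        element = feature.find(structural_path, NS)
        if element is not None:
            # Join nested text so that Point yields its coordinate string.
            annotated[contextual_path] = "".join(element.itertext()).strip()
    return annotated


for path, value in annotate(GML_DOC).items():
    print(path, "=", value)
```

Because the lookup is driven by the mapping table rather than by the element names, a client can phrase requests in terms of the ontology and leave the schema-specific names to the registration.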

3.5. Semantic mediation for GI service composition

The matchmaking approach described in Section 3.3 can be used to identify data sources that exactly match the semantics required by the requester. By introducing a mechanism for data integration between the source and the target schema, more relaxed types of match also become possible.

5 Taken from the hydrology example described in Section 4.

As we want to transform data on-the-fly, we use a non-materialized data integration approach using a mediator architecture (Wiederhold, 1992). In detail, our approach is based on the idea of semantic mediation introduced in Wache (2003). When compared with other approaches from the area of schema integration (for an overview, see Conrad, 2002; Rahm and Bernstein, 2001), semantic mediation specifically focuses on semantic heterogeneity (Goh et al., 1999). When developing a mediator for integrating heterogeneous information sources, the specification of the integration mappings on the semantic level is considered to be the most crucial task.

As described in Section 3.1, we use application ontologies to semantically describe an information source. Following Wache (2003), we further define a semantic description to consist of two parts: a description of the meaning and a set of context attributes. Analogously to Wittgenstein (1953), Wache claims that meaning is determined by use in the language and a context of a given situation, respectively. Thus, Wache defines the meaning on a meta-level unanimously across all data sources in a domain, e.g. by a (meta-) concept WaterLevel. It can be used to automatically detect semantically equivalent information or semantic heterogeneity within one domain. The context attributes describe the different encodings of semantically equivalent information in different data sources, e.g. the unit or the scale of WaterLevel. Thus, we are able to solve semantic heterogeneity problems between specific data sources within one domain and also simplify the process of identifying semantically equivalent information while considering different representations of data.

The integration mapping, which provides the basis for the data exchange task, is specified using a rule-based approach. The mapping consists of context transformation rules that specify how a piece of information can be transformed from one context into another. Table 2 shows schematically a simple example of a context transformation rule. The rule specifies how to transform a measurement (i.e. a piece of information) in centimetres (source context) into a measurement in metres (target context). More precisely, it states that when moving from the context UnitOfMeasure = Centimeter to the context UnitOfMeasure = Meter, the value ?v2 of the attribute ?att2 can be derived by dividing the value ?v1 of attribute ?att1 by 100, provided the meaning ?m of both attributes is the same.

Table 2. Context transformation rule

| | Source information | Target information | Transformation operation |
|---|---|---|---|
| Attribute name | ?att1 | ?att2 | ?v2 = ?v1/100 |
| Meaning | ?m | ?m | |
| Context | UnitOfMeasure = Centimeter | UnitOfMeasure = Meter | |
| Value | ?v1 | ?v2 | |

Variables are denoted by a prefixed question mark to distinguish them from constants.

6 We use the concepts SandSize, SiltSize, etc. to refer to grain size classes (e.g. 0.063–2 mm for SandSize) that characterize material of that grain size (i.e. sand, silt, etc.).
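The rule in Table 2 can be sketched in code as follows. This is an illustrative reading of the table, not the mediator implementation: the classes `Information` and `ContextTransformationRule` and the attribute names `stav` and `waterLevel` are hypothetical. The rule fires only if the source context matches and the meaning is shared, as required by the table.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Information:
    """A value together with its semantic description (meaning + context)."""
    attribute: str
    meaning: str
    context: Dict[str, str]
    value: float


@dataclass
class ContextTransformationRule:
    """Transforms values from a source context into a target context."""
    source_context: Dict[str, str]
    target_context: Dict[str, str]
    operation: Callable[[float], float]

    def applies_to(self, source: Information, target_meaning: str) -> bool:
        # The rule fires only if the source context matches and the meaning
        # (?m in Table 2) is the same on both sides.
        return (source.context == self.source_context
                and source.meaning == target_meaning)

    def transform(self, source: Information, target_attribute: str,
                  target_meaning: str) -> Information:
        if not self.applies_to(source, target_meaning):
            raise ValueError("rule does not apply to this piece of information")
        return Information(
            attribute=target_attribute,
            meaning=target_meaning,
            context=dict(self.target_context),
            value=self.operation(source.value),
        )


# The rule of Table 2: ?v2 = ?v1 / 100 when moving from Centimeter to Meter.
CM_TO_M = ContextTransformationRule(
    source_context={"UnitOfMeasure": "Centimeter"},
    target_context={"UnitOfMeasure": "Meter"},
    operation=lambda v1: v1 / 100,
)

measured = Information("stav", "WaterLevel", {"UnitOfMeasure": "Centimeter"}, 151)
converted = CM_TO_M.transform(measured, "waterLevel", "WaterLevel")
print(converted.value, converted.context["UnitOfMeasure"])
```

Keeping the meaning check separate from the arithmetic means that a value is never converted merely because its unit happens to match.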

4. Enhancing GI interpretation and integration

Often, values in geographic datasets are encoded using some kind of classification system. Examples for such values include soil types, land cover types and geological ages or formations. A class or type in such a classification system can usually be described by a number of defining characteristics. We aim at overcoming some of the semantic heterogeneity on the data level by making the meaning of terms from different classification systems explicit. This should ease the interpretation of data values from unknown data sources as well as the integration of several data sources that use different classification systems.

We use the building blocks presented in Section 3 to make the meaning of terms used in classification systems explicit and comparable. More precisely, we use DL concepts to represent the classification system terms and subsumption reasoning to establish whether a term in one classification is a specialisation of a term in another classification or to find terms in different classification systems that are all specialisations of a user query.

In Section 4.1, we illustrate this method using the geology example introduced in Section 2. In Section 4.2, we present an SDI implementation for this example.

4.1. Enhancing GI interpretation and integration in the geology example

In our scenario, Hannah has to analyse data sets based on different stratigraphic classification systems that are unknown to her. Also, she needs to find areas with the same petrographic characteristics in several adjacent geological maps that use different classification systems.

The basis for enhancing GI interpretation and integration in this scenario is a common shared vocabulary for a sub-domain of geology. The domain ontology describes the petrographic properties of the stratigraphic layers, e.g. the consistency of the rock, its major and minor petrographic components, and their grain size (see footnote 6) and layering. Fig. 4 gives a simplified overview of the concepts and properties in the domain ontology.

Based on the domain ontology, more specific application ontologies are defined for each classification system. For example, the stratigraphic terms su1 and su3 (from the classification of Jung, cf. Table 1) both describe rock types with hard consistency and silt as their main component; su1 further contains sand and lime (with a particular layering) as other components, while su3 does not have any other components. The application concepts for these descriptions, i.e. their formalisation based on the domain vocabulary, are given below.

(define-concept su1 (and
    ClasticSediment
    (all hasConsistency Solid)
    (exactly 1 hasMajorComponent)
    (all hasMajorComponent
        (all hasGrainSize SiltSize))
    (exactly 2 hasMinorComponent)
    (exactly 1 hasMinorComponent
        (all hasGrainSize SandSize))
    (exactly 1 hasMinorComponent (and
        Lime
        (exactly 1 isLayered)
        (all isLayered Banks)))))

(define-concept su3 (and
    ClasticSediment
    (all hasConsistency Solid)
    (exactly 1 hasMajorComponent)
    (all hasMajorComponent
        (all hasGrainSize SiltSize))
    (exactly 0 hasMinorComponent)))

Fig. 4. Concepts and relations of the domain ontology. For simplicity, only some of the concepts and relations are shown.
[Figure: concepts ClasticSediment, Component, Consistency (Solid, Granular), GrainSize (ClaySize, SiltSize, SandSize, GravelSize), Layering (Banks, Ooide, ...) and Genesis (Aeolic, Glacial, ...), as well as Carbonate, Silicate and Humus; properties hasMajorComponent, hasMinorComponent, hasConsistency, hasGrainSize, isLayered and hasGenesis; cardinalities 1..3 and 0..*. Legend: concept, property, cardinality, generalisation.]

Fig. 5. SDI architecture for GI interpretation and integration.
[Figure: components Client Workflow Service (WS Client), Ontology-based Reasoner (OBR), Web Map Service (WMS) and Web Feature Service (WFS); numbered tasks 1 to 5: define query concept, find matching concepts based on subsumption reasoning, create SLD, request map (based on SLD), request features.]

As all descriptions from different classification systems are grounded in the common shared vocabulary, they become comparable, and Hannah can use subsumption reasoning to find concepts matching a specific query concept. She can do her search either as a simple query or as a defined concept query (Section 3.3). For example, if she wants to highlight areas in a map that are classified as su4 (according to the Classification of Jung), she can do a simple query for this concept. If she is interested in “areas suitable for dump sites”, she can define a new query concept that describes the characteristics of a rock offering good protection against ground water pollution. Translated into petrographic characteristics, this means it must be a solid rock or a soil based on clay or silt. This can be translated into the following DL query concept:

    (define-concept query (and
      ClasticSediment
      (or
        (and
          (all hasConsistency Solid)
          (exactly 1 hasMajorComponent))
        (and
          (all hasConsistency Granular)
          (exactly 1 hasMajorComponent)
          (all hasMajorComponent
            (all hasGrainSize (or ClaySize SiltSize)))))))

This query concept can be used in a DL reasoner to find more specific terms in each of the application ontologies that describe stratigraphic terms in the different classification systems. In our scenario, subsumption reasoning returns (among others) the concepts su1 and su3 from the application ontology representing the classification system by Jung (see definitions above). Hannah can use these terms to create the SLD for selection and visualisation of the data sets described using this classification system. Of course, she can also repeat this procedure for further application ontologies associated with other classification systems.
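The matchmaking described here can be illustrated with a small Python sketch. It is not the DL reasoner used in the paper: it only handles conjunctions of atomic concepts, value restrictions and exact cardinalities, flattens the nested restriction on grain size into an assumed role path (`hasMajorComponent/hasGrainSize`), and leaves out the minor-component details of su1. The check is sound for this tiny fragment but far from complete. The concepts `x_gravel` and `x_silt` are hypothetical and were added only to show a non-match and a match through the second disjunct.

```python
from dataclasses import dataclass, field


@dataclass
class Conj:
    """A conjunction of simple constraints (toy fragment of a DL concept)."""
    atoms: set = field(default_factory=set)      # named super-concepts
    all_: dict = field(default_factory=dict)     # role path -> set of allowed fillers
    exactly: dict = field(default_factory=dict)  # role -> required cardinality


def subsumed_by(c: Conj, q: Conj) -> bool:
    """True if every constraint of q is also guaranteed by c (sound, not complete)."""
    if not q.atoms <= c.atoms:
        return False
    for role, allowed in q.all_.items():
        # c must restrict the role to a subset of the fillers that q allows
        if role not in c.all_ or not c.all_[role] <= allowed:
            return False
    return all(c.exactly.get(role) == n for role, n in q.exactly.items())


def matches(c: Conj, disjuncts: list) -> bool:
    """A concept matches a disjunctive query if one disjunct subsumes it."""
    return any(subsumed_by(c, q) for q in disjuncts)


GRAIN = "hasMajorComponent/hasGrainSize"  # assumed flattening of the nested restriction

su1 = Conj({"ClasticSediment"}, {"hasConsistency": {"Solid"}, GRAIN: {"SiltSize"}},
           {"hasMajorComponent": 1, "hasMinorComponent": 2})
su3 = Conj({"ClasticSediment"}, {"hasConsistency": {"Solid"}, GRAIN: {"SiltSize"}},
           {"hasMajorComponent": 1, "hasMinorComponent": 0})
# Hypothetical concepts, not taken from any classification system in the paper
x_gravel = Conj({"ClasticSediment"}, {"hasConsistency": {"Granular"}, GRAIN: {"GravelSize"}},
                {"hasMajorComponent": 1})
x_silt = Conj({"ClasticSediment"}, {"hasConsistency": {"Granular"}, GRAIN: {"SiltSize"}},
              {"hasMajorComponent": 1})

# The dump-site query: ClasticSediment and (solid rock or soil based on clay or silt)
QUERY = [
    Conj({"ClasticSediment"}, {"hasConsistency": {"Solid"}}, {"hasMajorComponent": 1}),
    Conj({"ClasticSediment"},
         {"hasConsistency": {"Granular"}, GRAIN: {"ClaySize", "SiltSize"}},
         {"hasMajorComponent": 1}),
]

ontology = {"su1": su1, "su3": su3, "x_gravel": x_gravel, "x_silt": x_silt}
matched = sorted(name for name, concept in ontology.items() if matches(concept, QUERY))
print(matched)
```

In this sketch su1 and su3 are found through the first disjunct (solid rock), in line with the result reported above, while the hypothetical gravel concept is rejected by both disjuncts.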


4.2. SDI implementation

In order to use the methods for enhancing GI interpretation and integration in SDIs, they have to be encapsulated in software components. In this section, we introduce an architecture that includes such components in addition to existing SDI components and describe the flow of information between them. Fig. 5 depicts the architecture including the tasks fulfilled by each of the components.

The central component of the architecture is a client that manages the overall workflow (WS Client). In our current architecture, this client is tailored to its specific application, i.e. highlighting areas in adjacent geological maps given a specific user query. In a future version of the architecture, this client could be substituted by a generic workflow service that executes a customised description of the service chain, e.g. using the Business Process Execution Language (BPEL) (Andrews et al., 2003).

The user interface provided by the WS Client is the entry point of the application (step 1 in Fig. 5). The user can either choose a query concept from an existing application ontology or define their own query concept based on the domain ontology. In order to make the use of domain ontologies transparent for the user, we have implemented a function for translating user queries into DL query concepts, which can subsequently be used by an inference engine to do the actual matchmaking.

The query concept is sent to the Ontology-Based Reasoner (OBR), a component that stores the registered ontologies and provides the reasoning functionality (step 2). Based on subsumption reasoning, the OBR discovers semantically matching terms in the different classification systems and returns them to the WS Client. In the next step, these concepts are used to generate an SLD including a GetFeature request to the WFS that provides standardised access to the geological database (step 3). This SLD is included in a request to a WMS (step 4), and the retrieved features (step 5) are displayed in a map. As a result of the user-defined query, Fig. 6 shows the successful data interpretation and integration in terms of a combined map from two neighbouring map sheets in Saxony-Anhalt. Each map sheet is based on a different classification system.

Fig. 6. Result for rock types offering a good protection against ground water pollution.
[Figure: combined map of two sheets. Section of topographic survey map GK 25: 4335 Hettstedt (Date: 1962, Author: JUNG); section of topographic survey map GK 25: 4435 Eisleben (Date: 1929, Authors: FULDA & HUELSEMANN).]
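Step 3 of this workflow, turning the matched terms into an SLD, might look roughly as follows in Python. This is a simplified sketch under stated assumptions, not the implemented WS Client: it styles a named layer with one filter rule instead of embedding the GetFeature request to the remote WFS, and the layer name `geology`, the property name `stratigraphy` and the fill colour are hypothetical.

```python
import xml.etree.ElementTree as ET

SLD_NS = "http://www.opengis.net/sld"
OGC_NS = "http://www.opengis.net/ogc"
ET.register_namespace("sld", SLD_NS)
ET.register_namespace("ogc", OGC_NS)


def build_sld(layer, prop, terms, fill="#ff0000"):
    """Return an SLD 1.0.0 document highlighting features whose prop is one of terms."""
    if not terms:
        raise ValueError("at least one matched term is required")
    root = ET.Element(f"{{{SLD_NS}}}StyledLayerDescriptor", version="1.0.0")
    named = ET.SubElement(root, f"{{{SLD_NS}}}NamedLayer")
    ET.SubElement(named, f"{{{SLD_NS}}}Name").text = layer
    style = ET.SubElement(named, f"{{{SLD_NS}}}UserStyle")
    fts = ET.SubElement(style, f"{{{SLD_NS}}}FeatureTypeStyle")
    rule = ET.SubElement(fts, f"{{{SLD_NS}}}Rule")
    flt = ET.SubElement(rule, f"{{{OGC_NS}}}Filter")
    # ogc:Or needs at least two operands, so a single term is compared directly
    parent = ET.SubElement(flt, f"{{{OGC_NS}}}Or") if len(terms) > 1 else flt
    for term in terms:
        eq = ET.SubElement(parent, f"{{{OGC_NS}}}PropertyIsEqualTo")
        ET.SubElement(eq, f"{{{OGC_NS}}}PropertyName").text = prop
        ET.SubElement(eq, f"{{{OGC_NS}}}Literal").text = term
    symbolizer = ET.SubElement(rule, f"{{{SLD_NS}}}PolygonSymbolizer")
    fill_el = ET.SubElement(symbolizer, f"{{{SLD_NS}}}Fill")
    ET.SubElement(fill_el, f"{{{SLD_NS}}}CssParameter", name="fill").text = fill
    return ET.tostring(root, encoding="unicode")


# Terms returned by the reasoner for the Classification of Jung in the scenario
sld_xml = build_sld("geology", "stratigraphy", ["su1", "su3"])
print(sld_xml)
```

Repeating the call with the terms matched in another application ontology would give one SLD per classification system, which corresponds to repeating the procedure for each map sheet.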

5. Enhancing GI discovery, retrieval and exchange

Before data sources can be interpreted and/or integrated as described in Section 4, they need to be discovered and retrieved. In a service chain, data also has to be exchanged between connected component services. These tasks are often hindered by semantic heterogeneity on the metadata and schema levels (Section 2). On these levels, too, the building blocks introduced in Section 3 can be used for enhancing the discovery, retrieval and exchange of geographic information.

In Lutz and Klien (2006), we have proposed an integrated approach for GI discovery and retrieval based on a specific user query. In this paper, we adapt and extend this approach to situations where the retrieved data are to be consumed by another service (e.g. in a composite service chain). In these cases, the user query is replaced by the requirements of the consuming service. Furthermore, an additional step might be required after discovering and retrieving appropriate data. If the structure and semantics of the data provided by one service do not match exactly those required by the consuming service, a transformation becomes necessary. In general, transformations connect one or more data sources to a destination with the help of appropriate conversion rules. The main challenge here is not to process the transformation, but rather to discover and to specify it.

A prerequisite for the presented approach is the availability of shared vocabularies for the domain of interest. The available information sources have to be semantically annotated with application ontologies (Section 3.1) that use registration mappings (Section 3.4), and the input of the consuming service also needs to be described through a DL concept and a registration mapping. The DL concept describing the service input is used as a query concept for discovering semantically appropriate feature types. The query concept might already take into account that a transformation is possible for certain characteristics and hence relax some of the restrictions used in the approach presented in Lutz and Klien (2006). The matchmaking between the query concept and the application concepts describing feature types is based on subsumption reasoning (Section 3.3).

After the requester has selected one of the discovered feature types, a query, which uses the property names of the feature type’s application schema, is constructed from the user query. This step requires registration mappings (Section 3.4) between the feature type’s properties and the roles from the domain ontology. Step-by-step instructions for deriving the request are given in Lutz and Klien (2006). The derived query is then executed and its results are used as input for the consuming service.

In most cases, the retrieved data will not fit the requirements of the consuming service directly, and a transformation is necessary. For that purpose, we use the approach of semantic mediation introduced in Section 3.5. In detail, we analyse the correspondences between the properties of the feature types discovered in the previous step. These correspondences are established by referring to the properties’ contextual characteristics. After detecting contextual heterogeneity, at least one context transformation rule has to be acquired to resolve it. Ideally, context transformation rules given by a function library will be used.

In Section 5.1, we illustrate this approach using the hydrology example introduced in Section 2. In Section 5.2, we present an SDI implementation for this example.

5.1. Enhancing GI discovery, retrieval and exchange in the hydrology example

In our example, Max is interested in water level measurements for the Elbe River for a given date (22 April 2004) and location, which he wants to use as input for an interpolation service. In the following, we illustrate how to build a query statement based on Max’s requirements (1), how to turn this query into a DL query concept (2) and subsequently into a GetFeature request (3), and finally how to derive the transformation rules required in order to use the data as input for the interpolation service (4).

(1) Fig. 7 presents a query statement reflecting Max’s requirements. The query follows the syntax for an SQL-like query language proposed in Lutz and Klien (2006), which is to enable users to intuitively select properties of specific feature types, possibly using one or several constraints. Properties correspond to relations in the shared vocabulary, while feature types correspond to concepts. The constraints are either type restrictions (e.g. observable hasType WaterLevel) or value constraints, i.e. comparisons with a value specified by the requester (e.g. dateStamp = 2004-04-22). The query statement also includes a requirement of the interpolation service, which expects values to be given in metres (unitOfMeasure hasType Meter).

    SELECT quantityResult FROM Measurement WHERE
      quantityResult hasType (
        observable hasType WaterLevel AND
        unitOfMeasure hasType Meter AND
        observedWaterBody hasType (name = "Elbe")) AND
      dateStamp = 2004-04-22 AND
      hasLocation isWithinBoundingBox (11,52,13,54)

Fig. 7. Example for a semantic query statement. Keywords of proposed syntax are shown in capitals, comparators in italics.

(2) This query can be translated, following the guidelines in Lutz and Klien (2006), into the DL query concept given below. Type constraints are expressed through universally quantified value restrictions in DL. Value constraints only become relevant when the data are retrieved from the WFS. In order to be able to express these constraints as a filter expression in the GetFeature query, it is important that the feature type contains the property to be constrained. Therefore, in the discovery phase, value constraints are expressed as existential quantification on the specified roles. In this step, possible transformations could already be taken into account. If, for example, a transformation rule between the units metre and centimetre exists, it makes sense to search for feature types that offer water level measurements in either unit, thus extending the potential result set. In order to relax the query concept accordingly, the range restriction for the unitOfMeasure role is relaxed from Meter to the disjunction of unit concepts that are transformable into centimetres: (or Centimeter Meter Inch ...). Note that now it is no longer guaranteed that the discovered feature types exactly match the requirements of the interpolation service. However, it is guaranteed that all feature types can be transformed into the correct schema using a transformation rule.

    (define-concept query (and
      Measurement
      (some quantityResult (and
        ; type constraints
        (all observable WaterLevel)
        (all unitOfMeasure (or Centimeter Meter Inch ...))
        ; value constraints
        (some observedWaterBody
          (some name *top*))))
      (some dateStamp *top*)
      (some hasLocation *top*)))

This query concept is then used for discovering appropriate data sources based on DL subsumption reasoning (Section 3.3).

(3) In the next step, Max wants to retrieve the requested information from the discovered data sources. As we assume an SDI setting, this means formulating a GetFeature request including a filter expression to the WFS serving the data. In order to do this, the structure of the WFS’s feature type and the names of its attributes have to be known. All the required information can be accessed from the feature type’s registration mapping. This step is simplified by the assumption that there always is a simple mapping between a contextual path in the ontology and a feature type property (see Section 3.4). By using the registration mapping shown in Fig. 3 (Section 3.4), the example query statement shown in Fig. 7 can be translated into the WFS GetFeature request and filter expression shown in Fig. 8.

    <GetFeature service="WFS" version="1.0.0" outputFormat="GML2" (…) >
      <Query typeName="StavVody">
        <PropertyName>stav</PropertyName>
        <Filter>
          <And>
            <PropertyIsEqualTo>
              <PropertyName>StavVody/tok</PropertyName>
              <Literal>Elbe</Literal>
            </PropertyIsEqualTo>
            <PropertyIsEqualTo>
              <PropertyName>StavVody/datum</PropertyName>
              <Literal>2004-04-22</Literal>
            </PropertyIsEqualTo>
            <Within>
              <PropertyName>gml:position</PropertyName>
              <gml:Box srsName="http://www.opengis.net/gml/srs/epsg.xml#4326">
                <gml:coordinates>11.0,52.0 13.0,54.0</gml:coordinates>
              </gml:Box>
            </Within>
          </And>
        </Filter>
      </Query>
    </GetFeature>

Fig. 8. WFS query that requests the property stav from a feature type called StavVody. Filter expression constrains the query to features whose tok property equals “Elbe”, whose datum property equals “2004-04-22” and whose position property is within specified bounding box.

(4) As the data retrieved from the WFS state the water level in centimetres, Max cannot use them directly as input for the interpolation service, which expects measurement values in metres. Hence, a transformation rule has to be derived and executed before sending the data to the interpolation service. This is a simple transformation from centimetres into metres, derived from the contextual information given by the semantic description within the feature type’s registration mapping.

Fig. 9. (Simplified) SDI architecture for GI discovery, retrieval and exchange.
[Figure: components Client Workflow Service (WS Client), Ontology-based Reasoner (OBR), Semantic Translation Specification Service (STSS), Transformation Service (TS), Web Feature Service (WFS), Interpolation Service (WLIS) and Web Map Service (WMS); numbered tasks 1 to 9: generate query concept, get available transformations, find matching concepts, generate WFS GetFeature request, request features, find transformation rules, execute transformation rules, execute interpolation, display results.]
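The translation in step (3) can be sketched in Python as a lookup in the registration mapping followed by XML generation. The mapping below is an assumption inferred from the property names visible in Fig. 8, not a copy of Fig. 3; the function `build_get_feature` and its parameters are hypothetical, and the request is emitted with explicit wfs, ogc and gml namespaces rather than the abbreviated form printed in the figure.

```python
import xml.etree.ElementTree as ET

WFS = "http://www.opengis.net/wfs"
OGC = "http://www.opengis.net/ogc"
GML = "http://www.opengis.net/gml"
for prefix, uri in (("wfs", WFS), ("ogc", OGC), ("gml", GML)):
    ET.register_namespace(prefix, uri)

# Assumed registration mapping: contextual path in the ontology -> feature type property
MAPPING = {
    "quantityResult": "stav",
    "quantityResult/observedWaterBody/name": "StavVody/tok",
    "dateStamp": "StavVody/datum",
    "hasLocation": "gml:position",
}


def build_get_feature(type_name, select, equals, bbox, mapping):
    """Translate ontology-level selections and constraints into a WFS 1.0.0 request."""
    root = ET.Element(f"{{{WFS}}}GetFeature",
                      service="WFS", version="1.0.0", outputFormat="GML2")
    query = ET.SubElement(root, f"{{{WFS}}}Query", typeName=type_name)
    ET.SubElement(query, f"{{{OGC}}}PropertyName").text = mapping[select]
    flt = ET.SubElement(query, f"{{{OGC}}}Filter")
    conj = ET.SubElement(flt, f"{{{OGC}}}And")
    # Value constraints become equality comparisons on the mapped properties
    for path, value in equals.items():
        eq = ET.SubElement(conj, f"{{{OGC}}}PropertyIsEqualTo")
        ET.SubElement(eq, f"{{{OGC}}}PropertyName").text = mapping[path]
        ET.SubElement(eq, f"{{{OGC}}}Literal").text = value
    # The bounding-box constraint becomes a spatial Within operator
    path, (minx, miny, maxx, maxy) = bbox
    within = ET.SubElement(conj, f"{{{OGC}}}Within")
    ET.SubElement(within, f"{{{OGC}}}PropertyName").text = mapping[path]
    box = ET.SubElement(within, f"{{{GML}}}Box",
                        srsName="http://www.opengis.net/gml/srs/epsg.xml#4326")
    ET.SubElement(box, f"{{{GML}}}coordinates").text = f"{minx},{miny} {maxx},{maxy}"
    return ET.tostring(root, encoding="unicode")


request_xml = build_get_feature(
    "StavVody",
    select="quantityResult",
    equals={"quantityResult/observedWaterBody/name": "Elbe", "dateStamp": "2004-04-22"},
    bbox=("hasLocation", (11.0, 52.0, 13.0, 54.0)),
    mapping=MAPPING,
)
print(request_xml)
```

Because the constraints are expressed against ontology paths, the same call could be reused for another discovered feature type by swapping in that type's name and registration mapping.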

5.2. SDI implementation

We have also designed an SDI architecture for implementing the methods for enhancing GI discovery, retrieval and exchange (Fig. 9). Apart from standard SDI components, the architecture contains an OBR component (see Section 4.2) for subsumption reasoning, a Semantic Translation Specification Service (STSS) for deriving a transformation and a Transformation Service (TS) for executing it.

The central component and entry point for the application is a client (WS Client) that also controls the workflow. As in the previous example, the control of the workflow could alternatively be managed by a generic workflow service. The client provides a user interface for query formulation that enables users to pose formal and precise queries based on existing domain ontologies.

After a user query has been submitted, it is translated into a DL query concept (step 1 in Fig. 9). When submitting a query, the user can choose whether the transformation rules offered by the STSS (see below) should be taken into account (step 2). In our scenario, the STSS offers unit transformations for length measurements (e.g. from centimetres to metres) and the query concept is relaxed accordingly. The query concept is then sent to the OBR, which discovers semantically appropriate feature types and returns their metadata to the WS Client (step 3). In order to access the discovered feature type through its WFS interface, the WS Client then automatically constructs a GetFeature request from the user’s query using the property names of the feature type’s application schema (step 4).
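Steps 1 and 2 can be illustrated with a short Python sketch that emits the query concept of Section 5.1 and relaxes the unit restriction when transformations are to be taken into account. The table `CONVERTIBLE_TO` is a hypothetical stand-in for what the STSS would report, the function names are invented for illustration, and the conjunction inside the quantityResult restriction is written with an explicit `and`.

```python
# Hypothetical answer of the STSS: units it can transform into the target unit
CONVERTIBLE_TO = {"Meter": ["Centimeter", "Meter", "Inch"]}


def unit_filler(target_unit, use_transformations):
    """Return the filler of the unitOfMeasure restriction, relaxed if requested."""
    units = CONVERTIBLE_TO.get(target_unit, [target_unit]) if use_transformations else [target_unit]
    return units[0] if len(units) == 1 else "(or " + " ".join(units) + ")"


def build_query_concept(target_unit, use_transformations):
    """Build the DL query concept for the water level example as a string."""
    unit = unit_filler(target_unit, use_transformations)
    return (
        "(define-concept query (and\n"
        "  Measurement\n"
        "  (some quantityResult (and\n"
        "    (all observable WaterLevel)\n"
        f"    (all unitOfMeasure {unit})\n"
        "    (some observedWaterBody (some name *top*))))\n"
        "  (some dateStamp *top*)\n"
        "  (some hasLocation *top*)))"
    )


strict = build_query_concept("Meter", use_transformations=False)
relaxed = build_query_concept("Meter", use_transformations=True)
print(relaxed)
```

The strict variant only matches feature types that already deliver metres, whereas the relaxed variant widens discovery to every unit for which a transformation rule is assumed to exist.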

The STSS has been developed for the discovery and specification of the transformation between a source and a target schema. Its input parameters include the results of the ontology-based discovery and retrieval, i.e. the application ontologies, registration mappings and GetFeature requests of the source services (the discovered WFSs) and the corresponding information for the target service (the interpolation service). The STSS determines the semantic correlations, selects the needed context transformation rules, adds changes to the structure and specifies a transformation (step 5). The transformation rules are expressed in XSLT (Clark, 1999), a formalism for specifying transformations of XML documents. The actual transformation of the data provided by the WFSs (step 6) is performed by the Transformation Service, which simply executes the XSLT rules and returns a GML document with the transformed water level measurements in metres (step 7).
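The effect of such a context transformation rule can be sketched in Python. The paper's Transformation Service executes XSLT; the snippet below swaps in a plain ElementTree rewrite as a stand-in so that it runs with the standard library only. The sample document, its element names (other than tok and stav, which appear in Fig. 8) and the measurement values are invented for illustration.

```python
from decimal import Decimal
import xml.etree.ElementTree as ET

# Conversion factors into metres (the inch is defined as exactly 0.0254 m)
TO_METRE = {
    "Centimeter": Decimal("0.01"),
    "Meter": Decimal("1"),
    "Inch": Decimal("0.0254"),
}


def to_metres(xml_text, value_tag, source_unit):
    """Rewrite every value_tag element from source_unit into metres."""
    factor = TO_METRE[source_unit]
    root = ET.fromstring(xml_text)
    for element in root.iter(value_tag):
        # Decimal arithmetic avoids binary floating-point rounding artefacts
        element.text = str(Decimal(element.text.strip()) * factor)
    return ET.tostring(root, encoding="unicode")


# Hypothetical, heavily simplified WFS response with water levels in centimetres
SAMPLE = (
    "<StavVodyCollection>"
    "<StavVody><tok>Elbe</tok><stav>345</stav></StavVody>"
    "<StavVody><tok>Elbe</tok><stav>412</stav></StavVody>"
    "</StavVodyCollection>"
)

converted = to_metres(SAMPLE, "stav", "Centimeter")
print(converted)
```

Keeping the factors in a small lookup table mirrors the idea of a function library of context transformation rules: the source unit comes from the feature type's semantic description and the target unit from the requirements of the consuming service.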

Now, the Interpolation Service correctly interprets all the data provided by the WFSs and the results of the interpolation (step 8) can be displayed in a WMS (step 9).

6. Discussion and related work

The approach presented in this paper is related to previous work in the fields of geographic information science, information discovery and retrieval, data integration and artificial intelligence.

A first step towards overcoming semantic heterogeneity in the geospatial domain has been the proposal of Integrated Geographic Information Systems (IGIS), i.e. systems that integrate diverse GIS technologies or reflect a particular point of view of a community (Hinton, 1996). This idea has been advanced in Fonseca et al. (2002a, b) by introducing ontologies as a means for supporting representations of incomplete information, multiple representations of geographical space, and different levels of detail. In SDIs, where geographic information is usually highly distributed and heterogeneous, solving heterogeneity problems becomes a prerequisite. One focus of the research presented here is to transfer the ontology approach for dealing with semantic heterogeneity to the SDI domain and to demonstrate how it can be integrated into existing standards-based architectures.

Work in the field of information discovery and retrieval is manifold. There is widespread agreement among researchers in this field that declarative content descriptions and query capabilities are necessary (Czerwinski et al., 1999; Guarino et al., 1999; Heflin and Hendler, 2000; Mena et al., 1998). The vision of most research in this domain is that users should be able to express what they want, and the system should find the relevant sources and obtain the answer (Levy et al., 1996). As this might involve combining data from multiple sources, information discovery and retrieval is closely related to data integration, whose goal is to provide a uniform interface (through a global schema) to a multitude of data sources (each with a local schema). In data integration terminology (Levy, 2000), our approach can be considered a “Local As View” approach. This means that the contents of a data source are described as a query over the mediated schema, which in our case is substituted by the ontology (see Guha et al., 2003; Mädche et al., 2001 for other examples where ontologies are used in search and retrieval mechanisms). Usually, a query through the mediated schema in this approach requires complex transformation rules. With his semantic mediator, Wache (2003) suggests a way to generate these rules from fully annotated data sources semi-automatically with the help of assistants. The assistants attempt to find inter-correspondences between data elements of query and sources. In our SDI scenario, it is often not possible to ask the user for confirmation, especially not on low-level relations between sources the user is not familiar with. We circumvent this problem by specialising to a specific domain with explicitly pre-modelled information and relations, e.g. transformation rules between units (see Section 3.5).

In the BUSTER (Bremen University Semantic Translator for Enhanced Retrieval) project (Vögele et al., 2003), DL descriptions have been used to describe and query classifications (Visser and Stuckenschmidt, 2002) and data content (Hübner et al., 2004; Vögele and Spittel, 2004). However, these approaches use simple ontologies, and queries only have limited expressivity. They show how well-established catalogue systems for electronic devices, namely ETIM and ecl@ss (Visser et al., 2002a), or land use classifications, ATKIS and CORINE Land Cover (Visser et al., 2002b), can be used as the grounding shared vocabularies for semantic translation. However, as these classification systems are often imprecise, miss details and contain hardly understandable verbal circumscriptions and even inconsistencies, a lot of adjustments were needed in order to transform them into ontological descriptions.


The idea of primarily using roles for building ontologies and for identifying taxonomic relationships between query and application concepts is closely related to feature-based and geometric approaches to ontology integration (see Goldstone and Son, 2005 for an overview and Raubal, 2004; Rodríguez and Egenhofer, 2004 for applications in the geospatial domain). As these approaches usually compute a numeric similarity value between two concepts, they can express gradual differences between them. Conversely, our approach considers only subsumption relationships, which allow no gradual differentiation. This is important for scenarios where the discovered data has to have certain properties (e.g. because these are required for further processing). If, in contrast, the properties are only used to describe the characteristics of a certain feature type, feature-based or geometric similarity measures could also be used in a retrieval scenario.
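The contrast between graded similarity and all-or-nothing subsumption can be made concrete with a toy Python comparison. The Jaccard index is used here only as a simple stand-in for feature-based measures and is not the measure of the cited works; treating subsumption as a subset test is valid only for purely conjunctive concepts built from atomic features, and all feature names are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Graded similarity: shared features divided by all features."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def satisfies(required: set, offered: set) -> bool:
    """Subsumption for purely conjunctive concepts: every required feature is offered."""
    return required <= offered


# Hypothetical feature sets for the requirements of a consuming service and two feature types
required = {"observable:WaterLevel", "unit:Meter", "has:dateStamp"}
type_a = {"observable:WaterLevel", "unit:Meter", "has:dateStamp", "has:location"}
type_b = {"observable:WaterLevel", "has:dateStamp"}

for name, offered in (("type_a", type_a), ("type_b", type_b)):
    print(name, round(jaccard(required, offered), 2), satisfies(required, offered))
```

Both feature types receive a fairly high similarity score, but only the first one guarantees every property that further processing depends on, which is the situation in which the subsumption-based approach is preferable.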

Providing an ontology-based query interface that enables uniform access to heterogeneous data sources and supports the user in formulating a precise query has also been proposed by the SEWASIE⁷ project (Dongilli et al., 2004), which employs the same ontology and matchmaking approach for information retrieval. Moreover, the SEWASIE query interface enables an iterative refinement process of the query and utilises natural language as query representation. While this certainly represents a user-friendly approach, it also requires that the ontology engineer provides verbalisations for each ontology term. In contrast, we propose an intuitive but still formal query language. And whereas the SEWASIE query interface is developed for the needs of the Semantic Web in general, we are focused on geospatial information infrastructures.

A strategy based on semantics for supporting the discovery and integration of datasets and services is used in the Science Environment for Ecological Knowledge (SEEK) project (Pennington et al., 2004). We have benefited from the work conducted in SEEK by adapting the method of registration mapping (Bowers et al., 2004) for our purposes. The key difference of our approach lies in combining both tasks, information discovery and retrieval, and in hiding the complexity from the user.

Integrating information from different user communities based on ontologies with the goal of displaying a single map (with a single well-understood legend) has also been the focus of other research projects. In the GEON project, geological data from different US states have been combined according to a simple shared vocabulary comprising geological age, composition, fabrics, texture and genesis (Lin and Ludäscher, 2003). Rather than using a hybrid ontology approach as proposed in this paper, the authors propose to use explicit mappings between different ontologies. In the HarmonISA project,⁸ land use data from the border region of Austria, Slovenia and Italy have been combined in a single land cover map. In this project, a comprehensive shared vocabulary for defining land use classes has been developed based on the different national land use classification systems. In contrast to the approach presented in this paper, where all areas matching certain requirements were searched, the HarmonISA project aimed at producing a land use map that completely covers the area. Therefore, the authors used a complex similarity measurement between land use type definitions rather than subsumption reasoning.

⁷ Semantic Webs and Agents in Integrated Economies, see http://www.sewasie.org/.

⁸ See http://harmonisa.uni-klu.ac.at.

7. Conclusions and future work

Problems caused by semantic heterogeneity can occur on different levels in SDIs. In this paper, we have illustrated how these problems can be overcome by using ontologies and reasoning. We have also shown how the proposed method can be encapsulated in services and clients and how these can be combined with existing SDI components. The two use cases illustrate how these intelligent services effectively support the semantic query, retrieval, exchange and integration of geographic data. Moreover, we have shown how to also support dynamic service chaining in order to answer complex queries.

The benefits of the presented approaches are manifold. Using the method for GI interpretation and integration presented in Section 4, users can use terms from a classification system they are already familiar with and still find features that use a different (unknown) classification system. Furthermore, they can build their own query concept based on petrographic characteristics available in the shared vocabulary and find the appropriate features irrespective of their classification. Such query formulation can significantly support decision-making based on geological information, for example for the building of a dump site. The method for GI discovery, retrieval and integration presented in Section 5 also enables the formulation of a query and the interpretation of the result using the same well-known shared vocabulary. Furthermore, it hides from the user the complexity of the logic statements that are needed for automated semantic matchmaking. Finally, it combines the discovery and retrieval tasks by deriving both the DL concept and the request to retrieve the data from the same user query. The integration of semantic mediation into the workflow is an add-on that allows relaxing the DL concept formalisation if transformation rules are available for the specific context. This potentially increases the number of relevant results.

In our future work we will address the following issues:

• Extensions of the tested scenarios. The tested scenario comprises requests and application schemas with a relatively simple structure. Also, the effects of the scale of a data source have not been taken into account. Future tests of the approach will include more complex request possibilities (like support for spatial comparators and nested queries) and data sources at different scales. Also, the effectiveness of the approach will be tested in a more generic setting with complex application schemas and examples from other domains. Finally, we will investigate examples where no simple mapping between a feature type property and a contextual path in the ontology exists.

• Semantics of geoprocessing services. In scenarios where a service chain is required to answer a complex question, the semantics not only of the data but also of the services for processing the data are of vital importance. In our future work, we will therefore investigate approaches for the semantic description and discovery of geoprocessing services and examine how these can be combined with the presented approach for the discovery and retrieval of geographic data. For first steps in this direction, see Lutz (2005a, b).

• Template service chains. In recent years, many researchers have addressed the automated generation of complex service chains based on user queries (e.g. Burstein et al., 2005). However, these approaches still face many problems of complexity. A simpler solution for supporting the creation of complex service chains by the user could be based on providing generic templates for service chains that solve a particular type of task. Such a template should be a fixed combination of several generic service types, each of which performs a subtask of the overall functionality. In an iterative process, requesters could subsequently instantiate these templates with services discovered for each of these subtasks.

• User-friendly generation of application ontologies. While our approach hides much of the complexity of the ontology-based GI retrieval from the requester, the data provider still has to create and register rather complex application ontologies. We are aware that this is one of the crucial bottlenecks for our approach to be accepted and used in future SDIs. Future work will therefore address how the process of creating formal descriptions of the geodata could be automated. First ideas on how this can be achieved using spatial analyses of geographic datasets are presented in Klien and Lutz (2005).

References

Andrews, T., Curbera, F., Dholakia, H., Goland, Y., Klein, J., Leymann, F., Liu,K., Roller, D., Smith, D., Thatte, S., Trickovic, I., Weerawarana, S., 2003.Business Process Execution Language for Web Services, Version 1.1.BEA Systems, IBM, Microsoft, SAP, Siebel Systems.

Antoniou, G., Van Harmelen, F., 2003. Web Ontology Language: OWL. In:Staab, S., Studer, R. (Eds.), Handbook on Ontologies. Springer,Heidelberg, pp. 67–92.

Arens, Y., Hsu, C.-N., Knoblock, C.A., 1996. Query processing in the SIMSinformation mediator. In: Huhns, M.N., Singh, M.P. (Eds.), Readings inAgents. Morgan Kaufmann, San Francisco, CA, pp. 82–90.

Baader, F., Nutt, W., 2003. Basic description logics. In: Baader, F.,Calvanese, D., McGuinness, D., Nardi, D., Patel-Schneider, P. (Eds.),The Description Logic Handbook. Theory, Implementation andApplications. Cambridge University Press, Cambridge, pp. 43–95.

de la Beaujardiere, J. (Ed.), 2006. OpenGIS Web Map Server Implementa-tion Specification (version 1.3.0). OGC Document no. 06-042, March.Available at /http://www.opengeospatial.org/standards/wmsS.

Bernstein, A., Klein, M., 2002. Towards high-precision service retrieval.In: Proceedings of the First International Semantic Web Conference(ISWC 2002), Sardinia, Italy, pp. 84–101.

Bishr, Y., 1998. Overcoming the semantic and other barriers to GIS interoperability. International Journal of Geographical Information Science 12, 299–314.

Bowers, S., Lin, K., Ludäscher, B., 2004. On integrating scientific resources through semantic registration. In: Proceedings of the 16th International Conference on Scientific and Statistical Database Management (SSDBM’04), Santorini Island, Greece, pp. 349–352.

Burstein, M., Bussler, C., Pistore, M., Roman, D. (Eds.), 2005. Proceedings of the Workshop on WWW Service Composition with Semantic Web Services 2005 (wscomps05). University of Technology of Compiègne, France, 60 pp.

Clark, J. (Ed.), 1999. XSL Transformations (XSLT) (version 1.0). W3C Recommendation, November. Available at http://www.w3.org/TR/xslt.

Conrad, S., 2002. Schemaintegration – Integrationskonflikte, Lösungsansätze, aktuelle Herausforderungen (Schema integration – integration conflicts, solution approaches, current challenges). Informatik – Forschung & Entwicklung 17, 101–111.

Czerwinski, S., Zhao, B.Y., Hodes, T., 1999. An architecture for a secure service discovery service. In: Proceedings of the Fifth ACM/IEEE International Conference on Mobile Computing and Networking, Seattle, WA, USA, pp. 24–35.

Dongilli, P., Franconi, E., Tessaris, S., 2004. Semantics driven support for query formulation. In: Proceedings of the International Workshop on Description Logics, Whistler, BC, Canada. Available at http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-104/14Dongilli-final.pdf.

Donini, F.M., 2003. Complexity of reasoning. In: Baader, F., Calvanese, D., McGuinness, D., Nardi, D., Patel-Schneider, P. (Eds.), The Description Logic Handbook. Theory, Implementation and Applications. Cambridge University Press, Cambridge, pp. 96–136.

Egenhofer, M., 2002. Toward the semantic geospatial web. In: Proceedings of the 10th ACM International Symposium on Advances in Geographic Information Systems (ACM-GIS), McLean, VA, pp. 1–4.

Fonseca, F., Egenhofer, M., Agouris, P., Camara, G., 2002a. Using ontologies for integrated geographic information systems. Transactions in GIS 6, 231–257.

Fonseca, F., Egenhofer, M., Davis, C., Camara, G., 2002b. Semantic granularity in ontology-driven geographic information systems. Annals of Mathematics and Artificial Intelligence 36, 121–151.

Goh, C.H., Bressan, S., Madnick, S., Siegel, M., 1999. Context interchange. New features and formalisms for the intelligent integration of information. ACM Transactions on Information Systems 17, 270–293.

Goldstone, R.L., Son, J., 2005. Similarity. In: Holyoak, K., Morrison, R. (Eds.), Cambridge Handbook of Thinking and Reasoning. Cambridge University Press, Cambridge, pp. 13–36.

Guarino, N., 1998. Formal ontology and information systems. In: Proceedings of the Formal Ontology in Information Systems (FOIS’98), Trento, Italy, pp. 3–15.

Guarino, N., Masolo, C., Vetere, G., 1999. Ontoseek: content-based access to the web. IEEE Intelligent Systems 14, 70–80.

Guha, R., McCool, R., Miller, E., 2003. Semantic search. In: Proceedings of the 12th International Conference on World Wide Web, Budapest, Hungary, pp. 700–709.

Haarslev, V., Möller, R., 2004. Racer User’s Guide and Reference Manual, Version 1.7.19. Available at http://www.sts.tu-harburg.de/~r.f.moeller/racer/racer-manual-1-7-19.pdf.

Heflin, J., Hendler, J., 2000. Searching the web with SHOE. In: Proceedings of the AAAI Workshop, Menlo Park, CA, pp. 35–40.

Hinton, J., 1996. GIS and remote sensing integration for environmental applications. International Journal of Geographical Information Science 10, 877–890.

Hübner, S., Spittel, R., Visser, U., Vögele, T., 2004. Ontology-based search for interactive digital maps. IEEE Intelligent Systems 19, 80–86.

Klien, E., Lutz, M., 2005. The role of spatial relations in automating the semantic annotation of geodata. In: Proceedings of the International Conference on Spatial Information Theory (COSIT 2005), Ellicottville, NY, USA, pp. 133–148.

Lalonde, W. (Ed.), 2002. OpenGIS Styled Layer Descriptor Implementation Specification (version 1.0.0). OGC Document no. 02-070, September. Available at http://www.opengeospatial.org/standards/sld.

Levy, A.Y., 2000. Logic-based techniques in data integration. In: Minker, J. (Ed.), Logic Based Artificial Intelligence. Kluwer, Norwell, MA, pp. 575–595.

Levy, A.Y., Rajaraman, A., Ordille, J., 1996. Querying heterogeneous information sources using source descriptions. In: Proceedings of the 22nd Very Large Databases Conference, Bombay, India, pp. 251–262.

Lin, K., Ludäscher, B., 2003. A system for semantic integration of geologic maps via ontologies. In: Proceedings of the Semantic Web Technologies for Searching and Retrieving Scientific Data (SCISW), Sanibel Island, Florida, USA. Available at http://ftp.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-83/sia_2.pdf.

Lutz, M., 2005a. Ontology-based descriptions for semantic discovery and composition of geoprocessing services. Geoinformatica 11, 1–36.

Lutz, M., 2005b. Ontology-based service discovery in spatial data infrastructures. In: Proceedings of the ACM Workshop on Geographic Information Retrieval (GIR’05), Bremen, Germany, pp. 45–54.

Lutz, M., Klien, E., 2006. Ontology-based retrieval of geographic information. International Journal of Geographical Information Science 20, 233–260.

Mädche, A., Staab, S., Stojanovic, N., Studer, R., Sure, Y., 2001. SEAL – a framework for developing semantic portals. In: Proceedings of the 18th British National Conference on Databases, Oxford, UK, pp. 1–22.

Mena, E., Illarramendi, A., Kashyap, V., Sheth, A., 2000. An approach for query processing in global information systems based on interoperation across pre-existing ontologies. International Journal on Distributed and Parallel Databases 8, 223–272.

Mena, E., Kashyap, V., Illarramendi, A., Sheth, A., 1998. Domain specific ontologies for semantic information brokering on the global information infrastructure. In: Proceedings of the First International Conference on Formal Ontologies in Information Systems, Trento, Italy. Available at http://sid.cps.unizar.es/PUBLICATIONS/POSTSCRIPTS/fois98.ps.gz.

Nebert, D., Whiteside, A., Vretanos, P. (Eds.), 2007. OpenGIS Catalogue Services Specification (version 2.0.2). OGC Document no. 07-006r1, February. Available at http://www.opengeospatial.org/standards/cat.

Pennington, D., Michener, W.K., Berkley, C., Higgins, D., Jones, M.B., Schildhauer, M., Bowers, S., Ludäscher, B., Rajasekar, A., 2004. The science environment for ecological knowledge (SEEK): a distributed, ontology-driven environment for ecological modeling and analysis (abstract). In: Proceedings of the Third Conference of Geographic Information Science (GIScience 2004), Adelphi, MD, USA, pp. 172–174.

Percivall, G. (Ed.), 2002. The OpenGIS Abstract Specification, Topic 12: OpenGIS Service Architecture (version 4.3). OGC Document no. 02-112, January. Available at http://www.opengeospatial.org/standards/as.

Portele, C. (Ed.), 2007. OpenGIS Geography Markup Language Encoding Standard (version 3.2.1). OGC Document no. 07-036, August. Available at http://www.opengeospatial.org/standards/gml.

Rahm, E., Bernstein, P.A., 2001. A survey of approaches to automatic schema matching. The International Journal on Very Large Data Bases 10, 334–350.

Raubal, M., 2004. Formalizing conceptual spaces. In: Proceedings of the Third International Conference on Formal Information Systems, pp. 153–164.

Richardson, R., Smeaton, A.F., 1995. Using Wordnet in a knowledge-based approach to information retrieval. Technical Report ca-0395, Dublin City University, Dublin, Ireland.

Rodríguez, A., Egenhofer, M., 2004. Comparing geospatial entity classes: an asymmetric and context-dependent similarity measure. International Journal of Geographical Information Science 18, 229–256.

Sattler, U., Calvanese, D., Molitor, R., 2003. Relationships with other formalisms. In: Baader, F., Calvanese, D., McGuinness, D., Nardi, D., Patel-Schneider, P. (Eds.), The Description Logic Handbook. Theory, Implementation and Applications. Cambridge University Press, Cambridge, pp. 142–183.

Sheth, A.P., 1999. Changing focus on interoperability in information systems: from system, syntax, structure to semantics. In: Goodchild, M.F., Egenhofer, M., Fegeas, R., Kottman, C.A. (Eds.), Interoperating Geographic Information Systems. Kluwer, Norwell, MA, pp. 5–30.

Sondheim, M., Gardels, K., Buehler, K., 1999. GIS interoperability. In: Longley, P., Goodchild, M., Rhind, D. (Eds.), Geographical Information Systems: Principles, Techniques, Applications and Management. Wiley, New York, pp. 347–358.

Visser, U., Stuckenschmidt, H., 2002. Interoperability in GIS – enabling technologies. In: Proceedings of the Fifth Conference on Geographic Information Science (AGILE 2002), Palma de Mallorca, Spain, pp. 291–297.

Visser, U., Stuckenschmidt, H., Schlieder, C., Wache, H., Timm, I., 2002a. Terminology integration for the management of distributed information resources. Künstliche Intelligenz 16, 31–34.

Visser, U., Vögele, T., Schlieder, C., 2002b. Spatio-terminological information retrieval using the BUSTER system. In: Proceedings of the 16th Conference on Informatics for Environmental Protection (EnviroInfo), Vienna, Austria, pp. 93–100.

Vögele, T., Spittel, R., 2004. Enhancing spatial data infrastructures with semantic web technologies. In: Proceedings of the Seventh Conference on Geographic Information Science (AGILE 2004), Heraklion, Greece, pp. 105–111.

Vögele, T., Hübner, S., Schuster, G., 2003. BUSTER – an information broker for the semantic web. Künstliche Intelligenz 03, 31–34.

Vretanos, P. (Ed.), 2005. OpenGIS Web Feature Service Implementation Specification (version 1.1.0). OGC Document no. 04-094, May. Available at http://www.opengeospatial.org/standards/wfs.

Wache, H., 2003. Semantische mediation für heterogene Informationsquellen (Semantic mediation for heterogeneous information sources). Ph.D. Dissertation, University of Bremen, Germany, Akademische Verlagsgesellschaft, Berlin.

Wache, H., Scholz, T., Stieghahn, H., König-Ries, B., 1999. An integration method for the specification of rule-oriented mediators. In: Proceedings of the International Symposium on Database Applications in Non-Traditional Environments (DANTE 1999), pp. 109–112.

Wache, H., Vögele, T., Visser, U., Stuckenschmidt, H., Schuster, G., Neumann, H., Hübner, S., 2001. Ontology-based integration of information – a survey of existing approaches. In: Proceedings of the IJCAI-01 Workshop: Ontologies and Information Sharing, Seattle, WA, USA, pp. 108–117.

Wiederhold, G., 1992. Mediators in the architecture of future information systems. IEEE Computer 25, 38–49.

Wittgenstein, L., 1953. Philosophical Investigations. Blackwell, Oxford.