

[IEEE 2008 Second IEEE International Conference on Semantic Computing (ICSC) - Santa Monica, CA, USA (2008.08.4-2008.08.7)] 2008 IEEE International Conference on Semantic Computing. 978-0-7695-3279-0/08 $25.00 © 2008 IEEE. DOI 10.1109/ICSC.2008.28

Non-parametric Statistical Learning Methods for Inductive Classifiers in Semantic Knowledge Bases

Claudia d’Amato, Nicola Fanizzi, Floriana Esposito
Dipartimento di Informatica – Università degli studi di Bari

Via Orabona 4, 70125 Bari, Italy
{claudia.damato, fanizzi, esposito}@di.uniba.it

Abstract

This work concerns non-parametric approaches for statistical learning applied to the standard knowledge representation languages adopted in the Semantic Web context. We present methods based on epistemic inference that are able to elicit the semantic similarity of individuals in OWL knowledge bases. Specifically, a totally semantic and language-independent semi-distance function is presented and, from it, an epistemic kernel function for Semantic Web representations is derived. Both the measure and the kernel function are embedded into non-parametric statistical learning algorithms customized for coping with Semantic Web representations. Particularly, the measure is embedded into a k-Nearest Neighbor algorithm and the kernel function into a Support Vector Machine. The resulting algorithms are used to perform inductive concept retrieval and query answering. An experimentation on real ontologies proves that the methods can be effectively employed for performing the target tasks and, moreover, that it is possible to induce new assertions that are not logically derivable.

1 Learning from Ontologies

The Semantic Web (SW) represents an emerging domain where every business, enterprise and organization on the Web will have its own organizational model (an ontology), and knowledge-intensive automated manipulations of complex relational descriptions are foreseen. Specific formal languages for knowledge representation have been designed to support a variety of applications in this context, spanning from the biological and geospatial fields to agent technology and service-oriented architectures. Description Logics (DLs) [1], a family of languages endowed with well-founded semantics and reasoning services, have been adopted as the core of the ontology language (OWL1).

1 http://www.w3.org/2004/OWL/

Most of the research on formal ontologies has been focused on methods based on deductive reasoning. Yet such an approach may fail in case of noisy (and possibly inconsistent) data coming from heterogeneous sources. Inductive learning methods could be effectively employed to overcome this problem. Nevertheless, research on inductive methods and knowledge discovery applied to ontological representations has received less attention [5, 17, 10, 9].

In this paper we propose two non-parametric statistical learning methods suited to DL representations, namely the Nearest Neighbor (henceforth NN) approach [18] and kernel methods [19], in order to perform important inferences on semantic knowledge bases (KBs), such as concept retrieval [1] and query answering. Indeed, concept retrieval and query answering can be cast as classification problems, i.e. assessing the class-membership of the individuals in the KB w.r.t. some query concepts. Reasoning by analogy, similar individuals should likely belong to the extension of similar concepts. Moving from such an intuition, an instance-based framework (grounded on the NN approach and kernel methods) for retrieving resources contained in ontological KBs has been devised, to inductively infer (likely) consistent class-membership assertions that may not be logically derivable. As such, the resulting new (induced) assertions may enrich the KBs; indeed, they can be suggested to the knowledge engineer, who only has to validate them, thus making the ontology population task less time-consuming [2].

Both the NN approach and kernel methods are well known to be quite efficient and noise-tolerant and, differently from parametric statistical learning methods, they allow the complexity of the hypothesis (the model to be learnt) to grow with the data. Moreover, both of them are grounded on the exploitation of a notion of (dis-)similarity. Specifically, the NN approach retrieves individuals belonging to query concepts by analogy with the most similar training instances, namely on the grounds of the classification of the nearest ones (w.r.t. the dissimilarity measure). Kernel methods represent a family of statistical learning algorithms, including the support vector machines (SVMs), that can be very efficient since they map, by means of a kernel function, the original feature space into a higher-dimensional space, where the learning task is simplified and where the kernel function implements a dissimilarity notion.

From a technical viewpoint, extending the setting of the NN and kernel methods to the DL representation required solving several issues: 1) a theoretical problem is posed by the Open World Assumption (OWA) that is generally made on the semantics of SW ontologies, differently from the typical database setting where the Closed World Assumption (CWA) is made; 2) both NN and kernel methods are devised for simple classifications where classes are assumed to be pairwise disjoint, which is quite unlikely in the SW context, where an individual can be an instance of more than one concept; 3) suitable metrics, namely (dis-)similarity measures and kernel functions, are necessary for coping with the high expressive power of DL representations; such definitions are not straightforward.

Most of the existing measures focus on concept (dis)similarity and particularly on the (dis)similarity of atomic concepts within hierarchies or simple ontologies (see the discussion in [4]). Conversely, for our purposes, a notion of dissimilarity between individuals is required. Recently, dissimilarity measures for specific DL concept descriptions have been proposed [7, 8]. Although they turned out to be quite effective for the inductive tasks of interest, they are partly based on structural criteria (a notion of normal form) which determine their main weakness: they are hardly scalable to deal with standard ontology languages. To overcome these limitations, a semantic pseudo-metric [13] is exploited. This language-independent measure assesses the dissimilarity of two individuals by comparing them on the grounds of their behavior w.r.t. a committee of features (concepts), namely those defined in the KB or that can be generated to this purpose2. Since kernel functions implement a notion of dissimilarity, we have derived a kernel function from the semantic pseudo-metric [13]. As for the function from which it is derived, the kernel is language independent and it is based on the semantics of the individuals as epistemically elicited from the KB w.r.t. a number of dimensions, represented by a committee3 of discriminating concept descriptions (features).

We have embedded the pseudo-metric in a NN algorithm and the kernel function in a kernel machine (specifically a SVM). The obtained framework has been used for performing inductive concept retrieval and query answering with both approaches. We experimentally show that the methods perform concept retrieval and query answering comparably well w.r.t. a standard deductive reasoner. Moreover, we show that the proposed framework is sometimes able to induce new concept assertions that are not logically derivable: the reasoner cannot state whether an individual is an instance of a certain concept or not, while our framework asserts that such an individual is an instance of the considered concept (or of its negation).

2 The choice of optimal committees may be performed in advance through randomized search algorithms [13].

3 As for the pseudo-metric [13], also for the kernel function the feature set can be optimally generated by means of a simulated annealing procedure [13].

In the next section the reference representation is briefly summarized. In Sect. 3 the basics of the NN approach and its extension to the SW setting are analyzed. In Sect. 4 the pseudo-metric used for performing the NN search is presented. In Sect. 5 the basics of kernel methods and kernel functions are illustrated, while in Sect. 6 the kernel derived from the pseudo-metric is illustrated. In Sect. 7 the experimental evaluation of the proposed methods is discussed. Conclusions are drawn in Sect. 8.

2 Representation and Inference

We assume that concept descriptions are defined in terms of a generic sub-language based on OWL-DL that may be mapped to DLs with the standard model-theoretic semantics (see [1] for a thorough reference). A knowledge base K = 〈T ,A〉 contains a TBox T and an ABox A. T is a set of axioms that define concepts. A contains factual assertions concerning the resources, also known as individuals. The unique names assumption may be made on the ABox individuals, that are represented by their URIs. The set of the individuals occurring in A will be denoted with Ind(A).

As regards the inference services, like all other instance-based methods, our procedures may require performing instance-checking [1], which roughly amounts to determining whether an individual, say a, belongs to a concept extension, i.e. whether C(a) holds for a certain concept C. This service is provided proof-theoretically by a reasoner. Note that, because of the OWA, a reasoner may be unable to give a positive or negative answer to a class-membership query.

3 Query Answering as Nearest Neighbor Search

Query answering boils down to determining whether a resource belongs to a (query) concept extension. Here, an alternative inductive method is proposed for retrieving the resources that likely belong to a query concept. Such a method may also be able to provide an answer even when it may not be inferred by deduction.

In similarity search [21] the basic idea is to find the most similar object(s) to a query one (i.e. the one that has to be classified) w.r.t. a similarity (or dissimilarity) measure. We review the basics of the k-NN method applied to the SW context [8]. The objective is to induce an approximation for a discrete-valued target hypothesis function h : IS → V from a space of instances IS to a set of values V = {v1, . . . , vs} standing for the classes (concepts) that have to be predicted. Note that normally |IS| ≪ |Ind(A)|, i.e. only a limited number of training instances is needed.

Let xq be the query instance whose class-membership is to be determined. Using a dissimilarity measure, the set of the k nearest (pre-classified) training instances w.r.t. xq is selected: NN(xq) = {xi | i = 1, . . . , k}. A k-NN algorithm approximates h for classifying xq on the grounds of the value that h assumes for the training instances in NN(xq), i.e. the k closest instances to xq. Precisely, the value is decided by means of a weighted majority voting procedure: the value is given by the class most voted by the instances in NN(xq), weighted by the similarity of the neighbor individual. Formally, the estimate of the hypothesis function for the query individual is defined as:

h(xq) := argmax_{v ∈ V} Σ_{i=1}^{k} wi · δ(v, h(xi))     (1)

where δ returns 1 in case of matching arguments and 0 otherwise, and, given a dissimilarity measure d, the weights are determined by wi = 1/d(xi, xq).
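As an illustration, the weighted vote of Eq. 1 can be sketched in a few lines of Python. The dissimilarity function d, the labeling h and the neighbor set are hypothetical stand-ins (not the paper's actual implementation), and d is assumed to be strictly positive on the neighbors:

```python
from collections import defaultdict

def knn_vote(xq, neighbors, h, d):
    """Weighted k-NN vote (Eq. 1).
    neighbors: the k training individuals nearest to xq;
    h: maps a training individual to its class value in V;
    d: dissimilarity measure, assumed > 0 for every neighbor."""
    scores = defaultdict(float)
    for xi in neighbors:
        wi = 1.0 / d(xi, xq)      # closer neighbors weigh more
        scores[h(xi)] += wi       # sum of wi * delta(v, h(xi)) per class v
    return max(scores, key=scores.get)
```

Note that the argmax over V reduces to picking the class with the largest accumulated weight, since δ(v, h(xi)) only selects the neighbors labeled v.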

Note that the estimate function h is defined extensionally: the basic k-NN method does not return an intensional classification model (namely a function or a concept definition); it merely gives an answer for the instances to be classified. It should also be observed that this setting assigns to the query instance a value which stands for one of a set of pairwise disjoint concepts (corresponding to the value set V). In a multi-relational setting such as the SW context this assumption cannot be made in general, since an individual may be an instance of more than one concept.

Another issue is represented by the CWA usually made in the knowledge discovery context, as opposed to the OWA characterizing the SW context. To deal with the OWA, the absence of information on whether a training instance x belongs to the extension of the query concept Q should not be interpreted negatively, as in the standard settings which adopt the CWA. Rather, it should count as neutral information. Thus, assuming this alternative viewpoint, the multi-class classification problem is transformed into a ternary one. Hence another value set has to be adopted, namely V = {+1, −1, 0}, where the three values denote membership, non-membership, and unknown, respectively.

The task can be cast as follows: given a query concept Q, determine the membership of an instance xq through the NN procedure (see Eq. 1) where V = {−1, 0, +1} and the hypothesis function values for the training instances are determined as follows:

hQ(x) = +1   if K |= Q(x)
       = −1   if K |= ¬Q(x)
       =  0   otherwise

i.e. the value of hQ for the training instances is determined by the entailment4 of the corresponding assertion from the knowledge base.
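A minimal sketch of this ternary labeling in Python; here entails is a hypothetical stand-in for the reasoner's instance check K |= C(x), not an actual reasoner API:

```python
def h_Q(entails, K, Q, x):
    """Ternary hypothesis value for a training instance x w.r.t. a
    query concept Q. `entails(K, C, x)` stands in for K |= C(x)."""
    if entails(K, Q, x):            # K |= Q(x)
        return +1
    if entails(K, ("not", Q), x):   # K |= (not Q)(x)
        return -1
    return 0                        # unknown: neutral under the OWA
```

The 0 case is the point of the construction: absence of entailment in either direction is kept as neutral information rather than being read as non-membership.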

Note that, since this procedure is based on a majority vote of the individuals in the neighborhood, it is less error-prone in case of noise in the data (e.g. incorrect assertions) w.r.t. a purely logical deductive procedure. Therefore it may be able to give a correct classification even in case of inconsistent knowledge bases. At the same time, it should be noted that the inductive inference made by the procedure shown above is not guaranteed to be deductively valid. Indeed, inductive inference naturally yields a certain degree of uncertainty.

4 A Semantic Pseudo-Metric for Individuals

For the NN procedure, we intend to exploit a dissimilarity measure that totally depends on semantic aspects of the individuals in the knowledge base. The measure is based on the idea of comparing the semantics of the input individuals along a number of dimensions represented by a committee of concept descriptions. Indeed, on a semantic level, similar individuals should behave similarly w.r.t. the same concepts. Following ideas borrowed from [20], totally semantic distance measures for individuals can be defined in the context of a KB. More formally, the rationale is to compare individuals on the grounds of their semantics w.r.t. a collection of concept descriptions, say F = {F1, F2, . . . , Fm}, which stands as a group of discriminating features expressed in the OWL-DL sub-language taken into account. In its simplest formulation, a family of distance functions for individuals inspired by Minkowski's norms Lp can be defined as follows [13]:

Definition 4.1 (family of measures) Let K = 〈T ,A〉 be a KB. Given a set F = {F1, F2, . . . , Fm} of concept descriptions, a family of dissimilarity functions d^F_p : Ind(A) × Ind(A) → [0, 1] is defined as follows: ∀a, b ∈ Ind(A)

d^F_p(a, b) := (1/|F|) · [ Σ_{i=1}^{|F|} wi · |δi(a, b)|^p ]^{1/p}

where p > 0 and ∀i ∈ {1, . . . , m} the dissimilarity function δi is defined by: ∀(a, b) ∈ (Ind(A))²

δi(a, b) = 0    if Fi(a) ∈ A ∧ Fi(b) ∈ A
         = 1    if (Fi(a) ∈ A ∧ ¬Fi(b) ∈ A) or (¬Fi(a) ∈ A ∧ Fi(b) ∈ A)
         = 1/2  otherwise

4 We use |= to denote entailment, as computed through a reasoner.


or, model theoretically: ∀(a, b) ∈ (Ind(A))²

δi(a, b) = 0    if K |= Fi(a) ∧ K |= Fi(b)
         = 1    if (K |= Fi(a) ∧ K |= ¬Fi(b)) or (K |= ¬Fi(a) ∧ K |= Fi(b))
         = 1/2  otherwise
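Both variants share the same computational shape and can be sketched as follows in Python. The member function is an illustrative stand-in for either the ABox look-up or the entailment check (returning +1 for asserted/entailed membership, −1 for its negation, 0 for unknown), and uniform weights wi = 1 are assumed for simplicity:

```python
def delta_i(member, Fi, a, b):
    """Projection delta_i. `member(Fi, x)` is a stand-in returning
    +1, -1 or 0 for membership, non-membership or unknown."""
    ma, mb = member(Fi, a), member(Fi, b)
    if ma == 1 and mb == 1:
        return 0.0       # both are (asserted/entailed) instances of Fi
    if ma * mb == -1:
        return 1.0       # one in Fi, the other in its negation
    return 0.5           # uncertain cases

def d_F(F, a, b, member, p=1):
    # uniform weights w_i = 1 assumed; F is the feature committee
    s = sum(delta_i(member, Fi, a, b) ** p for Fi in F)
    return (1.0 / len(F)) * s ** (1.0 / p)
```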

The model-theoretic definition of the projections requires the entailment of an assertion (instance-checking) rather than a simple ABox look-up; this can make the measure more accurate yet more complex to compute, unless a KBMS is employed that maintains such information at least for the concepts in F.

It can be proved [13] that these functions have the standard properties of pseudo-metrics (i.e. semi-distances [21]). This means that it cannot be proved that d^F_p(a, b) = 0 iff a = b (indiscernibility). Anyway, several methods have been proposed for avoiding this case [13].

The measures strongly depend on F. Here, the assumption is implicitly made that F contains a sufficient number of (possibly redundant) features that are able to discriminate really different individuals. Anyway, optimal features can be learnt by means of a randomized optimization procedure [13]. Nevertheless, it has been experimentally shown that good results can be obtained by using the very set of both primitive and defined concepts in the KB.

Of course, these approximate measures become more and more precise as the knowledge base is populated with an increasing number of individuals.

5 Kernel Methods and Kernel Functions

Kernel methods [19] represent a family of statistical learning algorithms (including the support vector machines (SVMs)) that have been effectively applied to a variety of tasks, recently also in domains that typically require structured representations [14, 15]. They can be very efficient since they map, by means of a kernel function, the original feature space into a high-dimensional space, where the learning task is simplified. Such a mapping is not explicitly performed (kernel trick): the usage of a positive definite kernel function (i.e. a valid kernel) ensures that the embedding into a new space exists and that the kernel function corresponds to the inner product in this space [19].

Two components of kernel methods have to be distinguished: the kernel machine and the kernel function. The kernel machine encapsulates the learning task, while the kernel function encapsulates the hypothesis language. In this way, an efficient algorithm for attribute-value instance spaces can be converted into one suitable for structured spaces (e.g. trees, graphs) by merely replacing the kernel function.

Kernel functions are endowed with the closure property w.r.t. many operations, one of which is the convolution [16]: kernels can deal with compounds by decomposing them into their parts, provided that valid kernels have already been defined for these parts.

kconv(x, y) := Σ_{(x1,...,xD) ∈ R⁻¹(x), (y1,...,yD) ∈ R⁻¹(y)} Π_{i=1}^{D} ki(xi, yi)     (2)

where R is a composition relationship building a single compound out of D simpler objects, each from a space that is already endowed with a valid kernel. The choice of the function R is a non-trivial task which may depend on the particular application.

On the grounds of this property, several kernel functions have been defined for string representations, trees, graphs and other discrete structures [14]. In [15], generic kernels based on type construction are formalized, where types are declaratively defined. In [6], kernels parametrized on a uniform representation are introduced. Specifically, a syntax-driven kernel definition, based on a simple DL representation (the Feature Description Language), is given.

Kernel functions for SW representations have also been defined [11, 3]. Specifically, in [11] a kernel for comparing ALC concept definitions is introduced. It is based on the structural similarity of the AND-OR trees corresponding to the normal forms of the input concepts. This kernel is not only structural, since it ultimately relies on the semantic similarity of the primitive concepts at the leaves, assessed by comparing their extensions through a set kernel. Moreover, the kernel is applied to pairs of individuals, after having lifted them to the concept level through realization operators (actually by means of approximations of the most specific concept, see [1]). In [3], a set of kernels for individuals and for the various types of assertions in the ABox (on concepts, datatype properties, object properties) is presented. They should be composed for obtaining the final kernel; anyway, it is not really specified how such separate building blocks have to be integrated.

In this paper a new kernel function for SW representations is defined: the DL-kernel. Differently from those defined in [11, 3], the DL-kernel is totally semantic and language independent. Jointly with a SVM, the DL-kernel is used for executing a classification task in order to perform inductive concept retrieval and query answering. Note that, as for the NN approach (see Sect. 3), the SVM assumes, in its general setting, the CWA and the disjointness of the classes w.r.t. which classification is performed. In order to cope with the OWA and the multi-class classification problem (since an individual can be an instance of more than one concept) characterizing the SW context, the same setting modifications made for the NN approach have been performed: an individual is classified w.r.t. each class (concept) by the use of a ternary set of values V = {+1, 0, −1}, where 0 represents unknown information.


6 A Family of Epistemic Kernels

In this section, we propose a family of kernels, derived from the measure presented in Sect. 4, that can be directly applied to individuals. It is parameterized on a set of features (concepts) that are used for its computation. Similarly to kFOIL, a sort of dynamic propositionalization takes place. However, in this setting the committee of concepts which are used as dimensions for the similarity function need not be alternate versions of the same target concept but may vary freely, reflecting contextual knowledge. The form of the kernel function resembles that of the Minkowski's metrics for vectors of numeric features:

Definition 6.1 (DL-kernel) Let K = 〈T ,A〉 be a knowledge base. Given a set of concept descriptions F = {F1, F2, . . . , Fm}, a family of kernel functions k^F_p : Ind(A) × Ind(A) → [0, 1] is defined as follows: ∀a, b ∈ Ind(A)

k^F_p(a, b) := (1/|F|) · [ Σ_{i=1}^{|F|} |σi(a, b)|^p ]^{1/p}

where p > 0 and ∀i ∈ {1, . . . , m} the simple similarity function σi is defined: ∀a, b ∈ Ind(A)

σi(a, b) = 1    if (K |= Fi(a) ∧ K |= Fi(b)) ∨ (K |= ¬Fi(a) ∧ K |= ¬Fi(b))
         = 0    if (K |= ¬Fi(a) ∧ K |= Fi(b)) ∨ (K |= Fi(a) ∧ K |= ¬Fi(b))
         = 1/2  otherwise
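Under the same stand-in convention used above (member(Fi, x) returning +1 for K |= Fi(x), −1 for K |= ¬Fi(x), 0 for unknown, purely for illustration), the DL-kernel can be sketched as:

```python
def sigma_i(member, Fi, a, b):
    """Simple similarity sigma_i w.r.t. feature concept Fi."""
    ma, mb = member(Fi, a), member(Fi, b)
    if ma != 0 and ma == mb:
        return 1.0       # same known behaviour (both in Fi or both in its negation)
    if ma * mb == -1:
        return 0.0       # opposite behaviour
    return 0.5           # membership unknown for at least one individual

def dl_kernel(F, a, b, member, p=1):
    """k^F_p(a, b) over the feature committee F."""
    s = sum(abs(sigma_i(member, Fi, a, b)) ** p for Fi in F)
    return (1.0 / len(F)) * s ** (1.0 / p)
```

For instance, with p = 1 two individuals with identical known behaviour on every feature get the maximal value 1, while each feature on which they behave oppositely pulls the kernel towards 0.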

The rationale for this kernel is that the similarity between individuals is decomposed along the similarity with respect to each concept in a given committee of features (concept definitions). Two individuals are maximally similar w.r.t. a given concept Fi if they exhibit the same behavior, i.e. both are instances of the concept or of its negation. Conversely, the minimal similarity holds when they belong to opposite concepts. Because of the OWA, sometimes a reasoner cannot assess the concept-membership; hence, since both possibilities are open, we assign an intermediate value to reflect such uncertainty.

It is also worthwhile to note that this is indeed a family of kernels parameterized on the choice of the feature set. The effectiveness, and also the efficiency, of the measure computation strongly depends on the choice of the feature committee (feature selection). Optimal features can be learnt by the use of randomized optimization procedures [13, 12].

Instance-checking is to be employed for assessing the value of the σi functions. Yet this is known to be computationally expensive (also depending on the specific DL language of choice). Alternatively, especially for largely populated ontologies which may be the objective of mining algorithms, a simple look-up may be sufficient.

If it is required that k(a, b) = 1 ⇔ a = b even though the selected features are not able to distinguish the two individuals, one might make the unique names assumption on the individuals occurring in the ABox A and employ a special additional feature based on equality: σ0(a, b) = 1 iff a = b (and 0 otherwise). Alternatively, equivalence classes might be considered instead of mere individuals.

The most important property of a kernel function is its validity.

Proposition 6.1 (validity) Given an integer p > 0 and a committee of features F, the function k^F_p is a valid kernel.

This result can be proved by showing that the function can be obtained by composing simpler valid kernels through operations that guarantee closure w.r.t. this property [16]. Specifically, since the similarity functions σi (i = 1, . . . , m) correspond to matching kernels, the property follows from the closure w.r.t. sum, multiplication by a constant and kernel multiplication.
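Validity can also be sanity-checked numerically on small samples: a valid kernel must yield a symmetric positive semi-definite Gram matrix, i.e. x^T G x ≥ 0 for every vector x. A rough sampling-based check (illustrative only; it can refute but never prove validity) might look like:

```python
import random

def looks_psd(gram, trials=200, tol=1e-9):
    """Probe x^T G x >= 0 for random vectors x; a single negative
    value disproves positive semi-definiteness (hence validity)."""
    n = len(gram)
    if any(abs(gram[i][j] - gram[j][i]) > tol
           for i in range(n) for j in range(n)):
        return False                       # not even symmetric
    rnd = random.Random(0)                 # fixed seed for repeatability
    for _ in range(trials):
        x = [rnd.uniform(-1, 1) for _ in range(n)]
        q = sum(x[i] * gram[i][j] * x[j]
                for i in range(n) for j in range(n))
        if q < -tol:
            return False
    return True
```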

The intermediate value used in the uncertain cases may be chosen more carefully, so as to reflect the inherent uncertainty related to the specific features. An alternative choice being experimented with is related to the balance between the number of known individuals that belong to the feature concept and those that certainly belong to its negation.

7 Experimental Evaluation

The measure presented in Sect. 4 has been integrated in the NN procedure (see Sect. 3), while the DL-kernel (Sect. 6) has been embedded in a SVM from the LIBSVM library5. Both methods have been tested by applying them to a number of retrieval and query answering problems. In the following, the results of the experiments for each method are reported.

7.1 Experiments with the Feature Committee Measure

In order to assess the validity of the k-NN algorithm presented in Sect. 3 with the measure defined in Sect. 4, a number of OWL ontologies from different domains have been considered: SURFACE-WATER-MODEL (SWM) and NEWTESTAMENTNAMES (NTN) from the Protege library6; the Semantic Web Service Discovery dataset7 (SWSD); an ontology generated by the Lehigh University Benchmark8 (LUBM); the BioPax glycolysis ontology9 (BioPax); and the FINANCIAL ontology10. Tab. 1 summarizes details concerning these ontologies.

Table 1. Ontologies employed for the experiments.

Ontology  | DL language | #conc. | #obj. prop. | #individuals
SWM       | ALCOF(D)    | 19     | 9           | 115
BIOPAX    | ALCHF(D)    | 28     | 19          | 323
LUBM      | ALR+HI(D)   | 43     | 7           | 555
NTN       | SHIF(D)     | 47     | 27          | 676
SWSD      | ALCH        | 258    | 25          | 732
FINANCIAL | ALCIF       | 60     | 17          | 1000

5 Software downloadable at: http://www.csie.ntu.edu.tw/~cjlin/libsvm
6 http://protege.stanford.edu/plugins/owl/owl-library
7 https://www.uni-koblenz.de/FB4/Institutes/IFI/AGStaab/Projects/xmedia/dl-tree.htm

For each ontology, 30 queries were randomly generated by composition of (2 through 8) primitive or defined concepts in each knowledge base by means of the operators of the related OWL sub-language. We employed the simplest version of the distance (d^F_1) with the committee of features F made up of all the concepts in the knowledge base.

The parameter k was set to √|Ind(A)|, as advised in the instance-based learning literature. Yet we found experimentally that much smaller values could be chosen, resulting in the same classification.
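The setup above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the semi-distance d^F_1 is assumed to average the absolute differences of the feature projections (consistent with the family of measures defined in Sect. 4), and `proj` and `labels` are hypothetical precomputed inputs.

```python
import math
from collections import Counter

# `proj[a]` is the precomputed projection vector (pi_1(a), ..., pi_m(a))
# over the feature committee F; `labels` maps known individuals to
# {+1, 0, -1} with respect to the query concept.

def d1(proj, a, b):
    # Simplest semi-distance: mean absolute difference of projections.
    pa, pb = proj[a], proj[b]
    return sum(abs(x - y) for x, y in zip(pa, pb)) / len(pa)

def knn_classify(proj, labels, query, k=None):
    inds = [i for i in proj if i != query]
    if k is None:
        k = max(1, round(math.sqrt(len(inds))))   # k = sqrt(|Ind(A)|)
    nearest = sorted(inds, key=lambda i: d1(proj, query, i))[:k]
    votes = Counter(labels[i] for i in nearest)   # majority vote
    return votes.most_common(1)[0][0]
```

A weighted vote (e.g. by inverse distance) is a common refinement; the plain majority here keeps the sketch minimal.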

The performance was evaluated by comparing the classifier responses to those returned by a standard reasoner11 as a baseline, and the following indices have been considered for the evaluation:

• match rate: number of cases of individuals that got exactly the same classification by both classifiers with respect to the overall number of individuals;

• omission error rate: amount of individuals for which the inductive method could not determine whether they were relevant to the query or not (namely, individuals classified as belonging to the class 0) while they actually were relevant (classified as +1 or −1 by the standard reasoner);

• commission error rate: amount of individuals (analogically) found to be relevant to the query concept while they (logically) belong to its negation, or vice-versa;

• induction rate: amount of individuals found to be relevant to the query concept or to its negation, while neither case is logically derivable from the knowledge base.
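With the inductive and deductive answers encoded as +1 (relevant), −1 (relevant to the negation) and 0 (undetermined), the four indices above can be computed as in this minimal sketch:

```python
# Compare the inductive classifier's answers with the reasoner's
# baseline answers; both are sequences over {+1, 0, -1}.

def evaluate(inductive, deductive):
    n = len(inductive)
    match = sum(i == d for i, d in zip(inductive, deductive))
    omission = sum(i == 0 and d != 0 for i, d in zip(inductive, deductive))
    commission = sum(i == -d and i != 0 for i, d in zip(inductive, deductive))
    induction = sum(i != 0 and d == 0 for i, d in zip(inductive, deductive))
    return {k: v / n for k, v in dict(match=match, omission=omission,
                                      commission=commission,
                                      induction=induction).items()}
```

Note that the four rates sum to 1, since every individual falls into exactly one of the four cases.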

8 http://swat.cse.lehigh.edu/projects/lubm/
9 http://www.biopax.org/Downloads/Level1v1.4/biopax-example-ecocyc-glycolysis.owl
10 http://www.cs.put.poznan.pl/alawrynowicz/financial.owl
11 We employed PELLET v. 1.5.1. See http://pellet.owldl.com

Table 2. k-NN outcomes: averages ± standard deviations and [min, max] intervals.

             match          commission    omission      induction
SWM          93.3 ± 10.3    0.0 ± 0.0     2.5 ± 4.4     4.2 ± 10.5
             [68.7;100.0]   [0.0;0.0]     [0.0;16.5]    [0.0;31.3]
BIOPAX       99.9 ± 0.2     0.2 ± 0.2     0.0 ± 0.0     0.0 ± 0.0
             [99.4;100.0]   [0.0;0.06]    [0.0;0.0]     [0.0;0.0]
LUBM         99.2 ± 0.8     0.0 ± 0.0     0.8 ± 0.8     0.0 ± 0.0
             [98.0;100.0]   [0.0;0.0]     [0.0;0.2]     [0.0;0.0]
NTN          98.6 ± 1.5     0.0 ± 0.1     0.8 ± 1.1     0.6 ± 1.4
             [93.9;100.0]   [0.0;0.4]     [0.0;3.7]     [0.0;6.1]
SWSD         97.5 ± 3.7     0.0 ± 0.0     1.8 ± 2.6     0.8 ± 1.5
             [84.6;100.0]   [0.0;0.0]     [0.0;9.7]     [0.0;5.7]
FINANCIAL    99.5 ± 0.8     0.3 ± 0.7     0.0 ± 0.0     0.2 ± 0.2
             [97.3;100.0]   [0.0;2.4]     [0.0;0.0]     [0.0;0.6]

Tab. 2 reports the outcomes in terms of these indices. Preliminarily, it is important to note that, in each experiment, the commission error was low or absent. This means that the search procedure is quite accurate: it did not make critical mistakes, i.e. cases when an individual is deemed an instance of a concept while it really is an instance of a disjoint one. The omission error is also quite low, yet it occurred more typically over all of the ontologies that were considered. A noteworthy difference was observed for the SWSD knowledge base, for which we find the lowest match rate and the highest variability in the results over the various concepts.

The usage of all concepts in each ontology for the set F of d^F_1 made the measure quite accurate, which is the reason why the procedure resulted quite conservative as regards inducing new assertions. In many cases, it matched the reasoner decisions rather faithfully. The cases of induction are interesting because they suggest new assertions which cannot be logically derived by using a deductive reasoner, yet they might be used to complete a knowledge base [2], e.g. after being validated by an ontology engineer.

7.2 Experimental Evaluation of the Epistemic Kernel

In order to experimentally assess the validity of the epistemic kernel (see Def. 6.1), the instance classification task has been performed on the ontologies detailed in Tab. 1.
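To illustrate how such a kernel can be plugged into an off-the-shelf SVM, the following sketch computes a Gram matrix from feature projection vectors. The kernel body is only one plausible reading of a feature-committee kernel with exponent p (Def. 6.1 is not reproduced here), and all names are hypothetical:

```python
# Hedged sketch: each individual is represented by its projection vector
# over the feature committee F; the kernel raises the average agreement
# of the two vectors to the power p (the experiments use p = 2).

def epistemic_kernel(pa, pb, p=2):
    # Average per-feature agreement, in [0, 1].
    s = sum(1.0 - abs(x - y) for x, y in zip(pa, pb)) / len(pa)
    return s ** p

def gram_matrix(projs, p=2):
    # Symmetric matrix of pairwise kernel values.
    return [[epistemic_kernel(a, b, p) for b in projs] for a in projs]
```

A matrix of this kind can then be fed to LIBSVM in its precomputed-kernel mode (kernel type `-t 4`), which is how a custom kernel is typically combined with that library.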

The classification method was applied to all the individuals in each ontology. Specifically, for each ontology, the individuals were checked to assess whether they were instances of the concepts in the ontology through the SVM method and the DL-Kernel12 (p = 2) embedded in it. A similar experimental setting has been considered in [3] with a simplified version of the GALEN Upper Ontology13. There, the ontologies have been randomly populated and only seven concepts have been considered, while no roles have been taken into account14. Differently from this case, we did not apply any changes to the considered ontologies.

Table 3. Results (average and std. deviation) of the experiments on concept classification with the DL-kernel.

ONTOLOGY     match           induction      om. error     com. error
SWM          86.18 ± 17.55   7.98 ± 16.08   5.83 ± 4.64   0.00 ± 0.00
NTN          90.95 ± 13.14   3.99 ± 11.55   4.76 ± 7.48   0.30 ± 1.43
BIOPAX       92.05 ± 11.34   0.55 ± 2.70    0.00 ± 0.00   7.41 ± 11.38
LUBM         92.95 ± 12.11   3.53 ± 12.24   3.52 ± 4.90   0.00 ± 0.00
FINANCIAL    97.24 ± 6.75    0.32 ± 0.15    0.02 ± 0.08   2.42 ± 6.75
SWSD         98.68 ± 4.06    0.00 ± 0.00    1.32 ± 4.06   0.00 ± 0.00

12 The feature set F for computing the epistemic kernel was made by all concepts in the considered ontology (see Def. 6.1).

The performance of the classifier induced by the SVM was evaluated by comparing its responses to those returned by a standard reasoner15 used as baseline. The experiment has been performed by adopting the ten-fold cross validation procedure. The average measures obtained over all the concepts in each ontology are reported in Tab. 3, jointly with their standard deviation.

By looking at the table, it is important to note that, for every ontology, the commission error is almost null. This means that the classifier did not make critical mistakes, i.e. cases when an individual is deemed an instance of a concept while it is really known to be an instance of another, disjoint concept. At the same time, it is important to note that a very high match rate is registered for every ontology. In particular, by considering both Tab. 3 and Tab. 1, it is interesting to observe that the match rate increases with the complexity of the considered ontology. This is because the performance of a statistical method improves as the set of available examples grows, which means there is more information for better separating the example space.

Almost always, the SVM-based classifier is able to induce new knowledge. However, a more conservative behavior w.r.t. the previous experiment was also registered; indeed, the omission error rate is not null (even if it is very close to 0). To decrease the method's tendency to a conservative behavior, a threshold could be introduced for the consideration of the "unknown" (namely, labeled with 0) training examples.

13 http://www.cs.man.ac.uk/~rector/ontologies/simple-top-bio
14 Due to the lack of information for replicating the ontology used in [3], a comparative experiment with the proposed kernel framework cannot be performed.
15 We employed PELLET v. 1.5.1. See http://pellet.owldl.com

Table 4. Results (average and standard deviation) of the experiments for performing query answering with the DL-kernel.

ONTOLOGY     match           induction      om. error      com. error
SWM          82.31 ± 21.47   9.11 ± 16.49   8.57 ± 8.47    0.00 ± 0.00
NTN          80.38 ± 17.04   8.22 ± 16.87   9.98 ± 10.08   1.42 ± 2.91
BIOPAX       84.04 ± 14.55   0.00 ± 0.00    0.00 ± 0.00    15.96 ± 14.55
LUBM         76.75 ± 19.69   5.75 ± 5.91    0.00 ± 0.00    17.50 ± 20.87
FINANCIAL    97.85 ± 3.41    0.42 ± 0.23    0.02 ± 0.07    1.73 ± 3.43
SWSD         97.92 ± 3.79    0.00 ± 0.00    2.09 ± 3.79    0.00 ± 0.00

Another experiment regarded testing the SVM-based method when performing inductive concept retrieval w.r.t. new query concepts built from the considered ontology. The method has been applied to a number of retrieval problems on the considered ontologies, again using the chosen SVM and the DL-kernel function. The experiment was quite intensive, involving the classification of all the individuals in each ontology. Specifically, the individuals were checked through the inductive procedure to assess whether they were retrieved as instances of a query concept. A number of 20 queries were randomly generated by applying the available constructors to primitive and/or defined concepts and roles from each ontology. The generated concepts also had to be satisfiable (yet they might yield no instance from the logic-based retrieval). As for the previous experiment, a ten-fold cross validation was performed for each dataset. The outcomes are reported in Tab. 4, from which it is possible to observe that the behavior of the classifier on these concepts is not very dissimilar from the outcomes of the previous experiment reported in Tab. 3. These queries were expected to be harder than the previous ones, which correspond to the primitive or defined concepts of the various ontologies. Specifically, the commission error rate was low for all but two ontologies (BIOPAX and LUBM), for which some very difficult queries were generated, raising this rate beyond 10% and consequently also the standard deviation values.
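The random query generation step could be sketched as follows; the operator set, probabilities, and textual concept syntax are illustrative assumptions, and a real implementation would additionally discard unsatisfiable queries using a reasoner:

```python
import random

# Hypothetical sketch: compose 2..8 primitive/defined concept names with
# the operators of the ontology's OWL sub-language (here intersection,
# union, and complement over placeholder concept names).

def random_query(concepts, rng, min_c=2, max_c=8):
    n = rng.randint(min_c, max_c)          # number of concepts to combine
    expr = rng.choice(concepts)
    for _ in range(n - 1):
        c = rng.choice(concepts)
        if rng.random() < 0.2:             # occasionally negate a conjunct
            c = f"(not {c})"
        op = rng.choice(["and", "or"])
        expr = f"({expr} {op} {c})"
    return expr
```

Each generated expression would then be classified over all individuals by both the inductive procedure and the reasoner, exactly as in the primitive/defined-concept experiment.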

By comparing these results with those obtained by the use of the NN approach (see Tab. 2), it is possible to assert that both methods showed high accuracy in performing the classification task, even if the NN method resulted more accurate when a lower number of instances is available for performing the learning task.

8 Conclusions and Outlook

With the aim of going beyond the limitations of classic logic-based methods, we investigated the application of non-parametric statistical multi-relational learning methods in the context of SW representations. Specifically, a family of semi-distance functions and kernel functions has been defined for OWL descriptions. They have been integrated, respectively, in a NN algorithm and a SVM for inducing statistical classifiers working with these complex representations. The resulting classifiers were tested on inductive retrieval and classification problems.

The peculiarity of learning with ABoxes that are naturally assumed to be interpreted with an open-world semantics required a specific assessment of the performance of the induced classifiers, since conclusions have to deal with the chance of uncertainty: some instances cannot be attributed to a class or to its negation.

Experimentally, it was shown that the performance of both classifiers is not only comparable to that of a standard deductive reasoner, but that they are also able to induce new knowledge which is not logically derivable. In particular, an increase in prediction accuracy was observed for ontologies that are homogeneously populated (similar to a database).

The induced classification results can be exploited for predicting or suggesting missing information about individuals, thus completing large ontologies. Specifically, the approach can be used to semi-automatize the population of an ABox. Indeed, the new assertions can be suggested to the knowledge engineer, who only has to validate their inclusion. This constitutes a new approach in the SW context, since the efficiency of statistical and numerical approaches and the effectiveness of a symbolic representation have been combined.

References

[1] F. Baader, D. Calvanese, D. McGuinness, D. Nardi, and P. Patel-Schneider, editors. The Description Logic Handbook. Cambridge University Press, 2003.

[2] F. Baader, B. Ganter, B. Sertkaya, and U. Sattler. Completing description logic knowledge bases using formal concept analysis. In M. Veloso, editor, IJCAI 2007, Proc. of the 20th Int. Joint Conf. on Artificial Intelligence, pages 230–235, 2007.

[3] S. Bloehdorn and Y. Sure. Kernel methods for mining instance data in ontologies. In K. Aberer et al., editors, Proc. of the 6th Int. Semantic Web Conference, volume 4825 of LNCS, pages 58–71. Springer, 2007.

[4] A. Borgida, T. Walsh, and H. Hirsh. Towards measuring similarity in description logics. In I. Horrocks, U. Sattler, and F. Wolter, editors, Working Notes of the International Description Logics Workshop, volume 147 of CEUR Workshop Proceedings, Edinburgh, UK, 2005.

[5] W. Cohen and H. Hirsh. Learning the CLASSIC description logic. In P. Torasso, J. Doyle, and E. Sandewall, editors, Proceedings of the 4th International Conference on the Principles of Knowledge Representation and Reasoning, pages 121–133. Morgan Kaufmann, 1994.

[6] C. Cumby and D. Roth. On kernel methods for relational learning. In T. Fawcett and N. Mishra, editors, Proceedings of the 20th International Conference on Machine Learning, ICML2003, pages 107–114. AAAI Press, 2003.

[7] C. d'Amato, N. Fanizzi, and F. Esposito. A dissimilarity measure for ALC concept descriptions. In Proceedings of the 21st Annual ACM Symposium of Applied Computing, SAC2006, volume 2, pages 1695–1699, Dijon, France, 2006. ACM.

[8] C. d'Amato, N. Fanizzi, and F. Esposito. Reasoning by analogy in description logics through instance-based learning. In G. Tummarello, P. Bouquet, and O. Signore, editors, Proceedings of Semantic Web Applications and Perspectives, 3rd Italian Semantic Web Workshop, SWAP2006, volume 201 of CEUR Workshop Proceedings, Pisa, Italy, 2006.

[9] M. d'Aquin, J. Lieber, and A. Napoli. Decentralized case-based reasoning for the Semantic Web. In Y. Gil, E. Motta, V. Benjamins, and M. A. Musen, editors, Proceedings of the 4th International Semantic Web Conference, ISWC2005, number 3279 in LNCS, pages 142–155. Springer, 2005.

[10] F. Esposito, N. Fanizzi, L. Iannone, I. Palmisano, and G. Semeraro. Knowledge-intensive induction of terminologies from metadata. In F. van Harmelen, S. McIlraith, and D. Plexousakis, editors, ISWC2004, Proceedings of the 3rd International Semantic Web Conference, volume 3298 of LNCS, pages 441–455. Springer, 2004.

[11] N. Fanizzi and C. d'Amato. A declarative kernel for ALC concept descriptions. In F. Esposito, Z. W. Ras, D. Malerba, and G. Semeraro, editors, Proceedings of the 16th International Symposium on Methodologies for Intelligent Systems, ISMIS2006, volume 4203 of LNCS, pages 322–331. Springer, 2006.

[12] N. Fanizzi, C. d'Amato, and F. Esposito. Evolutionary conceptual clustering of semantically annotated resources. In Proceedings of the IEEE International Conference on Semantic Computing, ICSC2007, pages 783–790. IEEE, 2007.

[13] N. Fanizzi, C. d'Amato, and F. Esposito. Induction of optimal semi-distances for individuals based on feature sets. In D. Calvanese, E. Franconi, V. Haarslev, D. Lembo, B. Motik, A.-Y. Turhan, and S. Tessaris, editors, Working Notes of the 20th International Description Logics Workshop, DL2007, volume 250 of CEUR Workshop Proceedings, Bressanone, Italy, 2007.

[14] T. Gärtner. A survey of kernels for structured data. SIGKDD Explorations, 5(1):49–58, 2003.

[15] T. Gärtner, J. Lloyd, and P. Flach. Kernels and distances for structured data. Machine Learning, 57(3):205–232, 2004.

[16] D. Haussler. Convolution kernels on discrete structures. Technical Report UCSC-CRL-99-10, Department of Computer Science, University of California – Santa Cruz, 1999.

[17] J.-U. Kietz and K. Morik. A polynomial approach to the constructive induction of structural knowledge. Machine Learning, 14(2):193–218, 1994.

[18] T. Mitchell. Machine Learning. McGraw-Hill, 1997.

[19] B. Schölkopf and A. Smola. Learning with Kernels. The MIT Press, 2002.

[20] M. Sebag. Distance induction in first order logic. In S. Džeroski and N. Lavrač, editors, Proceedings of the 7th International Workshop on Inductive Logic Programming, ILP97, volume 1297 of LNAI, pages 264–272. Springer, 1997.

[21] P. Zezula, G. Amato, V. Dohnal, and M. Batko. Similarity Search – The Metric Space Approach. Advances in Database Systems. Springer, 2007.
