
Approximating Numeric Role Fillers via Predictive Clustering Trees for Knowledge Base Enrichment in the Web of Data

Giuseppe Rizzo, Claudia d’Amato, Nicola Fanizzi, Floriana Esposito

Discovery Science 2016, Bari, 19th October 2016

G. Rizzo et al. (Univ. of Bari), 19th October 2016

Outline

1 The Context and Motivations

2 Basics

3 The approach

4 Empirical Evaluation

5 Conclusion & Further Extensions


The Context and Motivations

• Goal: determine the numerical property values (used as attributes) of a resource in a Web of Data knowledge base

• Web of Data: many knowledge bases exposed in standard formats (RDF, OWL)
  • two resources, or a resource and a literal (string, numerical value), are linked through properties

• Inference services may fail to determine the value due to the Open World Assumption (a value that is neither asserted nor entailed is unknown, not absent)

• Solution: cast the task as a multi-target regression problem

• Predictive Clustering Trees (PCTs) for Web of Data representations (e.g. DLs)
  • to predict the most plausible values
  • to elicit rules (e.g. SWRL rules) for enriching the schema of a knowledge base


Description Logics: Syntax & Semantics

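• Atomic concepts (classes), $N_C$

• Roles (binary relations), $N_R$

• Concrete domains: string, boolean, numeric values

• Operators to build complex concept descriptions

• Semantics defined through interpretations $\mathcal{I} = (\Delta^{\mathcal{I}}, \cdot^{\mathcal{I}})$
  • $\Delta^{\mathcal{I}}$: domain of the interpretation
  • $\cdot^{\mathcal{I}}$: interpretation function
    • for each concept $C \in N_C$, $C^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}}$
    • for each role $R \in N_R$, $R^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}} \times \Delta^{\mathcal{I}}$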
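To make the set-theoretic semantics concrete, here is a minimal Python sketch of a finite interpretation. The tuple encoding of concepts (`("and", C, D)`, `("not", C)`, `("some", R, C)`, `("all", R, C)`) and the toy individuals `m1`, `m2`, `a1` are hypothetical; the extension of each complex concept follows the usual DL semantics for ⊓, ¬, ∃ and ∀.

```python
from dataclasses import dataclass, field


@dataclass
class Interpretation:
    domain: set                                   # Delta^I
    concepts: dict = field(default_factory=dict)  # atomic name -> subset of Delta^I
    roles: dict = field(default_factory=dict)     # role name -> set of (x, y) pairs

    def ext(self, c):
        """Return the extension C^I of a (possibly complex) concept."""
        if isinstance(c, str):                     # atomic concept
            return set(self.concepts.get(c, set()))
        op = c[0]
        if op == "not":                            # (¬C)^I = Delta^I \ C^I
            return self.domain - self.ext(c[1])
        if op == "and":                            # (C ⊓ D)^I = C^I ∩ D^I
            return self.ext(c[1]) & self.ext(c[2])
        if op == "some":                           # (∃R.C)^I: some R-filler in C^I
            _, r, filler = c
            fc = self.ext(filler)
            return {x for (x, y) in self.roles.get(r, set()) if y in fc}
        if op == "all":                            # (∀R.C)^I: every R-filler in C^I
            _, r, filler = c
            fc = self.ext(filler)
            pairs = self.roles.get(r, set())
            return {x for x in self.domain
                    if all(y in fc for (z, y) in pairs if z == x)}
        raise ValueError(f"unknown constructor: {op}")


I = Interpretation(
    domain={"m1", "m2", "a1"},
    concepts={"Comedy": {"m1"}, "Actor": {"a1"}},
    roles={"starring": {("m1", "a1")}},
)
print(I.ext(("and", "Comedy", ("some", "starring", "Actor"))))
```

Note that `("all", "starring", "Actor")` also contains `m2` and `a1`: individuals without any `starring` filler satisfy a universal restriction vacuously.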


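Description Logics: Knowledge Bases

• Knowledge base: a pair $\mathcal{K} = (\mathcal{T}, \mathcal{A})$ where
  • $\mathcal{T}$ (TBox): axioms concerning concepts/roles
    • subsumption axioms $C \sqsubseteq D$: satisfied by an interpretation $\mathcal{I}$ iff $C^{\mathcal{I}} \subseteq D^{\mathcal{I}}$
    • equivalence axioms $C \equiv D$: satisfied by $\mathcal{I}$ iff $C^{\mathcal{I}} \subseteq D^{\mathcal{I}}$ and $D^{\mathcal{I}} \subseteq C^{\mathcal{I}}$
  • $\mathcal{A}$ (ABox): assertions about individuals (the set of individuals occurring in $\mathcal{A}$ is denoted by $\mathrm{Ind}(\mathcal{A})$)
    • class assertions $C(a)$
    • role assertions $R(a, b)$ ($b$ is called a role filler)

• Reasoning services:
  • subsumption: checking whether a concept is more general than a given one ($\mathcal{K} \models C \sqsubseteq D$)
  • satisfiability: checking whether a concept description $C$ admits a model $\mathcal{I}$ of $\mathcal{K}$ with $C^{\mathcal{I}} \neq \emptyset$
  • instance checking: checking whether $a^{\mathcal{I}} \in C^{\mathcal{I}}$ in every model $\mathcal{I}$ of $\mathcal{K}$, i.e. $\mathcal{K} \models C(a)$ ($a$ is an instance of $C$)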
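Under the Open World Assumption, instance checking has three outcomes: $\mathcal{K} \models C(a)$, $\mathcal{K} \models \neg C(a)$, or neither. The sketch below illustrates this for a deliberately tiny fragment, assuming a TBox made only of atomic subsumption axioms and an ABox of (possibly negated) atomic class assertions; it is sound for that fragment but is no substitute for a DL reasoner. All individual and function names are hypothetical.

```python
def superclasses(tbox, name):
    """Reflexive-transitive closure of the told axioms A ⊑ B starting from name."""
    seen, stack = {name}, [name]
    while stack:
        current = stack.pop()
        for sub, sup in tbox:
            if sub == current and sup not in seen:
                seen.add(sup)
                stack.append(sup)
    return seen


def instance_check(tbox, abox, concept, individual):
    """True if K |= C(a), False if K |= ¬C(a), None if neither is entailed."""
    for name, ind, positive in abox:
        if ind != individual:
            continue
        # D(a) together with D ⊑* C entails C(a)
        if positive and concept in superclasses(tbox, name):
            return True
        # ¬E(a) together with C ⊑* E entails ¬C(a)
        if not positive and name in superclasses(tbox, concept):
            return False
    return None  # unknown under the Open World Assumption


TBOX = [("AmericanFootballPlayer", "Athlete"), ("Athlete", "Person")]
ABOX = [("AmericanFootballPlayer", "bob", True),   # AmericanFootballPlayer(bob)
        ("Athlete", "ann", False)]                 # ¬Athlete(ann)
print(instance_check(TBOX, ABOX, "Person", "ann"))
```

The last call prints `None`: `ann` is known not to be an athlete, yet nothing entails whether she is a `Person`, so under the OWA the answer is "unknown" rather than "no".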


Semantic Web Rule Language (SWRL)

• Datalog-like representation language

• Extends the expressiveness of DLs

• Syntax:
  • term: a (universally quantified) variable $x$ or a constant $c$
  • atom: a unary or binary predicate, $C(t_1)$ or $R(t_1, t_2)$, where the predicate symbols are concept and role names and the $t_i$ are terms
  • rule: implication between an antecedent (body) and a consequent (head)

$B_1 \wedge \cdots \wedge B_n \rightarrow H_1 \wedge \cdots \wedge H_m$

We are interested in safe rules (each variable in the head must also occur in the body)

• Open-World Assumption holds
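A small sketch of SWRL-like rules as data, with the safety condition as a check. Variables are written with a `?` prefix as in SWRL's human-readable syntax; the `Atom` and `Rule` classes and the role `knows` are hypothetical illustrations, not part of any SWRL library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    predicate: str  # concept name (unary atom) or role name (binary atom)
    terms: tuple    # strings starting with "?" are variables, the rest constants


def variables(atoms):
    """Set of variables occurring in a sequence of atoms."""
    return {t for atom in atoms for t in atom.terms
            if isinstance(t, str) and t.startswith("?")}


@dataclass(frozen=True)
class Rule:
    body: tuple  # antecedent atoms B1 ... Bn
    head: tuple  # consequent atoms H1 ... Hm

    def is_safe(self) -> bool:
        # Safety: every variable of the head also occurs in the body.
        return variables(self.head) <= variables(self.body)

    def __str__(self) -> str:
        def show(atoms):
            return " ∧ ".join(
                f"{a.predicate}({', '.join(str(t) for t in a.terms)})" for a in atoms)
        return f"{show(self.body)} → {show(self.head)}"


rule = Rule(
    body=(Atom("Person", ("?x",)), Atom("AmericanFootballPlayer", ("?x",))),
    head=(Atom("height", ("?x", 195.4)),),
)
unsafe = Rule(body=(Atom("Person", ("?x",)),),
              head=(Atom("knows", ("?x", "?y")),))  # ?y never bound by the body
print(rule, rule.is_safe())
print(unsafe.is_safe())
```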


The model for multi-target regression

• PCT for multi-target regression: a binary tree where
  • inner nodes: DL conjunctive concept descriptions
  • leaf nodes: vectors with the approximated target property values

Example PCT (figure):

Comedy
  ├─ Comedy ⊓ ∃starring.Actor
  │    ├─ p = (8.45, 9810666)
  │    └─ p = (5.38, 4200000)
  └─ ¬Comedy ⊓ ¬Horror
       ├─ p = (4.7, 4200000)
       └─ p = (8.6, 4930000)
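A minimal sketch of the binary tree as a data structure. It assumes, as is usual for regression trees but not stated on the slide, that a leaf vector is the component-wise mean of the target values of the training individuals reaching it; `PCTNode`, `make_leaf` and the toy target values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class PCTNode:
    """Inner node (concept is set) or leaf (prototype is set) of a binary PCT."""
    concept: Optional[str] = None      # DL concept description, kept as text
    left: Optional["PCTNode"] = None   # branch for instances of the concept
    right: Optional["PCTNode"] = None  # branch for instances of its complement
    prototype: Optional[tuple] = None  # approximated target property values

    def is_leaf(self) -> bool:
        return self.prototype is not None


def make_leaf(targets: Sequence[Sequence[float]]) -> PCTNode:
    """Leaf whose vector is the component-wise mean of the training targets."""
    n, m = len(targets), len(targets[0])
    return PCTNode(prototype=tuple(sum(t[j] for t in targets) / n for j in range(m)))


def leaves(node: PCTNode):
    """Yield the leaf vectors from left to right."""
    if node.is_leaf():
        yield node.prototype
    else:
        yield from leaves(node.left)
        yield from leaves(node.right)


tree = PCTNode("Comedy",
               left=make_leaf([(8.0, 100.0), (9.0, 300.0)]),
               right=make_leaf([(3.0, 40.0), (5.0, 60.0)]))
print(list(leaves(tree)))
```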


Learning PCTs in DLs

• Divide-and-conquer approach
  • training set: individuals whose target property values are known
  • partitioning according to membership w.r.t. a newly generated concept

• Refinement operator for generating the concepts
  • by introducing a new concept name (or its complement)
  • by replacing a sub-description with an existential restriction
  • by replacing a sub-description with a universal restriction

• Best concept: the one minimizing the RMSE of the standardized values of the target properties

• Stop conditions: maximum number of levels reached, or training (sub)set below a minimum size
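The split heuristic can be sketched under stated assumptions: the refinement operator is not modelled (candidate concepts arrive with precomputed 0/1 membership of the training individuals), the score is the size-weighted RMSE of the standardized targets in the two induced subsets, and the minimum subset size in `should_stop` is a hypothetical threshold (the depth limit 10 is the smallest setting used in the experiments). This is one plausible reading of the slide, not the authors' code.

```python
import math


def standardize(targets):
    """Z-score every target property (column) over the training set."""
    n, m = len(targets), len(targets[0])
    means = [sum(row[j] for row in targets) / n for j in range(m)]
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in targets) / n) or 1.0
            for j in range(m)]  # "or 1.0" guards against constant columns
    return [[(row[j] - means[j]) / stds[j] for j in range(m)] for row in targets]


def rmse_around_mean(rows):
    """RMSE of the rows w.r.t. their own mean vector (0.0 for an empty subset)."""
    if not rows:
        return 0.0
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    return math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows for j in range(m)) / (n * m))


def split_score(z, membership):
    """Size-weighted RMSE of the two subsets induced by a candidate concept."""
    pos = [row for row, inside in zip(z, membership) if inside]
    neg = [row for row, inside in zip(z, membership) if not inside]
    return (len(pos) * rmse_around_mean(pos) + len(neg) * rmse_around_mean(neg)) / len(z)


def best_concept(targets, candidates):
    """Candidate concept whose split minimizes the score on standardized targets."""
    z = standardize(targets)
    return min(candidates, key=lambda c: split_score(z, candidates[c]))


def should_stop(depth, n_individuals, max_depth=10, min_individuals=5):
    """Depth limit reached or (sub)set too small; thresholds are hypothetical."""
    return depth >= max_depth or n_individuals < min_individuals


targets = [[1.0, 10.0], [1.2, 12.0], [5.0, 50.0], [5.2, 52.0]]
candidates = {  # hypothetical membership of the 4 training individuals
    "Comedy": [True, True, False, False],
    "Horror": [True, False, True, False],
}
print(best_concept(targets, candidates))  # Comedy
```

Standardizing first keeps a target with a large numeric range (a population, an area) from dominating the error of targets measured on small scales.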


Exploiting PCTs

• Prediction: given an individual $a$, the algorithm traverses the tree according to the instance checks w.r.t. the inner concepts $D$
  • if $\mathcal{K} \models D(a)$, the left branch is followed
  • if $\mathcal{K} \models \neg D(a)$, the right branch is followed
  • otherwise (membership unknown under the OWA), a default model is returned

• Eliciting SWRL rules: recursively traversing the tree structure and collecting the intermediate concepts along each branch
  • body: the intermediate concept descriptions as predicate names
  • head: each target property name as the predicate name
    • the approximated value as a term
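The two uses of a learned tree can be sketched together. `check` stands in for a reasoner and returns `True`, `False` or `None` (unknown); the `default` vector stands for the default model, whose definition the slide leaves open. The tree below is hypothetical, built so that the elicited rules match the ones on the discovered-rules slide.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, Optional


@dataclass
class Node:
    concept: Optional[str] = None   # inner node: concept description D
    left: Optional["Node"] = None   # followed when K |= D(a)
    right: Optional["Node"] = None  # followed when K |= ¬D(a)
    values: Optional[tuple] = None  # leaf: approximated target values


# check(D, a) stands in for a reasoner: True, False or None (unknown).
Check = Callable[[str, str], Optional[bool]]


def predict(node: Node, a: str, check: Check, default: tuple) -> tuple:
    """Follow the entailed branches for a; return default when membership is unknown."""
    while node.values is None:
        answer = check(node.concept, a)
        if answer is True:
            node = node.left
        elif answer is False:
            node = node.right
        else:
            return default  # neither D(a) nor ¬D(a) is entailed
    return node.values


def negate(concept: str) -> str:
    return f"¬{concept}" if concept.isidentifier() else f"¬({concept})"


def elicit_rules(node: Node, targets: tuple, body: tuple = ()) -> Iterator[str]:
    """One rule per (leaf, target property): path concepts → target(?x, value)."""
    if node.values is not None:
        antecedent = " ∧ ".join(f"{c}(?x)" for c in body)
        for name, value in zip(targets, node.values):
            yield f"{antecedent} → {name}(?x, {value})"
        return
    yield from elicit_rules(node.left, targets, body + (node.concept,))
    yield from elicit_rules(node.right, targets, body + (negate(node.concept),))


# Hypothetical sub-tree; the path prefix Person, Athlete is passed as the initial body.
tree = Node("AmericanFootballPlayer",
            left=Node(values=(195.4, 113.5)),
            right=Node(values=(187, 87.5)))
for r in elicit_rules(tree, ("height", "weight"), ("Person", "Athlete")):
    print(r)
```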


Experiments on Small Ontologies: Settings

• Maximum depth for PCTs: 10, 15, 20

• Comparison w.r.t. Terminological Regression Trees (TRTs), a multi-target k-NN regressor (with $k = \sqrt{|Tr|}$) and a multi-target linear regression model (LR)
  • atomic concepts as the feature set for the k-NN regressor and the multi-target linear regression model

• 0.632 bootstrap

• performance in terms of RMSE
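As a sketch of the k-NN baseline and of the evaluation protocol: features are 0/1 memberships to atomic concepts, as stated above, while the Hamming distance, the rounding of $k = \sqrt{|Tr|}$ and the per-replication form of Efron's .632 estimator (0.368 × training RMSE + 0.632 × out-of-bag RMSE, averaged over replications) are assumptions of this illustration. The toy data are hypothetical.

```python
import math
import random


def knn_predict(train_x, train_y, x):
    """Multi-target k-NN: mean target vector of the k nearest neighbours.

    Features are 0/1 memberships; distance is Hamming; k = round(sqrt(|Tr|)).
    """
    k = max(1, round(math.sqrt(len(train_x))))
    ranked = sorted(range(len(train_x)),
                    key=lambda i: sum(u != v for u, v in zip(train_x[i], x)))
    nearest = ranked[:k]
    return [sum(train_y[i][j] for i in nearest) / k for j in range(len(train_y[0]))]


def rmse(train_x, train_y, eval_x, eval_y):
    """RMSE over all target properties of the evaluated individuals."""
    sq, count = 0.0, 0
    for x, y in zip(eval_x, eval_y):
        pred = knn_predict(train_x, train_y, x)
        sq += sum((p - t) ** 2 for p, t in zip(pred, y))
        count += len(y)
    return math.sqrt(sq / count)


def bootstrap_632(xs, ys, replications=30, seed=0):
    """Mean and standard deviation of the .632 RMSE estimate over replications."""
    rng = random.Random(seed)
    n = len(xs)
    estimates = []
    for _ in range(replications):
        sample = [rng.randrange(n) for _ in range(n)]  # drawn with replacement
        oob = sorted(set(range(n)) - set(sample))      # out-of-bag individuals
        if not oob:
            continue
        tx, ty = [xs[i] for i in sample], [ys[i] for i in sample]
        err_train = rmse(tx, ty, tx, ty)
        err_oob = rmse(tx, ty, [xs[i] for i in oob], [ys[i] for i in oob])
        estimates.append(0.368 * err_train + 0.632 * err_oob)
    if not estimates:
        raise ValueError("no replication had out-of-bag individuals")
    mean = sum(estimates) / len(estimates)
    sd = math.sqrt(sum((e - mean) ** 2 for e in estimates) / len(estimates))
    return mean, sd


xs = [[i % 2, (i // 2) % 2] for i in range(20)]  # hypothetical memberships
ys = [[10.0 * a + 1.0, 5.0 * b] for a, b in xs]  # hypothetical targets
print(bootstrap_632(xs, ys))
```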


Experiments on Linked Data Datasets: Settings

• Ontologies extracted from DBpedia via crawling

• Maximum depth for PCTs: 10, 15, 20

• Comparison w.r.t. TRTs, k-NN (with $k = \sqrt{|Tr|}$) and LR

• 10-fold cross validation

• performance in terms of RRMSE (relative root mean squared error)
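The slides do not define RRMSE. A common definition, assumed here, divides the RMSE of the model by the RMSE of a default model that always predicts the training mean, so values below 1 beat that baseline. The sketch averages it over target properties and over the 10 folds; `predict_fold` is a hypothetical callback standing in for any of the compared learners, and the toy targets are hypothetical.

```python
import math
import random


def rrmse(y_true, y_pred, train_mean):
    """RMSE of the model divided by the RMSE of predicting the training mean."""
    num = sum((p - t) ** 2 for p, t in zip(y_pred, y_true))
    den = sum((train_mean - t) ** 2 for t in y_true)
    return math.sqrt(num / den) if den > 0 else float("nan")


def k_fold(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cross_validated_rrmse(ys, predict_fold, k=10, seed=0):
    """RRMSE averaged over folds and target properties.

    predict_fold(train, test) learns from the training indices and returns
    one predicted vector per test index.
    """
    folds = k_fold(len(ys), k, seed)
    scores = []
    for f, test in enumerate(folds):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        preds = predict_fold(train, test)
        for j in range(len(ys[0])):
            mean_j = sum(ys[i][j] for i in train) / len(train)
            scores.append(rrmse([ys[i][j] for i in test], [p[j] for p in preds], mean_j))
    return sum(scores) / len(scores)


# Toy check: a model that always predicts the training mean scores RRMSE = 1.
ys = [[float(i), 2.0 * i] for i in range(30)]


def mean_predictor(train, test):
    mean = [sum(ys[i][j] for i in train) / len(train) for j in range(len(ys[0]))]
    return [mean for _ in test]


print(cross_validated_rrmse(ys, mean_predictor))
```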


Experiments on Small Ontologies: Outcomes

RMSE averaged over the replications (± standard deviation)

Ontology       PCT              TRT              k-NN            LR
BCO            0.0277 ± 0.01    0.0356 ± 0.01    0.0472 ± 0.01   0.0554 ± 0.01
BioPax         132 ± 11.0       145 ± 12.0       186 ± 7.00      195 ± 8.85
geopolitical   0.0284 ± 0.01    0.03561 ± 0.03   0.057 ± 0.03    0.06 ± 0.02
monetary       7.52 ± 0.15      8.46 ± 0.07      7.53 ± 0.17     7.78 ± 0.34
mutagenesis    0.0445 ± 0.07    0.0637 ± 0.03    0.0547 ± 0.02   0.0647 ± 0.05


Experiments on Linked Data Datasets: Outcomes

Table: RRMSE averaged over the runs (± standard deviation)

Dataset     PCT            TRT           k-NN          LR
Fragm. #1   0.42 ± 0.05    0.63 ± 0.05   0.65 ± 0.02   0.73 ± 0.02
Fragm. #2   0.25 ± 0.001   0.43 ± 0.02   0.53 ± 0.00   0.43 ± 0.02
Fragm. #3   0.24 ± 0.05    0.36 ± 0.2    0.67 ± 0.10   0.73 ± 0.05

Table: Comparison in terms of elapsed time (s); for TRTs, times are reported per target property and in total

Dataset     Target property   PCT      TRT      k-NN    LR
Fragm. #1   elevation                  2454.3
            populationTotal            2353.0
            total             2432     4807.3   547.6   234.5
Fragm. #2   areaTotal                  2256.0
            areaUrban                  2345.0
            areaMetro                  2345.2
            total             2456     6946.2   546.2   235.7
Fragm. #3   height                     743.5
            weight                     743.4
            total             743.3    1486.9   372.3   123.5


Discussion

• PCTs outperform TRTs
  • the different heuristic allows choosing more promising concepts
  • standardization mitigates the effect of abnormal values

• PCTs outperform k-NN
  • k-NN suffers from the curse of dimensionality

• k-NN generally outperforms LR
  • spurious individuals are excluded when determining the local model

• PCTs are more efficient than TRTs (a single multi-target tree instead of one tree per target property)


Examples of discovered rules

According to the discovered rules, an American football player is taller (and heavier) than an athlete who does not play American football

Person(x) ∧ Athlete(x) ∧ AmericanFootballPlayer(x) → height(x, 195.4)

Person(x) ∧ Athlete(x) ∧ AmericanFootballPlayer(x) → weight(x, 113.5)

Person(x) ∧ Athlete(x) ∧ ¬AmericanFootballPlayer(x) → height(x, 187)

Person(x) ∧ Athlete(x) ∧ ¬AmericanFootballPlayer(x) → weight(x, 87.5)


Conclusion and Further Extensions

• We proposed an extension of predictive clustering trees compliant with DL representation languages for predicting datatype property values and discovering rules

• The outcomes are promising: PCTs obtained the lowest average error on every dataset and required less time than TRTs

• Further extensions
  • new refinement operators
  • further heuristics
  • linear models at leaf nodes


Questions?


Table: Datasets extracted from DBpedia

Dataset     DL expressivity   #axioms   #classes   #properties   #individuals
Fragm. #1   ALCO              17222     990        255           12053
Fragm. #2   ALCO              20456     425        255           14400
Fragm. #3   ALCO              9070      370        106           4499

Table: Target property ranges and number of individuals employed in the learning problem (|Tr|)

Dataset     Property          Range                |Tr|
Fragm. #1   elevation         [-654.14, 19.00]     10000
            populationTotal   [0.0, 2255]
Fragm. #2   areaTotal         [0, 16980.1]         10000
            areaUrban         [0.0, 6740.74]
            areaMetro         [0, 652874]
Fragm. #3   height            [0, 251.6]           2256
            weight            [-63.12, 304.25]