towards distributed information retrieval in the semantic web: query reformulation using the...

26
Towards Distributed Information Retrieval in the Semantic Web: Query Reformulation Using the Framework [email protected] Wednesday 14 th of June, 2006 Raphaël Troncy , Umberto Straccia

Upload: flora-rose

Post on 04-Jan-2016

215 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Towards Distributed Information Retrieval in the Semantic Web: Query Reformulation Using the Framework Raphael.Troncy@cwi.nl Wednesday 14 th of June, 2006

Towards Distributed Information Retrieval in

the Semantic Web:Query Reformulation Usingthe Framework

[email protected]

Wednesday 14th of June, 2006

Raphaël Troncy, Umberto Straccia

Page 2: Towards Distributed Information Retrieval in the Semantic Web: Query Reformulation Using the Framework Raphael.Troncy@cwi.nl Wednesday 14 th of June, 2006

Motivation

Various SW repositories, using different vocabularies, distributed on the web

Already large amounts of data out thereSwoogle hits 1.5M unique Semantic Web

documents (05/06/2006)Problem:How to search and retrieve information in

such an environment?

Page 3: Towards Distributed Information Retrieval in the Semantic Web: Query Reformulation Using the Framework Raphael.Troncy@cwi.nl Wednesday 14 th of June, 2006

Example scenario

Kim Clijsters (courtesy of AFP)

Montenegro independence (courtesy of Euronews)

NewsML

SportsML

EventsML

iCalendar

TimeML

<newsItem schema="0.7" version="2"> <itemMeta>  <contentClass code="ccls:photo" />   </itemMeta> <contentMeta>  <infoSource literal="AFP" /> <locCreated code="city:Paris"/>  </contentMeta> ...</newsItem>

EXIF

EBU P/Meta

MXF

MPEG-7

Page 4: Towards Distributed Information Retrieval in the Semantic Web: Query Reformulation Using the Framework Raphael.Troncy@cwi.nl Wednesday 14 th of June, 2006

Distributed Search in the SW:Resource selectionSelect a subset of some relevant

resourcesQuery reformulationReformulate the information need into

the vocabulary used by the resource Data fusion and rank aggregationMerged and ranked all the results

together

Page 5: Towards Distributed Information Retrieval in the Semantic Web: Query Reformulation Using the Framework Raphael.Troncy@cwi.nl Wednesday 14 th of June, 2006

Resource Selection

Compute an approximation of the content of each resources

For some random queries, an approximation consists of:The ontology the resource relies onSome instances (sampling annotated

documents)

Page 6: Towards Distributed Information Retrieval in the Semantic Web: Query Reformulation Using the Framework Raphael.Troncy@cwi.nl Wednesday 14 th of June, 2006

Query reformulation

Transformation rules:From the query vocabulary to the vocabularies

used by the resources Semantic Web: Ontology Alignment

Establishing relationships holding between the entities (subsumption, equivalence, disjointness…)

With a confidence measure Automatically computed

1..0

an ontologyalignment framework

Page 7: Towards Distributed Information Retrieval in the Semantic Web: Query Reformulation Using the Framework Raphael.Troncy@cwi.nl Wednesday 14 th of June, 2006

oMAP: Ontology Alignment Tool

TerminologicalClassifiers

Machine Learning-based Classifiers

Structural andSemantics-based

Classifiers

- Formal and open framework- Classifiers customization (parameter, chaining)

subsumption

equivalent

Page 8: Towards Distributed Information Retrieval in the Semantic Web: Query Reformulation Using the Framework Raphael.Troncy@cwi.nl Wednesday 14 th of June, 2006

oMAP: A Formal Framework

Sources of inspiration:Formal work in data exchange [Fagin et al.,

2003]GLUE: combining several specialized

components for finding the best set of mappings [Doan et al., 2003]

Notation:A mapping is a triple: M = (T, S, ∑)S and T are the source and target ontologiesSi is an OWL entity (class, datatype property,

object property) of the ontology∑ is a set of mapping rules: αij Tj ← Si

Page 9: Towards Distributed Information Retrieval in the Semantic Web: Query Reformulation Using the Framework Raphael.Troncy@cwi.nl Wednesday 14 th of June, 2006

oMAP: Combining Classifiers

Weight of a mapping rule:αij = w (Si,Tj, ∑)

Using different classifiers:w (Si,Tj,CLk) is the classifier's

approximation of the rule Tj ← Si

Combining the approximations:Use of a priority list: CL1 CL2 … CLn Weighted average of the classifiers

prediction

Page 10: Towards Distributed Information Retrieval in the Semantic Web: Query Reformulation Using the Framework Raphael.Troncy@cwi.nl Wednesday 14 th of June, 2006

Terminological Classifiers

Same entity names (or URI)

Same entity name stems

otherwise 0

name, same have , if 1),,(

ji

Nji

TSCLTSw

otherwise 0

stem, same have , if 1),,(

ji

Sji

TSCLTSw

Page 11: Towards Distributed Information Retrieval in the Semantic Web: Query Reformulation Using the Framework Raphael.Troncy@cwi.nl Wednesday 14 th of June, 2006

Terminological Classifiers

String distance name

Iterative substring matching

See [Stoilos et al., ISWC'05]

))(),(max(

),(),,(

ji

jinLevenshteiLDji TlengthSlength

TSdistCLTSw

),(),(),(),,( jijijiISji TSwinklerTSDiffTSCommCLTSw

Page 12: Towards Distributed Information Retrieval in the Semantic Web: Query Reformulation Using the Framework Raphael.Troncy@cwi.nl Wednesday 14 th of June, 2006

Terminological Classifiers

WordNet distance name

lcs is the longest common substring between Si

and Tj

sim =

otherwise

)()(

*2,max

synonyms, are , if 1

),,(

ji

ji

WNji

TlengthSlength

lcssim

TS

CLTSw

)()(

)()(

ji

ji

TsynonymSsynonym

TsynonymSsynonym

Page 13: Towards Distributed Information Retrieval in the Semantic Web: Query Reformulation Using the Framework Raphael.Troncy@cwi.nl Wednesday 14 th of June, 2006

Machine Learning-Based Classifiers

Collecting bag of words:label for the named individualsdata value for the datatype propertiestype for the anonymous individuals and the

range of object properties…

Recursion on the OWL definition:depth parameter

Use statistical methods on the collected bag of words

Page 14: Towards Distributed Information Retrieval in the Semantic Web: Query Reformulation Using the Framework Raphael.Troncy@cwi.nl Wednesday 14 th of June, 2006

Machine Learning-Based Classifiers

ExampleIndividual (x1 type (Conference)

value (label "European SW Conf") value (location x2) )

Individual (x2 type (Address)

value (city "Budva") value (country "Montenegro") )

u1 = (" European SW Conf ", "Address")u2 = ("Address", "Budva", "Montenegro")

Naïve Bayes text classifier

kNN text classifier

jTux um

iiNBji SmSCLTSw),(

)Pr()Pr(),,(

Page 15: Towards Distributed Information Retrieval in the Semantic Web: Query Reformulation Using the Framework Raphael.Troncy@cwi.nl Wednesday 14 th of June, 2006

Structural and Semantics-Based Classifier

∑ is a set of mapping rules: αij Tj ← Si

∑ sets are computed by taking the OWL definition of the entities to alignrecursively in the OWL structure... without looping thanks to cycles

detection

Page 16: Towards Distributed Information Retrieval in the Semantic Web: Query Reformulation Using the Framework Raphael.Troncy@cwi.nl Wednesday 14 th of June, 2006

Structural and Semantics-Based Classifier If Si and Tj are property names:

If Si and Tj are concept names1:

otherwise ),,('

if 0),,(

ji

ij

ji TSw

STTSw

otherwise ),,(max),,('1)Set(

1

and 0D if ),,('

if 0

),,(

),(

t

setDjCijisetji

ijji

ij

ji

DCwTSw

STTSw

ST

TSw

1 Where D = D(Si) * D(Tj) ; D(Si) represents the set of concepts directly parent of Si

Page 17: Towards Distributed Information Retrieval in the Semantic Web: Query Reformulation Using the Framework Raphael.Troncy@cwi.nl Wednesday 14 th of June, 2006

Structural and Semantics-Based ClassifierLet CS=(QR.C) and DT=(Q’R’.D), then1:

Let CS=(op C1…Cm) and DT=(op’ D1…Dm), then2:

),,(),',()',(),,( DCwRRwQQwDCw QTS

),min(

),,(max

)',(),,(),(

nm

DCw

opopwDCwsetDjCi

jiset

opTS

1 Where Q,Q’ are quantifiers, R,R’ are property names and C,D concept expressions

2 Where op, op’ are concept constructors and n,m ≥ 1

Page 18: Towards Distributed Information Retrieval in the Semantic Web: Query Reformulation Using the Framework Raphael.Troncy@cwi.nl Wednesday 14 th of June, 2006

Evaluation

OAEI Contests (2004, 2005, 2006): http://oaei.ontologymatching.org/Systematic benchmark tests on

bibliographic dataTests 2xx: aligning an ontology with

variations of itself where each OWL constructs are discarded or modified one per oneTests 3xx: four real bibliographic ontologies

Web categories alignmenthttp://oaei.ontologymatching.org/2005/

results/

Page 19: Towards Distributed Information Retrieval in the Semantic Web: Query Reformulation Using the Framework Raphael.Troncy@cwi.nl Wednesday 14 th of June, 2006

Benchmark Tests

dublin20 0.92

Falcon 0.91

FOAM 0.90

oMAP 0.85

CMS 0.81

OLA 0.80

ctxMatch 0.72

edna 0.45

Falcon 0.89

OLA 0.74

dublin20 0.72

FOAM 0.69

oMAP 0.68

edna 0.61

ctxMatch 0.20

CMS 0.18oMAP:4th with the global F-Measure1st on 3xx tests (real ontologies to align)

Precision Recall

Page 20: Towards Distributed Information Retrieval in the Semantic Web: Query Reformulation Using the Framework Raphael.Troncy@cwi.nl Wednesday 14 th of June, 2006

Aligning Web Categories

Aligning Google, Loksmart and Yahoo web categories [Avesani et al., ISWC'05]

Blind tests: only recall results are available

ctxMatch

FOAM CMS Dublin20

Falcon OLA oMAP

9.4% 11.9% 14.1% 26.5% 31.2% 32.0% 34.4%

Page 21: Towards Distributed Information Retrieval in the Semantic Web: Query Reformulation Using the Framework Raphael.Troncy@cwi.nl Wednesday 14 th of June, 2006

Distributed Search in the SW

Q: retrieve course material dealing with history of the Americas and "Columbus"query(d)<- History_Americas(d,"Columbus")

is re-written as two queries0.63 query(d)<-

Latin_American_History(d,"Columbus")0.84 query(d)<- American_History(d,"Columbus")

Each document score is then multiplied with the confidence score of the rule.

university courses

Page 22: Towards Distributed Information Retrieval in the Semantic Web: Query Reformulation Using the Framework Raphael.Troncy@cwi.nl Wednesday 14 th of June, 2006

Conclusion

Distributed Search in the SWresource selection / query reformulation /

data fusion and rank aggregationoMAP: a formal framework for aligning

automatically OWL ontologiesCombining several specific classifiersTerminological classifiersMachine learning-based classifiersStructural and semantics-based classifier

Page 23: Towards Distributed Information Retrieval in the Semantic Web: Query Reformulation Using the Framework Raphael.Troncy@cwi.nl Wednesday 14 th of June, 2006

Future Work

Implementing the three steps proposedKeyword-based or structured (SPARQL) queriesRanked list of results

oMAPUsing additional classifiers:KL-distance, other resources, background K, etc.Straightforward theoretically but practically

difficult!Finding complex alignmentname = firstName + lastName

OWL and rule-based languages:Take into account this additional expressivity

Page 25: Towards Distributed Information Retrieval in the Semantic Web: Query Reformulation Using the Framework Raphael.Troncy@cwi.nl Wednesday 14 th of June, 2006
Page 26: Towards Distributed Information Retrieval in the Semantic Web: Query Reformulation Using the Framework Raphael.Troncy@cwi.nl Wednesday 14 th of June, 2006

Structural and Semantics-Based ClassifierPossible values for wop and wQ

weights

wop wQ⊓ ⊔ ¬

⊓ 1 1/4 0

⊔ 1 0

¬ 1

1 1/4

1

n n m

1 1/3

m

1