
IR Models based on Predicate Logic

Norbert Fuhr

1 / 97

The logical view on IR

IR as Inference
IR as uncertain inference
Propositional vs. Predicate Logic

The logical view on IR: IR as inference

q – query
d – document
retrieval: search for documents which imply the query: d → q

Example: classical IR:
d = {t1, t2, t3}
q = {t1, t3}
retrieval: q ⊂ d ?

logical view:
d = t1 ∧ t2 ∧ t3
q = t1 ∧ t3
retrieval: d → q ?

3 / 97

advantage of the inference-based approach: the step from term-based to knowledge-based retrieval, e.g. easy incorporation of additional knowledge

example:
d: 'squares'
q: 'rectangles'
thesaurus: 'squares' → 'rectangles'
⇒ d → q

4 / 97


IR as uncertain inference

d: 'quadrangles'
q: 'rectangles'
⇒ uncertain knowledge required

'quadrangles' →(0.3) 'rectangles'

[Rijsbergen 86]: IR as uncertain inference
Retrieval = estimate probability P(d → q) = P(q|d)

[Figure: terms t1–t6, with document d and query q shown as overlapping sets of terms]

5 / 97

Limitations of propositional logic:

conventional indexing (based on propositional logic): d = {tree, house}
query: Is there a picture with a tree on the left of the house?
⇒ the query cannot be expressed in propositional logic

predicate logic:

d: tree(t1). house(h1). left(h1,t1).

?- tree(X) & house(Y) & left(X,Y).
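A minimal Python sketch (not from the slides) of answering this conjunctive query by matching bindings against the ground facts; the reading of the argument order of left/2 is an assumption:

# Facts from the example: tree(t1), house(h1), left(h1,t1).
tree  = {"t1"}
house = {"h1"}
left  = {("h1", "t1")}

# ?- tree(X) & house(Y) & left(X,Y): enumerate all satisfying bindings
answers = [(x, y) for x in tree for y in house if (x, y) in left]
print(answers)
# [] under the reading "left(A,B): A is to the left of B"; under the opposite
# reading, test (y, x) in left instead and the binding ('t1', 'h1') is found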

6 / 97

Relational Structures: Datalog

Datalog program: finite set of rules, each expressing a conjunctive query

t(X1, ..., Xk) :- r1(U11, ..., U1m1), ..., rn(Un1, ..., Unmn)

where each variable Xi occurs in the body of the rule (this way, every rule is safe). This corresponds to the logical formula

∀X1 ... ∀Xk ∀U11 ... ∀Unmn ( t(X1, ..., Xk) ∨ ¬r1(U11, ..., U1m1) ∨ ... ∨ ¬rn(Un1, ..., Unmn) )

t(X1, ..., Xk) is called the head and r1(U11, ..., U1m1), ..., rn(Un1, ..., Unmn) the body.

A rule without a body is also called a fact.

7 / 97

Datalog Properties

- Horn predicate logic
- no functions
- restricted forms of negation allowed

  t(X1, ..., Xk) :- r1(U11, ..., U1m1), ..., ¬rn(Un1, ..., Unmn)

- rules may be recursive (head predicate may occur in the body)

  r(X,Y) :- l(X,Z), r(Z,Y)

- sound and complete evaluation algorithms
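A minimal Python sketch of bottom-up (naive fixpoint) evaluation for the recursive rule above; the edge relation and the base case r(X,Y) :- l(X,Y) are assumptions added for illustration:

# r(X,Y) :- l(X,Y).           (assumed base case)
# r(X,Y) :- l(X,Z), r(Z,Y).   (recursive rule from the slide)
l = {("a", "b"), ("b", "c"), ("c", "d")}   # made-up link facts

r = set(l)
while True:
    new = {(x, y) for (x, z) in l for (z2, y) in r if z == z2}
    if new <= r:        # fixpoint: no new facts derivable
        break
    r |= new

print(sorted(r))        # transitive closure, e.g. ('a','c'), ('a','d'), ('b','d')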

8 / 97


IR and Databases: The Logic View

Retrieval

- DB: given query q, find objects o with o → q
- IR: given query q, find documents d with high values of P(d → q)
- DB is a special case of IR! (in a certain sense)

This section: Focusing on the logic view

- Inference
- Vague predicates
- Query language expressiveness

9 / 97

Inference

IR with the Relational Model
The Probabilistic Relational Model
Interpretation of probabilistic weights
Extensions

Disjoint events
Relational Bayes
Probabilistic rules

Relational Model: Projection

index
DOCNO  TERM
1      ir
1      db
2      ir
3      db
3      oop
4      ir
4      ai
5      db
5      oop

topic
ir
db
oop
ai

Projection: what is the collection about?
topic(T) :- index(D,T).

11 / 97

Relational Model: Selection

index (DOCNO, TERM — as on the Projection slide)

aboutir
1
2
4

Selection: which documents are about IR?
aboutir(D) :- index(D,ir).

12 / 97


Relational Model: Join

index (DOCNO, TERM — as above)

author
DOCNO  NAME
1      smith
2      miller
3      johnson
4      firefly
4      bradford
5      bates

irauthor
smith
miller
firefly
bradford

Join: who writes about IR?
irauthor(A) :- index(D,ir) & author(D,A).

13 / 97

Relational Model: Union

index (DOCNO, TERM — as above)

irordb
1
2
3
4
5

Union: which documents are about IR or DB?
irordb(D) :- index(D,ir).
irordb(D) :- index(D,db).

14 / 97

Relational Model: Difference

index (DOCNO, TERM — as above)

irnotdb
2
4

Difference: which documents are about IR, but not DB?
irnotdb(D) :- index(D,ir) & not(index(D,db)).
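The five Datalog queries above, rewritten as a small Python sketch over the index and author relations (same data as on the slides):

index  = {(1,"ir"), (1,"db"), (2,"ir"), (3,"db"), (3,"oop"),
          (4,"ir"), (4,"ai"), (5,"db"), (5,"oop")}
author = {(1,"smith"), (2,"miller"), (3,"johnson"),
          (4,"firefly"), (4,"bradford"), (5,"bates")}

topic    = {t for (d, t) in index}                              # projection
aboutir  = {d for (d, t) in index if t == "ir"}                 # selection
irauthor = {a for (d, t) in index if t == "ir"
              for (d2, a) in author if d2 == d}                 # join
irordb   = {d for (d, t) in index if t in ("ir", "db")}         # union
irnotdb  = aboutir - {d for (d, t) in index if t == "db"}       # difference

print(topic)      # {'ir', 'db', 'oop', 'ai'}
print(aboutir)    # {1, 2, 4}
print(irauthor)   # {'smith', 'miller', 'firefly', 'bradford'}
print(irordb)     # {1, 2, 3, 4, 5}
print(irnotdb)    # {2, 4}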

15 / 97

The Probabilistic Relational Model

[Fuhr & Roelleke 97] [Suciu et al 11]

index
β     DOCNO  TERM
0.8   1      ir
0.7   1      db
0.6   2      ir
0.5   3      db
0.8   3      oop
0.9   4      ir
0.4   4      ai
0.8   5      db
0.3   5      oop

aboutdb
0.7   1
0.5   3
0.8   5

aboutirdb
0.56 (= 0.8 · 0.7)   1

Which documents are about DB?
aboutdb(D) :- index(D,db).

Which documents are about IR and DB?
aboutirdb(D) :- index(D,ir) & index(D,db).
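A Python sketch of the probabilistic case: tuple weights are combined by multiplication under the independence assumption of probabilistic Datalog:

index = {(1,"ir"): 0.8, (1,"db"): 0.7, (2,"ir"): 0.6, (3,"db"): 0.5,
         (3,"oop"): 0.8, (4,"ir"): 0.9, (4,"ai"): 0.4, (5,"db"): 0.8, (5,"oop"): 0.3}

# aboutdb(D) :- index(D,db).
aboutdb = {d: p for (d, t), p in index.items() if t == "db"}
print(aboutdb)                       # {1: 0.7, 3: 0.5, 5: 0.8}

# aboutirdb(D) :- index(D,ir) & index(D,db).  -- independent events: multiply
aboutirdb = {d: round(p * index[(d, "db")], 2)
             for (d, t), p in index.items() if t == "ir" and (d, "db") in index}
print(aboutirdb)                     # {1: 0.56}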

16 / 97


Extensional vs. intensional semantics

docterm
β     DOC  TERM
0.9   d1   ir
0.5   d1   db

link
β     S    T
0.7   d2   d1

about(D,T) :- docTerm(D,T).
about(D,T) :- link(D,D1) & about(D1,T).
q(D) :- about(D,ir) & about(D,db).

Extensional semantics: weight of a derived fact as a function of the weights of its subgoals
P(q(d2)) = P(about(d2,ir)) · P(about(d2,db)) = (0.7 · 0.9) · (0.7 · 0.5) = 0.2205

Problem: "improper treatment of correlated sources of evidence" [Pearl 88]
→ extensional semantics is only correct for tree-shaped inference structures

17 / 97

Intensional semantics

weight of a derived fact as a function of the weights of the underlying ground facts

Method: event keys and event expressions

docterm
β     κ           DOC  TERM
0.9   dT(d1,ir)   d1   ir
0.5   dT(d1,db)   d1   db

link
β     κ           S    T
0.7   l(d2,d1)    d2   d1

?- docTerm(D,ir) & docTerm(D,db).

gives
d1   [dT(d1,ir) & dT(d1,db)]   0.9 · 0.5 = 0.45

18 / 97

Event keys and event expressions

docterm
β     κ           DOC  TERM
0.9   dT(d1,ir)   d1   ir
0.5   dT(d1,db)   d1   db

link
β     κ           S    T
0.7   l(d2,d1)    d2   d1

about(D,T) :- docTerm(D,T).
about(D,T) :- link(D,D1) & about(D1,T).

?- about(D,ir) & about(D,db).

gives
d1   [dT(d1,ir) & dT(d1,db)]                         0.9 · 0.5 = 0.45
d2   [l(d2,d1) & dT(d1,ir) & l(d2,d1) & dT(d1,db)]   0.7 · 0.9 · 0.5 = 0.315
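A Python sketch of the intensional computation: each derived fact carries its set of ground event keys, and a shared key (here the link event for d2) is counted only once:

prob = {"dT(d1,ir)": 0.9, "dT(d1,db)": 0.5, "l(d2,d1)": 0.7}

def p(event_keys):
    # probability of a conjunction of independent ground events; duplicates count once
    result = 1.0
    for k in set(event_keys):
        result *= prob[k]
    return result

print(round(p(["dT(d1,ir)", "dT(d1,db)"]), 3))                          # d1: 0.45
print(round(p(["l(d2,d1)", "dT(d1,ir)", "l(d2,d1)", "dT(d1,db)"]), 3))  # d2: 0.315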

19 / 97

Recursion

about(D,T) :- docTerm(D,T).

about(D,T) :- link(D,D1) & about(D1,T).

[Figure: three documents d1, d2, d3 with docterm(d1,ir) = 0.9, docterm(d1,db) = 0.5 and links link(d1,d2) = 0.5, link(d2,d3) = 0.4, link(d3,d1) = 0.8]

?- about(D,ir)

d1   [dT(d1,ir) | l(d1,d2) & l(d2,d3) & l(d3,d1) & dT(d1,ir) | ...]   0.900
d3   [l(d3,d1) & dT(d1,ir)]                                           0.720
d2   [l(d2,d3) & l(d3,d1) & dT(d1,ir)]                                0.288

?- about(D,ir) & about(D,db)

d1   [dT(d1,ir) & dT(d1,db)]                                          0.450
d3   [l(d3,d1) & dT(d1,ir) & l(d3,d1) & dT(d1,db)]                    0.360
d2   [l(d2,d3) & l(d3,d1) & dT(d1,ir) & dT(d1,db)]                    0.144

20 / 97


Computation of probabilities for event expressions

1. transformation of expression into disjunctive normal form

2. application of the sieve formula:
   - simple case of 2 conjuncts: P(a ∨ b) = P(a) + P(b) − P(a ∧ b)
   - general case (ci – conjunct of event keys):

     P(c1 ∨ ... ∨ cn) = Σ_{i=1..n} (−1)^(i−1) Σ_{1 ≤ j1 < ... < ji ≤ n} P(cj1 ∧ ... ∧ cji)

- exponential complexity
- use only when necessary for correctness
- see [Dalvi & Suciu 07]
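A Python sketch of the sieve formula for a disjunction of conjuncts over independent ground events (the event-key probabilities are made up for illustration):

from itertools import combinations

prob = {"a": 0.3, "b": 0.4, "c": 0.5}

def p_conj(keys):
    result = 1.0
    for k in set(keys):          # duplicate keys count once
        result *= prob[k]
    return result

def p_dnf(conjuncts):
    # P(c1 v ... v cn) by inclusion-exclusion; exponential in n
    total = 0.0
    for i in range(1, len(conjuncts) + 1):
        for subset in combinations(conjuncts, i):
            total += (-1) ** (i - 1) * p_conj([k for c in subset for k in c])
    return total

print(round(p_dnf([["a", "b"], ["b", "c"]]), 2))   # 0.12 + 0.20 - 0.06 = 0.26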

21 / 97

Possible worlds semantics

0.9 docTerm(d1,ir).

P(W1) = 0.9: {docTerm(d1,ir)}
P(W2) = 0.1: {}

22 / 97

0.6 docTerm(d1,ir). 0.5 docTerm(d1,db).

Possible interpretations:

I1: P(W1) = 0.3: {docTerm(d1,ir)}
    P(W2) = 0.3: {docTerm(d1,ir), docTerm(d1,db)}
    P(W3) = 0.2: {docTerm(d1,db)}
    P(W4) = 0.2: {}

I2: P(W1) = 0.5: {docTerm(d1,ir)}
    P(W2) = 0.1: {docTerm(d1,ir), docTerm(d1,db)}
    P(W3) = 0.4: {docTerm(d1,db)}

I3: P(W1) = 0.1: {docTerm(d1,ir)}
    P(W2) = 0.5: {docTerm(d1,ir), docTerm(d1,db)}
    P(W3) = 0.4: {}

probabilistic logic:
0.1 ≤ P(docTerm(d1,ir) & docTerm(d1,db)) ≤ 0.5

probabilistic Datalog with independence assumptions:
P(docTerm(d1,ir) & docTerm(d1,db)) = 0.3
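A Python sketch of the possible-worlds reading under the independence assumption; it reproduces interpretation I1 and the value 0.3:

from itertools import product

facts = {"docTerm(d1,ir)": 0.6, "docTerm(d1,db)": 0.5}

worlds = []
for included in product([True, False], repeat=len(facts)):
    w = {f for f, inc in zip(facts, included) if inc}
    p = 1.0
    for f, prob in facts.items():
        p *= prob if f in w else 1.0 - prob
    worlds.append((p, w))

for p, w in worlds:
    print(round(p, 2), sorted(w))            # 0.3, 0.3, 0.2, 0.2 as in I1

# P(docTerm(d1,ir) & docTerm(d1,db)) = sum over worlds containing both facts
print(round(sum(p for p, w in worlds if len(w) == 2), 2))   # 0.3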

23 / 97

Disjoint events

β City State

0.7   Paris   France
0.2   Paris   Texas
0.1   Paris   Idaho

Interpretation:
P(W1) = 0.7: {cityState(paris, france)}
P(W2) = 0.2: {cityState(paris, texas)}
P(W3) = 0.1: {cityState(paris, idaho)}

24 / 97


Relational Bayes

[Roelleke et al. 07]

Role of the relational Bayes: generation of a probabilistic database

[Figure: non-probabilistic database → Bayes → probabilistic database]

25 / 97

Relational Bayes: Example P(Nationality | City)

nationality_and_city
Nationality  City
"British"    "London"
"British"    "London"
"British"    "London"
"Scottish"   "London"
"French"     "London"
"German"     "Hamburg"
"German"     "Hamburg"
"Danish"     "Hamburg"
"British"    "Hamburg"
"German"     "Dortmund"
"German"     "Dortmund"
"Turkish"    "Dortmund"
"Scottish"   "Glasgow"

=⇒ Bayes[$City]()

nationality_city
P(Nationality|City)  Nationality  City
0.600                "British"    "London"
0.200                "Scottish"   "London"
0.200                "French"     "London"
0.500                "German"     "Hamburg"
0.250                "Danish"     "Hamburg"
0.250                "British"    "Hamburg"
0.667                "German"     "Dortmund"
0.333                "Turkish"    "Dortmund"
1.000                "Scottish"   "Glasgow"

# P(Nationality | City) :
nationality_city SUM(Nat, City) :-
  nationality_and_city(Nat, City) | (City);
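A Python sketch of what the Bayes operator computes in this example: conditional probabilities P(Nationality|City) as relative frequencies within each City group:

from collections import Counter

rows = [("British","London")] * 3 + [("Scottish","London"), ("French","London"),
        ("German","Hamburg"), ("German","Hamburg"), ("Danish","Hamburg"),
        ("British","Hamburg"), ("German","Dortmund"), ("German","Dortmund"),
        ("Turkish","Dortmund"), ("Scottish","Glasgow")]

pair_counts = Counter(rows)
city_counts = Counter(city for _, city in rows)

for (nat, city), c in sorted(pair_counts.items(), key=lambda x: x[0][1]):
    print(f"{c / city_counts[city]:.3f}  {nat:9s} {city}")
# e.g. 0.600 British London, 0.200 Scottish London, ..., 1.000 Scottish Glasgow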

26 / 97

Relational Bayes: Example P(t|d)

term
Term     DocId
sailing  doc1
boats    doc1
sailing  doc2
boats    doc2
sailing  doc2
east     doc3
coast    doc3
sailing  doc3
sailing  doc4
boats    doc5

p_t_d_space(Term, DocId) :-
  term(Term, DocId) | (DocId);

P(t|d)  Term     DocId
0.50    sailing  doc1
0.50    boats    doc1
0.33    sailing  doc2
0.33    boats    doc2
0.33    sailing  doc2
0.33    east     doc3
0.33    coast    doc3
0.33    sailing  doc3
1.00    sailing  doc4
1.00    boats    doc5

p_t_d SUM(Term, DocId) :-
  term(Term, DocId) | (DocId);

P(t|d)  Term     DocId
0.50    sailing  doc1
0.50    boats    doc1
0.67    sailing  doc2
0.33    boats    doc2
0.33    east     doc3
0.33    coast    doc3
0.33    sailing  doc3
1.00    sailing  doc4
1.00    boats    doc5

27 / 97

Probabilistic rules: Rules for deterministic facts

0.7 likes-sports(X) :- man(X).

0.4 likes-sports(X) :- woman(X).

man(peter).

Interpretation:
P(W1) = 0.7: {man(peter), likes-sports(peter)}
P(W2) = 0.3: {man(peter)}

28 / 97


Probabilistic rules: Rules for uncertain facts

# gender is disjoint on the first attribute

0.7 l-sports(X) :- gender(X,male).

0.4 l-sports(X) :- gender(X,female).

0.5 gender(X,male) :- human(X).

0.5 gender(X,female) :- human(X).

human(jo).

Interpretation:
P(W1) = 0.35: {gender(jo,male), l-sports(jo)}
P(W2) = 0.15: {gender(jo,male)}
P(W3) = 0.20: {gender(jo,female), l-sports(jo)}
P(W4) = 0.30: {gender(jo,female)}

?- l-sports(jo):  P(W1) + P(W3) = 0.55
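A Python sketch of this computation: gender(jo,·) is disjoint with probability 0.5 each, and the rule events are applied independently on top of it:

worlds = []
for gender, p_gender in [("male", 0.5), ("female", 0.5)]:
    p_sports = 0.7 if gender == "male" else 0.4          # rule probabilities
    worlds.append((p_gender * p_sports,       {f"gender(jo,{gender})", "l-sports(jo)"}))
    worlds.append((p_gender * (1 - p_sports), {f"gender(jo,{gender})"}))

for p, w in worlds:
    print(round(p, 2), sorted(w))    # 0.35 / 0.15 / 0.2 / 0.3 as above

print(round(sum(p for p, w in worlds if "l-sports(jo)" in w), 2))   # ?- l-sports(jo): 0.55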

29 / 97

Probabilistic rules: Rules for independent events

sameauthor(D1,D2) :- author(D1,X) & author(D2,X).

0.5 link(D1,D2) :- refer(D1,D2).

0.2 link(D1,D2) :- sameauthor(D1,D2).

?? link(D1,D2) :- refer(D1,D2) & sameauthor(D1,D2).

P(l|r), P(l|s) → P(l|r ∧ s)?

30 / 97

Rules for independent events: Modeling probabilistic inference networks

0.7 link(D1,D2) :- refer(D1,D2) & sameauthor(D1,D2).

0.5 link(D1,D2) :- refer(D1,D2) & not(sameauthor(D1,D2)).

0.2 link(D1,D2) :- sameauthor(D1,D2) & not(refer(D1,D2)).

Probabilistic inference networks: the rules define the link matrix

[Figure: inference network with nodes "refer" and "sameauthor" both pointing to node "link"]
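A Python sketch of reading the three rules as a link matrix P(link | refer, sameauthor); that the probability is 0 when neither condition holds, and that refer and sameauthor are independent, are assumptions of this sketch:

def p_link(p_refer, p_same):
    matrix = {(True, True): 0.7, (True, False): 0.5,
              (False, True): 0.2, (False, False): 0.0}   # last entry assumed
    total = 0.0
    for r in (True, False):
        for s in (True, False):
            p_rs = (p_refer if r else 1 - p_refer) * (p_same if s else 1 - p_same)
            total += matrix[(r, s)] * p_rs
    return total

print(round(p_link(0.5, 0.2), 2))   # 0.7*0.1 + 0.5*0.4 + 0.2*0.1 + 0.0*0.4 = 0.29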

31 / 97

Vague Predicates

The Logical View on Vague Predicates
Vague Predicates in IR and Databases
Probabilistic Modeling of Vague Predicates


Vague Predicates: Motivating Example

33 / 97

Propositional vs. Predicate Logic

- Current IR systems are based on propositional logic (query term present/absent in document)
- Similarity of values is not considered
- but multimedia IR deals with similarity already
- transition from propositional to predicate logic is necessary

34 / 97

Vague Predicates in Probabilistic Datalog

[Fuhr & Roelleke 97] [Fuhr 00]

- Example: shopping for a 45 inch LCD TV
- vague predicates as builtin predicates: X ≈ Y
- query(D) :- Category(D,tv) & type(D,lcd) & size(D,X) & ≈(X,45)

X ≈ Y
β     X    Y
0.7   42   45
0.8   43   45
0.9   44   45
1.0   45   45
0.9   46   45
0.8   47   45
...   ...  ...

35 / 97

Data types and vague predicates in IR

Data type: domain + (vague) predicates

- Language (multilingual documents) / (language-specific stemming)
- Person names / "his name sounds like Jones"
- Dates / "about a month ago"
- Amounts / "orders exceeding 1 Mio $"
- Technical measurements / "at room temperature"
- Chemical formulas

36 / 97


Vague Criteria in Fact Databases

”I am looking for a 45-inch LCD TV with

- wide viewing angle
- high contrast
- low price
- high user rating"

→ vague criteria are very frequent in end-user querying of fact databases

→ but no appropriate support in SQL

vague conditions → similar to fuzzy predicates

37 / 97

Probabilistic Modeling of Vague Predicates

[Fuhr 90]

- learn vague predicates from feedback data
- construct a feature vector x(qi, di) from query value qi and document value di (e.g. relative difference)
- apply logistic regression
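A Python sketch of such a learned vague predicate; the feature (relative difference) follows the slide, while the logistic-regression coefficients are made up for illustration (in practice they are estimated from feedback data):

import math

def p_approx_equal(q, d, b0=4.0, b1=-40.0):     # b0, b1: assumed coefficients
    x = abs(q - d) / q                          # feature: relative difference
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

for size in (42, 44, 45, 47, 55):
    print(size, round(p_approx_equal(45, size), 2))
# the match probability decreases with the distance from the query value 45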

38 / 97

Expressiveness

Retrieval Rules, Joins, Aggregations and Restructuring
Expressiveness in XML Retrieval

Expressiveness: Formulating Retrieval Rules

about(D,T) :- docTerm(D,T).

consider document linking / anchor text:
about(D,T) :- link(D1,D), about(D1,T).

consider term hierarchy:
about(D,T) :- subconcept(T,T1) & about(D,T1).

field-specific term weighting:
0.9 docTerm(D,T) :- occurs(D,T,title).

0.5 docTerm(D,T) :- occurs(D,T,body).

40 / 97


Expressiveness: Joins

IR authors:

irauthor(N):- about(D,ir) & author(D,N).

Smith’s IR papers cited by Miller

?- author(D,smith) & about(D,ir) &

author(D1,miller) & cites(D,D1).

41 / 97

Expressiveness: Aggregation (1)

Who are the major IR authors?

index
β     DNO  TERM
0.9   1    ir
0.8   1    db
0.6   2    ir
0.8   3    ir
0.7   3    ai

author
DNO  NAME
1    smith
2    miller
3    smith

irauthor
0.98  smith
0.6   miller

irauthor(A) :- index(D,ir) & author(D,A).

Aggregation through projection!

42 / 97

Expressiveness: Aggregation (2)

Who are the major IR authors?

index
β     DNO  TERM
0.9   1    ir
0.8   1    db
0.6   2    ir
0.8   3    ir
0.7   3    ai

author
DNO  NAME
1    smith
2    miller
3    smith

irauths
1.7  smith
0.6  miller

Aggregation through summing:

irauth(D,A) :- index(D,ir) & author(D,A).
irauths SUM(Name) :- irauth(Doc,Name) | (Name)
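A Python sketch of both aggregation variants on the index/author relations above (independence assumed for the noisy-or combination of the projection):

index  = {(1,"ir"): 0.9, (1,"db"): 0.8, (2,"ir"): 0.6, (3,"ir"): 0.8, (3,"ai"): 0.7}
author = [(1,"smith"), (2,"miller"), (3,"smith")]

ir_weights = {}                                  # per author: weights of their IR documents
for d, name in author:
    if (d, "ir") in index:
        ir_weights.setdefault(name, []).append(index[(d, "ir")])

for name, ws in ir_weights.items():
    p_not = 1.0
    for w in ws:
        p_not *= 1 - w
    print("irauthor", name, round(1 - p_not, 2))   # projection: smith 0.98, miller 0.6
    print("irauths ", name, round(sum(ws), 1))     # summing:    smith 1.7,  miller 0.6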

43 / 97

Expressiveness in XML Retrieval

[Fuhr & Lalmas 07]

44 / 97


XML structure: 1. Nested Structure

- XML document as a hierarchical structure
- Retrieval of elements (subtrees)
- Typical query language does not allow for the specification of structural constraints
- Relevance-oriented selection of answer elements: return the most specific relevant elements

45 / 97

XML structure: 2. Named Fields

- Reference to elements through field names only
- Context of elements is ignored (e.g. author of article vs. author of referenced paper)
- Post-coordination may lead to false hits (e.g. author name – author affiliation)

Example: Dublin Core

<oai_dc:dc xmlns:dc=

"http://purl.org/dc/elements/1.1/">

<dc:title>Generic Algebras

... </dc:title>

<dc:creator>A. Smith (ESI),

B. Miller (CMU)</dc:creator>

<dc:subject>Orthogonal group,

Symplectic group</dc:subject>

<dc:date>2001-02-27</dc:date>

<dc:format>application/postscript</dc:format>

<dc:identifier>ftp://ftp.esi.ac.at/pub/esi1001.ps</dc:identifier>

<dc:source>ESI preprints

</dc:source>

<dc:language>en</dc:language>

</oai_dc:dc>

46 / 97

XML structure: 3. XPath

/document/chapter[about(./heading, XML) AND

about(./section//*,syntax)]

47 / 97

XML structure: 3. XPath (cont’d)

- Full expressiveness for navigation through the document tree (+ links)
  - Parent/child, ancestor/descendant
  - Following/preceding, following-sibling, preceding-sibling
  - Attribute, namespace
- Selection of arbitrary elements/subtrees (but the answer can only be a single element of the originating document)

48 / 97


XML structure: 4. XQuery

Higher expressiveness, especially for database-like applications:

- Joins (trees → graphs)
- Aggregations
- Constructors for restructuring results

Example: List each publisher and the average price of its books

FOR $p IN distinct(document("bib.xml")//publisher)
LET $a := avg(document("bib.xml")//book[publisher = $p]/price)
RETURN
  <publisher>
    <name> $p/text() </name>
    <avgprice> $a </avgprice>
  </publisher>

49 / 97

XML content typing

50 / 97

XML content typing: 1. Text

<book>

<author>John Smith</author>

<title>XML Retrieval</title>

<chapter> <heading>Introduction</heading>

This text explains all about XML and IR.

</chapter>

<chapter>

<heading> XML Query Language XQL

</heading>

<section>

<heading>Examples</heading>

</section>

<section>

<heading>Syntax</heading>

Now we describe the XQL syntax.

</section>

</chapter>

</book>

Example query

//chapter[about(., XML query language)]

51 / 97

XML content typing: 2. Data Types

- Data type: domain + (vague) predicates (see above)
- Close relationship to XML Schema, but
  - XML Schema supports syntactic type checking only
  - No support for vague predicates

52 / 97


XML content typing: 3. Object Types
Based on Tagging / Named Entity Recognition

- Object types: Persons, Locations, Dates, ...

  Pablo Picasso (October 25, 1881 – April 8, 1973) was a Spanish painter and sculptor. ... In Paris, Picasso entertained a distinguished coterie of friends in the Montmartre and Montparnasse quarters, including Andre Breton, Guillaume Apollinaire, and writer Gertrude Stein.

  To which other artists did Picasso have close relationships? Did he ever visit the USA?

- Named entity recognition methods allow for automatic markup of object types
- Object types support increased precision

53 / 97

XML content typing: Tag semantics modelled as hierarchies

54 / 97

XML content typing: Tag semantics modelled in OWL

55 / 97

Retrieval of structured documents: POOL

Structure of POOL programs
Contexts and augmentation
Augmentation and inconsistencies


Goals

- retrieval of structured documents → hierarchical logical structure
- abstraction from node types → contexts as untyped nodes
- multimedia retrieval → expressiveness of restricted predicate logic

57 / 97

Structure of POOL programs

object: identifier + content

context: object with nonempty content
a1[ ], s11[ ], s12[ ]

program: set of clauses

clause: context / proposition / rule

proposition:
- term(image, presentation)
- classification(article(a1), section(s11))
- attribute(s11.author(smith), a1.pubyear(1997))

58 / 97

POOL Example Program

a1[ s11[ image 0.6 retrieval presentation ]

s12[ ss121[ audio indexing ]

ss122[ video not presentation ] ] ]

s11.author(smith)

ss121.author(miller) ss122.author(jones)

a1.pubyear(1997)

article(a1) section(s11) section(s12)

subsection(ss121) subsection(ss122)

59 / 97

Rules in POOL

rule: head :- body

head: proposition / context containing a proposition

body: conjunction of subgoals (propositions or contexts)

docnode(D) :- article(D)

docnode(D) :- section(D)

docnode(D) :- subsection(D)

mm-ir-doc(D) :- docnode(D) &

D[audio & retrieval]

german-paper(D) :- D.author.country(germany)

query: ?- body

?- D[audio & indexing]

60 / 97


Contexts and augmentation

clauses only hold for context where stated

augmentation: propagation of propositions to surrounding contexts

a1[ s11[ image 0.6 retrieval presentation ]

s12[ ss121[ audio indexing ]

ss122[ video not presentation ] ] ]

?- D[audio & video]

s12

a1

61 / 97

Augmentation with Uncertainty:

?- audio

1.00 ss121

0.60 s12

0.36 a1

?- D[audio & video]

0.36 s12

0.22 a1

augmentation with uncertainty prefers most specific context!
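A Python sketch of this computation with event-key sets; the augmentation probability of 0.6 per context level is an assumption chosen so that the numbers above are reproduced:

prob = {"audio(ss121)": 1.0, "video(ss122)": 1.0,
        "aug(s12,ss121)": 0.6, "aug(s12,ss122)": 0.6, "aug(a1,s12)": 0.6}

def p(keys):
    result = 1.0
    for k in set(keys):          # a shared augmentation event is counted only once
        result *= prob[k]
    return result

print(round(p(["aug(s12,ss121)", "audio(ss121)"]), 2))                 # audio in s12: 0.6
print(round(p(["aug(a1,s12)", "aug(s12,ss121)", "audio(ss121)"]), 2))  # audio in a1:  0.36
print(round(p(["aug(s12,ss121)", "audio(ss121)",
               "aug(s12,ss122)", "video(ss122)"]), 2))                 # audio & video in s12: 0.36
print(round(p(["aug(a1,s12)", "aug(s12,ss121)", "audio(ss121)",
               "aug(a1,s12)", "aug(s12,ss122)", "video(ss122)"]), 2))  # audio & video in a1:  0.22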

62 / 97

Augmentation and Inconsistencies

d1[ s1[ audio indexing ]

s2[ s21[ image retrieval]

s22[ video not retrieval ] ] ]

?- D[audio & indexing]

s1

d1

?- D[video & image]

s2

d1

?- D[video & retrieval]

retrieval is inconsistent in s2

63 / 97

Four-Valued Logic

truth values: unknown, true, false, inconsistent

∧ | T  F  U  I
T | T  F  U  I
F | F  F  F  F
U | U  F  U  F
I | I  F  F  I

∨ | T  F  U  I
T | T  T  T  T
F | T  F  U  I
U | T  U  U  T
I | T  I  T  I

¬
T → F
F → T
U → U
I → I

Interpretation as sets over two values:
T = {true}
F = {false}
U = {true,false}
I = {}

64 / 97


Applying Four-Valued Logic

d1[ s1[ audio indexing ]

s2[ s21[ image retrieval]

s22[ video not retrieval ] ] ]

s22:
video ↦ true
image ↦ unknown
retrieval ↦ false

s2:
image ↦ true
video ↦ true
retrieval ↦ inconsistent

65 / 97

Description logic

Thesaurus
Introduction into OWL
SPARQL

[Figure: concept hierarchy — polygon with subconcepts triangle, quadrangle, ...; regular polygon with subconcepts regular triangle and square; quadrangle with subconcepts rectangle and square]

thesaurus knowledge: can be expressed in propositional logic
square = quadrangle ∧ regular-polygon

description logic
- based on semantic networks
- more expressive than thesauri
  - instances of concepts
  - roles between (instances of) concepts

67 / 97

Semantic Web (ontology) languages

RDF: "Resource Description Framework"
     semantic markup language; only resources and their properties; serialisation in XML

RDFS: "RDF Schema", schema definition language for RDF

OWL: extends RDF/RDFS by richer modelling primitives; OWL Lite/DL/Full
     - OWL Lite contains simple primitives
     - OWL DL corresponds to an expressive description logic
     - OWL Full is OWL DL + RDF

A knowledge base can be modelled as a collection of RDF triples (RDF/XML serialisation); alternative encoding: abstract syntax (easier to read)

68 / 97


Objects, classes, literals and datatypes

- Two distinct domains:
  - Classes: for objects
  - Data types: for literals

[Figure: "Peter" has rdf:type Person, and Person has rdf:type owl:Class; the literal 1.89 has rdf:type xsd:decimal, and xsd:decimal has rdf:type owl:Datatype]

69 / 97

Classes (1)

[Figure: class diagram with owl:Class instances Animal, Male, Female, Person, Man; Male and Female are subclasses of Animal and disjoint with each other (owl:disjointWith); further rdfs:subClassOf links for Person and Man]

Class(Female partial Animal)

<owl:Class rdf:ID="Female">

<rdfs:subClassOf rdf:resource="#Animal"/>

</owl:Class>

70 / 97

Classes (2)

[Figure: class diagram as on the previous slide]

Class(Male partial Animal)

DisjointClasses(Male Female)

<owl:Class rdf:ID="Male">

<rdfs:subClassOf rdf:resource="#Animal"/>

<owl:disjointWith rdf:resource="#Female"/>

</owl:Class>

71 / 97

Object properties (1)

[Figure: hasParent is an owl:ObjectProperty with rdfs:domain Animal and rdfs:range Animal; hasFather is an owl:subPropertyOf hasParent with rdfs:range Male]

ObjectProperty(hasParent domain(Animal) range(Animal))

<owl:ObjectProperty rdf:ID="hasParent">

<rdfs:domain rdf:resource="#Animal"/>

<rdfs:range rdf:resource="#Animal"/>

</owl:ObjectProperty>

72 / 97


Object properties (2)

[Figure: object property diagram as on the previous slide]

ObjectProperty(hasFather super(hasParent) range(Male))

<owl:ObjectProperty rdf:ID="hasFather">

<rdfs:subPropertyOf rdf:resource="#hasParent"/>

<rdfs:range rdf:resource="#Male"/>

</owl:ObjectProperty>

73 / 97

Datatype properties

[Figure: shoesize is an owl:DatatypeProperty and an owl:FunctionalProperty, with rdfs:domain Person and rdfs:range xsd:decimal]

DatatypeProperty(shoesize Functional domain(Animal) range(xsd:decimal))

<owl:DatatypeProperty rdf:ID="shoesize">

<rdfs:domain rdf:resource="#Animal"/>

<rdfs:range rdf:resource="xsd:decimal"/>

<rdf:type rdf:resource="owl:FunctionalProperty"/>

</owl:DatatypeProperty>

74 / 97

Property restrictions

Class(Person partial Animal restriction(hasParent allValuesFrom(Person))

restriction(hasParent cardinality(2)))

<owl:Class rdf:ID="Person">

<rdfs:subClassOf rdf:resource="#Animal"/>

<rdfs:subClassOf>

<owl:Restriction>

<owl:onProperty rdf:resource="#hasParent"/>

<owl:allValuesFrom rdf:resource="#Person"/>

</owl:Restriction>

</rdfs:subClassOf>

<rdfs:subClassOf>

<owl:Restriction>

<owl:onProperty rdf:resource="#hasParent"/>

<owl:cardinality>2</owl:cardinality>

</owl:Restriction>

</rdfs:subClassOf>

</owl:Class>

75 / 97

Instances

Individual(Kain type(Male) value(hasFather Adam)

value(hasMother Eve)

value(shoesize 10))

<Male rdf:ID="Kain">

<hasFather rdf:resource="#Adam"/>

<hasMother rdf:resource="#Eve"/>

<shoesize>10</shoesize>

</Male>

76 / 97


Further modelling primitives

owl:inverseOf: inverse property: p(a, b)↔ r(b, a)

owl:TransitiveProperty: p(a, b), p(b, c)→ p(a, c)

owl:SymmetricProperty: p(a, b)→ p(b, a)

owl:InverseFunctionalProperty: inverse property is functional

owl:hasValue: at least one property value equals the given object or datatype value

owl:someValuesFrom: at least one property value is an instance of the given class, expression or datatype

owl:intersectionOf, owl:unionOf, owl:complementOf: boolean combinations of class expressions

owl:oneOf: define class by enumerating its instances

77 / 97

OWL Class Constructors

78 / 97

OWL Axioms

79 / 97

OWL DL Semantics

80 / 97


Limitations of OWL

OWL lacks support for

- uncertainty: only deterministic relationships possible, no weighting or probabilistic facts
  ⇒ "Pr(hasFather(lisa,thomas)) = 0.9" cannot be expressed

- rules: no general rules, only specific rules like subClassOf, TransitiveProperty, ...
  ⇒ "if hasParent(A,B) and hasParent(C,D) and hasSibling(B,D), then hasCousin(A,C)" cannot be expressed

- n-ary datatype predicates: OWL datatypes are based on XML Schema datatypes, thus providing only unary datatype predicates
  ⇒ "sameDomain([email protected],[email protected])" cannot be expressed

⇒ IR queries cannot be expressed directly in OWL

81 / 97

OWL: Conclusion

- OWL extends RDF(S) by additional modelling primitives
- well-defined semantics, based on description logics
- does not support all RDF features (no reification; only three levels: owl:Class, classes, and objects)
- lacks important features:
  - only deterministic facts, no probabilistic relationships
  - no rules (but see SWRL)
  - restricted datatype predicates (due to XML Schema)
- OWL and associated languages are becoming the standard in the Semantic Web

82 / 97

Semantic Web Layers

83 / 97

SPARQL

query language for getting information from RDF (OWL) graphs

Facilities to
- extract information in the form of URIs, blank nodes, plain and typed literals
- extract RDF subgraphs
- construct new RDF graphs based on information in the queried graphs

Features:
- matching of graph patterns
- variables – global scope; indicated by '?' or '$'

84 / 97


SPARQL: Basic Graph Pattern

- Set of Triple Patterns
- Triple Pattern – similar to an RDF Triple (subject, predicate, object), but any component can be a query variable; literal subjects are allowed
  ?book dc:title ?title
- Matching a triple pattern to a graph: bindings between variables and RDF Terms
- Matching of Basic Graph Patterns: a Pattern Solution of Graph Pattern GP on graph G is any substitution S such that S(GP) is a subgraph of G.
  SELECT ?x ?v WHERE { ?x ?v ?x }

85 / 97

SPARQL: Group Patterns + Value Constraints

Group Pattern: a set of graph patterns which must all match
Value Constraints: restrict RDF terms in a solution

SELECT ?n WHERE
  { ?n profession "Physicist" . ?n isa "Politician" }

86 / 97

SPARQL: Query forms

SELECT     returns all, or a subset of, the variables bound in a query pattern match; formats: XML or RDF/XML

CONSTRUCT  returns an RDF graph constructed by substituting variables in a set of triple templates

DESCRIBE   returns an RDF graph that describes the resources found

ASK        returns whether a query pattern matches or not

87 / 97

Conclusion and Outlook


Conclusion

Inference

- Probabilistic relational model supports the integration of IR + DB
- Probabilistic Datalog as a powerful inference mechanism
- Allows for formulating retrieval strategies as logical rules

Vague predicates

- Natural extension of IR methods to attribute values
- Vague predicates can be learned from feedback data
- Transition from propositional to predicate logic

Expressive query language

- Joins
- Aggregations
- (Re)structuring of results

89 / 97

http://www.eecs.qmul.ac.uk/~thor/

http://www.spinque.com/

Outlook: IR Systems vs. DBMS

Separation between IRS and IR application?

92 / 97


Towards an IRMS

93 / 97

References I

Dalvi, N. N.; Suciu, D. (2007). Efficient query evaluation on probabilistic databases. VLDB J. 16(4), pages 523–544.

Forst, J. F.; Tombros, A.; Roelleke, T. (2007). POLIS: A Probabilistic Logic for Document Summarisation. In: Proceedings of the 1st International Conference on Theory of Information Retrieval (ICTIR 07) – Studies in Theory of Information Retrieval, pages 201–212.

Frommholz, I.; Fuhr, N. (2006). Probabilistic, Object-oriented Logics for Annotation-based Retrieval in Digital Libraries. In: Nelson, M.; Marshall, C.; Marchionini, G. (eds.): Opening Information Horizons – Proc. of the 6th ACM/IEEE Joint Conference on Digital Libraries (JCDL 2006), pages 55–64. ACM, New York.

94 / 97

References II

Fuhr, N.; Lalmas, M. (2007). Advances in XML retrieval: the INEX initiative. In: IWRIDL '06: Proceedings of the 2006 international workshop on Research issues in digital libraries, pages 1–6. ACM, New York, NY, USA.

Fuhr, N.; Roelleke, T. (1997). A Probabilistic Relational Algebra for the Integration of Information Retrieval and Database Systems. ACM Transactions on Information Systems 14(1), pages 32–66.

Fuhr, N.; Roelleke, T. (1998). HySpirit – a Probabilistic Inference Engine for Hypermedia Retrieval in Large Databases. In: Proceedings of the 6th International Conference on Extending Database Technology (EDBT), pages 24–38. Springer, Heidelberg et al.

95 / 97

References III

Fuhr, N. (1990). A Probabilistic Framework for Vague Queries and Imprecise Information in Databases. In: Proceedings of the 16th International Conference on Very Large Databases, pages 696–707. Morgan Kaufman, Los Altos, California.

Fuhr, N. (2000). Probabilistic Datalog: Implementing Logical Information Retrieval for Advanced Applications. Journal of the American Society for Information Science 51(2), pages 95–110.

Lalmas, M.; Roelleke, T.; Fuhr, N. (2002). Intelligent Hypermedia Retrieval. In: Szczepaniak, P. S.; Segovia, F.; Zadeh, L. A. (eds.): Intelligent Exploration of the Web, pages 324–344. Springer, Heidelberg et al.

Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufman, San Mateo, California.

96 / 97


References IV

Roelleke, T.; Wu, H.; Wang, J.; Azzam, H. (2007). Modelling retrieval models in a probabilistic relational algebra with a new operator: the relational Bayes. The International Journal on Very Large Data Bases (VLDB) 17(1), pages 5–37.

Suciu, D.; Olteanu, D.; Ré, C.; Koch, C. (2011). Probabilistic Databases. Morgan & Claypool Publishers.

97 / 97