IR Models J. H. Wang Mar. 11, 2008


Page 1: IR Models J. H. Wang Mar. 11, 2008. The Retrieval Process User Interface Text Operations Query Operations Indexing Searching Ranking Index Text quer y

IR Models

J. H. Wang, Mar. 11, 2008

Page 2

The Retrieval Process

[Figure: the retrieval process. Components: User Interface, Text Operations, Query Operations, Indexing, Searching, Ranking, Index (inverted file), DB Manager Module, Text Database. Flows: user need, query, logical view, retrieved docs, ranked docs, user feedback.]

Page 3

Introduction

• Traditional information retrieval systems usually adopt index terms to index and retrieve documents
  – An index term is a keyword (or group of related words) that has some meaning of its own (usually a noun)

• Advantages
  – Simple
  – The semantics of the documents and of the user information need can be naturally expressed through sets of index terms

Page 4

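[Figure: documents (Docs) and the user's Information Need are both represented by Index Terms (doc, query); the two representations are matched to produce a Ranking.]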

Page 5

IR Models

Ranking algorithms are at the core of information retrieval systems (predicting which documents are relevant and which are not).

Page 6

A Taxonomy of Information Retrieval Models

User task:

• Retrieval: Ad hoc, Filtering
  – Classic Models: Boolean, Vector, Probabilistic
  – Set Theoretic: Fuzzy, Extended Boolean
  – Algebraic: Generalized Vector, Latent Semantic Index, Neural Networks
  – Probabilistic: Inference Network, Belief Network
  – Structured Models: Non-overlapping Lists, Proximal Nodes

• Browsing
  – Flat, Structure Guided, Hypertext

Page 7

Logical view of documents vs. user task:

• Index Terms: Retrieval → Classic, Set Theoretic, Algebraic, Probabilistic; Browsing → Flat
• Full Text: Retrieval → Classic, Set Theoretic, Algebraic, Probabilistic; Browsing → Flat, Hypertext
• Full Text + Structure: Retrieval → Structured; Browsing → Structure Guided, Hypertext

Figure 2.2 Retrieval models most frequently associated with distinct combinations of a document logical view and a user task.

Page 8

Retrieval: Ad hoc and Filtering

• Ad hoc (Search): The documents in the collection remain relatively static while new queries are submitted to the system

• Routing (Filtering): The queries remain relatively static while new documents come into the system

Page 9

Retrieval: Ad Hoc x Filtering

• Ad hoc retrieval:

[Figure: queries Q1 to Q5 are submitted against a collection of "fixed size".]

Page 10

Retrieval: Ad Hoc x Filtering

• Filtering:

[Figure: a stream of documents passes through the User 1 and User 2 profiles, yielding the docs for User 1 and the docs filtered for User 2.]
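The two tasks differ in which side stays fixed. Below is a minimal Python sketch, assuming queries and profiles are plain sets of index terms and that a document matches when it shares at least one term with them (a hypothetical rule chosen for brevity; real systems rank or threshold instead). The names `ad_hoc` and `filter_stream` are illustrative, not from the slides.

```python
from typing import Dict, Iterable, List, Set, Tuple

def ad_hoc(collection: Dict[str, Set[str]], query: Set[str]) -> List[str]:
    """Ad hoc: the collection stays fixed and each new query is run against it."""
    return [doc_id for doc_id, terms in collection.items() if terms & query]

def filter_stream(stream: Iterable[Tuple[str, Set[str]]],
                  profiles: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Filtering: the profiles stay fixed and each new document is routed to matching users."""
    routed: Dict[str, List[str]] = {user: [] for user in profiles}
    for doc_id, terms in stream:
        for user, profile in profiles.items():
            if terms & profile:  # hypothetical match rule: at least one shared term
                routed[user].append(doc_id)
    return routed

collection = {"d1": {"ir", "model"}, "d2": {"fuzzy", "set"}, "d3": {"vector", "model"}}
print(ad_hoc(collection, {"model"}))       # ['d1', 'd3']

profiles = {"user1": {"vector"}, "user2": {"fuzzy", "ir"}}
stream = [("n1", {"vector", "space"}), ("n2", {"fuzzy", "logic"}), ("n3", {"ir", "vector"})]
print(filter_stream(stream, profiles))     # {'user1': ['n1', 'n3'], 'user2': ['n2', 'n3']}
```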

Page 11

A Formal Characterization of IR Models

• D: A set composed of logical views (or representations) of the documents in the collection

• Q: A set composed of logical views (or representations) of the user information needs (queries)

• F: A framework for modeling document representations, queries, and their relationships

• R(qi, dj): A ranking function that associates a real number with a query qi ∈ Q and a document representation dj ∈ D, defining an ordering among the documents with regard to the query qi

Page 12

Definition

• ki: A generic index term
• K: The set of all index terms {k1, …, kt}
• wi,j ≥ 0: A weight associated with index term ki of a document dj (wi,j = 0 if ki does not appear in dj)

• gi: A function that returns the weight associated with ki in any t-dimensional vector (gi(dj) = wi,j)

Page 13

Classic IR Model

• Basic concepts: Each document is described by a set of representative keywords called index terms

• Numerical weights are assigned to index terms because distinct index terms have different relevance for describing a document's contents

• Three classic models: Boolean, vector, probabilistic

Page 14

Boolean Model

• Binary decision criterion
  – Either relevant or nonrelevant (no partial match)

• Data retrieval model

• Advantage
  – Clean formalism, simplicity

• Disadvantage
  – It is not simple to translate an information need into a Boolean expression
  – Exact matching may lead to retrieval of too few or too many documents

Page 15

Example

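• A query can be represented as a disjunction of conjunctive vectors (in DNF)
  – $q = k_a \wedge (k_b \vee \neg k_c)$, so $\vec{q}_{dnf} = (1,1,1) \vee (1,1,0) \vee (1,0,0)$

• Formal definition
  – For the Boolean model, the index term weights are all binary, i.e. $w_{i,j} \in \{0,1\}$
  – A query is a conventional Boolean expression, which can be transformed to a disjunctive normal form ($\vec{q}_{cc}$: conjunctive component)

$$
sim(d_j, q) =
\begin{cases}
1 & \text{if } \exists\, \vec{q}_{cc} \mid (\vec{q}_{cc} \in \vec{q}_{dnf}) \wedge (\forall k_i,\ g_i(\vec{d}_j) = g_i(\vec{q}_{cc})) \\
0 & \text{otherwise}
\end{cases}
$$

[Figure: Venn diagram of Ka, Kb, Kc with the conjunctive components (1,1,1), (1,1,0), and (1,0,0) highlighted.]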
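The Boolean example above can be checked mechanically. This sketch assumes documents are binary vectors over the query's own terms (ka, kb, kc); `query`, `to_dnf`, and `boolean_sim` are hypothetical helpers. Enumerating every binary vector that satisfies q yields exactly the three conjunctive components of the DNF.

```python
from itertools import product
from typing import Callable, Set, Tuple

Vector = Tuple[int, ...]

def query(ka: int, kb: int, kc: int) -> bool:
    """q = ka AND (kb OR NOT kc)."""
    return bool(ka and (kb or not kc))

def to_dnf(pred: Callable[..., bool], n_terms: int) -> Set[Vector]:
    """Collect the conjunctive components: every binary vector that satisfies q."""
    return {bits for bits in product((0, 1), repeat=n_terms) if pred(*bits)}

def boolean_sim(doc: Vector, q_dnf: Set[Vector]) -> int:
    """sim(dj, q) = 1 if dj's binary weights equal some conjunctive component, else 0."""
    return 1 if tuple(doc) in q_dnf else 0

q_dnf = to_dnf(query, 3)
print(sorted(q_dnf, reverse=True))     # [(1, 1, 1), (1, 1, 0), (1, 0, 0)]
print(boolean_sim((1, 0, 0), q_dnf))   # 1
print(boolean_sim((1, 0, 1), q_dnf))   # 0 (no partial match)
```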

Page 16

Vector Model [Salton, 1968]

• Assign non-binary weights to index terms in queries and in documents => TFxIDF

• Compute the similarity between documents and query => Sim(Dj, Q)

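• Ranks documents by their degree of similarity to the query, giving answer sets that match the information need more precisely than the exact-match answer sets of the Boolean model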

Page 17

The IR Problem as a Clustering Problem

• We think of the documents as a collection C of objects and of the user query as a specification of a set A of objects

• Intra-cluster similarity
  – What are the features that better describe the objects in the set A?

• Inter-cluster similarity
  – What are the features that better distinguish the objects in the set A from the remaining objects in the collection C?

Page 18

Idea for TFxIDF

• TF: intra-cluster similarity is quantified by measuring the raw frequency of a term ki inside a document dj
  – The term frequency (the tf factor) provides one measure of how well that term describes the document contents

• IDF: inter-cluster similarity is quantified by measuring the inverse of the frequency of a term ki among the documents in the collection
  – The inverse document frequency (the idf factor)

Page 19

Vector Model (1/4)

• Index terms are assigned positive and non-binary weights

• The index terms in the query are also weighted

• Term weights are used to compute the degree of similarity between documents and the user query

• Then, the retrieved documents are sorted in decreasing order of their degree of similarity

$$\vec{d}_j = (w_{1,j}, w_{2,j}, \ldots, w_{t,j}), \qquad \vec{q} = (w_{1,q}, w_{2,q}, \ldots, w_{t,q})$$

Page 20

Vector Model (2/4)

• Degree of similarity

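$$sim(d_j, q) = \frac{\vec{d}_j \cdot \vec{q}}{|\vec{d}_j| \times |\vec{q}|} = \frac{\sum_{i=1}^{t} w_{i,j} \times w_{i,q}}{\sqrt{\sum_{i=1}^{t} w_{i,j}^2} \times \sqrt{\sum_{i=1}^{t} w_{i,q}^2}}$$

[Figure 2.4: the vectors dj and q separated by the angle θ.]

Figure 2.4 The cosine of θ is adopted as sim(dj,q)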
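The cosine formula translates directly into code. A sketch assuming weights are given as equal-length sequences of numbers; returning 0.0 for an all-zero vector is an added convention, since the formula is undefined there.

```python
import math
from typing import Sequence

def cosine_sim(d: Sequence[float], q: Sequence[float]) -> float:
    """sim(dj, q) = (dj . q) / (|dj| * |q|); returns 0.0 if either vector is all zeros."""
    if len(d) != len(q):
        raise ValueError("vectors must have the same number t of index terms")
    dot = sum(w_d * w_q for w_d, w_q in zip(d, q))
    norm_d = math.sqrt(sum(w * w for w in d))
    norm_q = math.sqrt(sum(w * w for w in q))
    if norm_d == 0.0 or norm_q == 0.0:
        return 0.0  # added convention: the cosine is undefined for a zero vector
    return dot / (norm_d * norm_q)

print(cosine_sim([1, 0, 1], [1, 1, 1]))  # about 0.8165, i.e. 2 / (sqrt(2) * sqrt(3))
```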

Page 21

Vector Model (3/4)

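• Definition
  – Normalized frequency: $f_{i,j} = \frac{freq_{i,j}}{\max_l freq_{l,j}}$, where $freq_{i,j}$ is the raw frequency of ki in dj and the maximum is taken over all terms kl mentioned in dj
  – Inverse document frequency: $idf_i = \log \frac{N}{n_i}$, where N is the total number of documents and $n_i$ is the number of documents in which ki appears
  – Term-weighting scheme: $w_{i,j} = f_{i,j} \times \log \frac{N}{n_i}$
  – Query-term weights: $w_{i,q} = \left(0.5 + \frac{0.5\, freq_{i,q}}{\max_l freq_{l,q}}\right) \times \log \frac{N}{n_i}$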
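The weighting definitions above can be sketched as follows, assuming each document and the query are dictionaries from term to raw frequency (listing only terms that occur, with at least one term per document) and using the natural logarithm. The slides do not fix the base; changing it scales every weight by the same factor, so cosine rankings do not change. `doc_weights` and `query_weights` are hypothetical names.

```python
import math
from typing import Dict, List, Tuple

def doc_weights(docs: List[Dict[str, int]]) -> Tuple[List[Dict[str, float]], Dict[str, float]]:
    """Return w_{i,j} = f_{i,j} * idf_i for every document, plus the idf table."""
    N = len(docs)
    n: Dict[str, int] = {}                      # n_i: documents in which k_i appears
    for doc in docs:
        for term in doc:
            n[term] = n.get(term, 0) + 1
    idf = {term: math.log(N / n_i) for term, n_i in n.items()}
    weights = []
    for doc in docs:
        max_freq = max(doc.values())            # max_l freq_{l,j}; doc must be non-empty
        weights.append({term: (freq / max_freq) * idf[term] for term, freq in doc.items()})
    return weights, idf

def query_weights(query: Dict[str, int], idf: Dict[str, float]) -> Dict[str, float]:
    """Return w_{i,q} = (0.5 + 0.5 * freq_{i,q} / max_l freq_{l,q}) * idf_i."""
    max_freq = max(query.values())
    return {term: (0.5 + 0.5 * freq / max_freq) * idf[term]
            for term, freq in query.items() if term in idf}  # unseen terms are skipped

docs = [{"a": 2, "b": 1}, {"a": 1}, {"c": 3}]   # hypothetical raw frequencies
weights, idf = doc_weights(docs)
q_w = query_weights({"b": 1, "a": 2}, idf)
print(weights[0])   # a: 1.0 * log(3/2), b: 0.5 * log(3)
print(q_w)          # b: 0.75 * log(3), a: 1.0 * log(3/2)
```

A term that occurs in every document gets idf = log 1 = 0, so it contributes nothing to either weight.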

Page 22

Vector Model (4/4)

• Advantages
  – Its term-weighting scheme improves retrieval performance
  – Its partial matching strategy allows retrieval of documents that approximate the query conditions
  – Its cosine ranking formula sorts the documents according to their degree of similarity to the query

• Disadvantage
  – The assumption of mutual independence between index terms

Page 23

The Vector Model: Example I

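      k1  k2  k3  q•dj
d1     1   0   1     2
d2     1   0   0     1
d3     0   1   1     2
d4     1   0   0     1
d5     1   1   1     3
d6     1   1   0     2
d7     0   1   0     1
q      1   1   1

The q•dj column is the unnormalized inner product Σi wi,j × wi,q.

[Figure: documents d1 to d7 plotted in the space of index terms k1, k2, k3.]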

Page 24

The Vector Model: Example II

      k1  k2  k3  q•dj
d1     1   0   1     4
d2     1   0   0     1
d3     0   1   1     5
d4     1   0   0     1
d5     1   1   1     6
d6     1   1   0     3
d7     0   1   0     2
q      1   2   3

[Figure: documents d1 to d7 plotted in the space of index terms k1, k2, k3.]

Page 25

The Vector Model: Example III

      k1  k2  k3  q•dj
d1     2   0   1     5
d2     1   0   0     1
d3     0   1   3    11
d4     2   0   0     2
d5     1   2   4    17
d6     1   2   0     5
d7     0   5   0    10
q      1   2   3

[Figure: documents d1 to d7 plotted in the space of index terms k1, k2, k3.]
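The q • dj columns of Examples I to III can be recomputed from the tables, and adding length normalization shows what the cosine of the vector model changes. A sketch with the table data copied in; the function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vec = Tuple[float, ...]

# Document vectors over (k1, k2, k3), copied from the tables of Examples I-III.
BINARY_DOCS: Dict[str, Vec] = {
    "d1": (1, 0, 1), "d2": (1, 0, 0), "d3": (0, 1, 1), "d4": (1, 0, 0),
    "d5": (1, 1, 1), "d6": (1, 1, 0), "d7": (0, 1, 0),
}
WEIGHTED_DOCS: Dict[str, Vec] = {
    "d1": (2, 0, 1), "d2": (1, 0, 0), "d3": (0, 1, 3), "d4": (2, 0, 0),
    "d5": (1, 2, 4), "d6": (1, 2, 0), "d7": (0, 5, 0),
}

def inner_products(docs: Dict[str, Vec], q: Vec) -> Dict[str, float]:
    """The q . dj column: sum_i w_{i,j} * w_{i,q}, with no length normalization."""
    return {name: sum(w_d * w_q for w_d, w_q in zip(vec, q)) for name, vec in docs.items()}

def cosine_scores(docs: Dict[str, Vec], q: Vec) -> Dict[str, float]:
    """The same inner products divided by |dj| * |q|."""
    q_norm = math.sqrt(sum(w * w for w in q))
    return {name: score / (math.sqrt(sum(w * w for w in docs[name])) * q_norm)
            for name, score in inner_products(docs, q).items()}

def rank(scores: Dict[str, float]) -> List[str]:
    """Documents in decreasing order of score; ties keep the table order."""
    return sorted(scores, key=lambda name: scores[name], reverse=True)

print(inner_products(BINARY_DOCS, (1, 1, 1)))   # Example I
print(inner_products(BINARY_DOCS, (1, 2, 3)))   # Example II
print(rank(inner_products(WEIGHTED_DOCS, (1, 2, 3))))
# ['d5', 'd3', 'd7', 'd1', 'd6', 'd4', 'd2']
print(rank(cosine_scores(WEIGHTED_DOCS, (1, 2, 3))))
```

With the raw inner product, d7 ranks third in Example III; after dividing by |dj| × |q| it falls below d1 and d6, because its weight is concentrated on k2 while the query weights k3 most.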

Page 26

Probabilistic Model (1/6)

• Introduced by Robertson and Sparck Jones, 1976
  – Binary independence retrieval (BIR) model

• Idea: Given a user query q and the ideal answer set R of the relevant documents, the problem is to specify the properties of this set
  – Assumption (probabilistic principle): the probability of relevance depends on the query and document representations only; the ideal answer set R should maximize the overall probability of relevance
  – The probabilistic model tries to estimate the probability that the user will find the document dj relevant, using the ratio P(dj relevant to q)/P(dj nonrelevant to q)

Page 27

Probabilistic Model (2/6)

• Definition
  – All index term weights are binary, i.e., $w_{i,j} \in \{0,1\}$
  – Let R be the set of documents known to be relevant to query q
  – Let $\bar{R}$ be the complement of R
  – Let $P(R \mid \vec{d}_j)$ be the probability that the document dj is relevant to the query q
  – Let $P(\bar{R} \mid \vec{d}_j)$ be the probability that the document dj is nonrelevant to the query q

Page 28

Probabilistic Model (3/6)

• The similarity sim(dj,q) of the document dj to the query q is defined as the ratio

$$sim(d_j, q) = \frac{P(R \mid \vec{d}_j)}{P(\bar{R} \mid \vec{d}_j)}$$

• Using Bayes' rule,

$$sim(d_j, q) = \frac{P(\vec{d}_j \mid R) \times P(R)}{P(\vec{d}_j \mid \bar{R}) \times P(\bar{R})}$$

  – P(R) stands for the probability that a document randomly selected from the entire collection is relevant
  – $P(\vec{d}_j \mid R)$ stands for the probability of randomly selecting the document dj from the set R of relevant documents

Page 29

Probabilistic Model (4/6)

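• Assuming independence of index terms and given $\vec{d}_j = (d_1, d_2, \ldots, d_t)$ with $d_i \in \{0,1\}$,

$$P(\vec{d}_j \mid R) = \prod_{i=1}^{t} P(k_i = d_i \mid R), \qquad P(\vec{d}_j \mid \bar{R}) = \prod_{i=1}^{t} P(k_i = d_i \mid \bar{R})$$

• Taking logarithms, which does not change the ranking,

$$sim(d_j, q) \sim \log \frac{P(\vec{d}_j \mid R)}{P(\vec{d}_j \mid \bar{R})} + \log \frac{P(R)}{P(\bar{R})}$$

• Since P(R) and $P(\bar{R})$ are the same for all documents, the last term can be dropped:

$$sim(d_j, q) \sim \log \frac{\prod_{i=1}^{t} P(k_i = d_i \mid R)}{\prod_{i=1}^{t} P(k_i = d_i \mid \bar{R})}$$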

Page 30

Probabilistic Model (5/6)

– $P(k_i \mid R)$ stands for the probability that the index term ki is present in a document randomly selected from the set R

– $P(\bar{k}_i \mid R)$ stands for the probability that the index term ki is not present in a document randomly selected from the set R

Page 31

Probabilistic Model (6/6)

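$$sim(d_j, q) \sim \log \frac{\left(\prod_{g_i(\vec{d}_j)=1} P(k_i \mid R)\right) \times \left(\prod_{g_i(\vec{d}_j)=0} P(\bar{k}_i \mid R)\right)}{\left(\prod_{g_i(\vec{d}_j)=1} P(k_i \mid \bar{R})\right) \times \left(\prod_{g_i(\vec{d}_j)=0} P(\bar{k}_i \mid \bar{R})\right)}$$

• Using $P(k_i \mid R) + P(\bar{k}_i \mid R) = 1$ (and likewise for $\bar{R}$), and ignoring factors that are the same for all documents for a given query,

$$sim(d_j, q) \sim \sum_{i=1}^{t} w_{i,q} \times w_{i,j} \times \left( \log \frac{P(k_i \mid R)}{1 - P(k_i \mid R)} + \log \frac{1 - P(k_i \mid \bar{R})}{P(k_i \mid \bar{R})} \right)$$

• Since the weights are binary, this is a sum over the index terms that occur in both q and dj:

$$sim(d_j, q) \sim \sum_{k_i \in q \,\wedge\, k_i \in d_j} \left( \log \frac{P(k_i \mid R)}{1 - P(k_i \mid R)} + \log \frac{1 - P(k_i \mid \bar{R})}{P(k_i \mid \bar{R})} \right)$$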
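A numeric check of the last step: for binary documents, the full log ratio of products and the weighted sum differ only by a constant that is the same for every document, so they produce the same ranking. The probabilities below are hypothetical, and the non-query term k3 is given P(k3|R) = P(k3|R̄), the usual assumption under which terms outside the query drop out.

```python
import math
from itertools import product
from typing import Sequence

def full_log_odds(doc: Sequence[int], p_rel: Sequence[float], p_non: Sequence[float]) -> float:
    """log [prod_i P(k_i = d_i | R) / prod_i P(k_i = d_i | not R)] over all t terms."""
    total = 0.0
    for d_i, p, u in zip(doc, p_rel, p_non):
        total += math.log(p / u) if d_i else math.log((1 - p) / (1 - u))
    return total

def bir_score(doc: Sequence[int], query: Sequence[int],
              p_rel: Sequence[float], p_non: Sequence[float]) -> float:
    """sum_i w_{i,q} w_{i,j} (log p/(1-p) + log (1-u)/u) with binary weights."""
    return sum(math.log(p / (1 - p)) + math.log((1 - u) / u)
               for d_i, q_i, p, u in zip(doc, query, p_rel, p_non) if d_i and q_i)

# Hypothetical estimates of P(k_i|R) and P(k_i|not R); k3 is not in the query,
# so it gets equal probabilities in R and not R.
P_REL = (0.8, 0.6, 0.5)
P_NON = (0.3, 0.4, 0.5)
QUERY = (1, 1, 0)

constant = sum(math.log((1 - p) / (1 - u)) for p, u in zip(P_REL, P_NON))
for doc in product((0, 1), repeat=3):
    gap = full_log_odds(doc, P_REL, P_NON) - bir_score(doc, QUERY, P_REL, P_NON)
    print(doc, round(gap, 12), round(constant, 12))   # the gap is the same for every document
```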

Page 32

Estimation of Term Relevance

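In the very beginning:

$$P(k_i \mid R) = 0.5, \qquad P(k_i \mid \bar{R}) = \frac{df_i}{N}$$

where $df_i$ is the number of documents that contain ki and N is the number of documents in the collection

Next, the ranking can be improved as follows. Let V be a subset of the documents initially retrieved (e.g., the top-ranked ones), and let $V_i$ be the subset of V whose documents contain ki; V and $V_i$ also denote the sizes of these sets:

$$P(k_i \mid R) = \frac{V_i}{V}, \qquad P(k_i \mid \bar{R}) = \frac{df_i - V_i}{N - V}$$

For small values of V and $V_i$, an adjustment factor is added:

$$P(k_i \mid R) = \frac{V_i + 0.5}{V + 1}, \qquad P(k_i \mid \bar{R}) = \frac{df_i - V_i + 0.5}{N - V + 1}$$

or, using $df_i / N$ as the adjustment factor,

$$P(k_i \mid R) = \frac{V_i + df_i/N}{V + 1}, \qquad P(k_i \mid \bar{R}) = \frac{df_i - V_i + df_i/N}{N - V + 1}$$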
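The estimation loop can be sketched as below. It assumes a small hypothetical collection, binary term occurrence, and V = the two top-ranked documents from the first pass (the slides leave the size of V open); it uses the 0.5 adjustment for small V. Because the initial weight log((N − df_i)/df_i) is infinite when df_i is 0 or N, the sketch rejects such query terms.

```python
import math
from typing import Dict, List, Set, Tuple

Docs = Dict[str, Set[str]]
Probs = Dict[str, float]

def term_weight(p: float, u: float) -> float:
    """log p/(1-p) + log (1-u)/u for one query term (p = P(k_i|R), u = P(k_i|not R))."""
    return math.log(p / (1 - p)) + math.log((1 - u) / u)

def rank_bir(docs: Docs, query: Set[str], p_rel: Probs, p_non: Probs) -> List[str]:
    """Sum the weights of the query terms present in each document; sort decreasing."""
    scores = {name: sum(term_weight(p_rel[k], p_non[k]) for k in sorted(query) if k in terms)
              for name, terms in docs.items()}
    return sorted(scores, key=lambda name: scores[name], reverse=True)

def initial_estimates(docs: Docs, query: Set[str]) -> Tuple[Probs, Probs, Dict[str, int]]:
    """P(k_i|R) = 0.5 and P(k_i|not R) = df_i / N."""
    N = len(docs)
    df = {k: sum(1 for terms in docs.values() if k in terms) for k in query}
    if any(not 0 < df[k] < N for k in query):
        raise ValueError("this sketch needs 0 < df_i < N for every query term")
    return {k: 0.5 for k in query}, {k: df[k] / N for k in query}, df

def refine(docs: Docs, query: Set[str], df: Dict[str, int], top: List[str]) -> Tuple[Probs, Probs]:
    """Re-estimate from V = top-ranked documents, with the 0.5 adjustment for small V."""
    N, V = len(docs), len(top)
    p_rel, p_non = {}, {}
    for k in query:
        V_i = sum(1 for name in top if k in docs[name])   # documents in V containing k_i
        p_rel[k] = (V_i + 0.5) / (V + 1)
        p_non[k] = (df[k] - V_i + 0.5) / (N - V + 1)
    return p_rel, p_non

docs = {                                                   # hypothetical collection
    "d1": {"fuzzy", "retrieval", "model"}, "d2": {"fuzzy", "set"},
    "d3": {"vector", "retrieval"}, "d4": {"boolean", "model"},
    "d5": {"vector", "model"}, "d6": {"probabilistic", "model"},
}
query = {"fuzzy", "retrieval"}

p_rel, p_non, df = initial_estimates(docs, query)
ranking = rank_bir(docs, query, p_rel, p_non)
p_rel, p_non = refine(docs, query, df, top=ranking[:2])   # V = top 2 documents (assumption)
print(ranking, rank_bir(docs, query, p_rel, p_non))
```

With p = 0.5 the first pass is an idf-like ranking; after one refinement "fuzzy", which occurs in both documents of V, receives a much larger weight than "retrieval".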

Page 33

• Advantage
  – Documents are ranked in decreasing order of their probability of being relevant

• Disadvantage
  – The need to guess the initial relevant and nonrelevant sets
  – Term frequency is not considered
  – Independence assumption for index terms

Page 34

Brief Comparison of Classic Models

• Boolean model is the weakest
  – Not able to recognize partial matches

• Controversy between probabilistic and vector models
  – The vector model is expected to outperform the probabilistic model with general collections

Page 35

Alternative Set Theoretic Models

• Fuzzy Set Model
• Extended Boolean Model

Page 36

Fuzzy Theory

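• A fuzzy subset A of a universe U is characterized by a membership function μA: U → [0,1], which associates with each element u ∈ U a number μA(u) in the interval [0,1]

• Let A and B be two fuzzy subsets of U, and let $\bar{A}$ be the complement of A. Then

$$\mu_{\bar{A}}(u) = 1 - \mu_A(u), \qquad \mu_{A \cup B}(u) = \max(\mu_A(u), \mu_B(u)), \qquad \mu_{A \cap B}(u) = \min(\mu_A(u), \mu_B(u))$$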

Page 37

Fuzzy Information Retrieval

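• Using a term-term correlation matrix

$$c_{i,l} = \frac{df_{i,l}}{df_i + df_l - df_{i,l}}$$

  where $df_i$ is the number of documents containing ki and $df_{i,l}$ is the number of documents containing both ki and kl

• Define a fuzzy set associated to each index term ki, with the membership of document dj given by

$$\mu_{i,j} = \mu_i(d_j) = 1 - \prod_{k_l \in d_j} (1 - c_{i,l})$$

  – If dj contains a term kl that is strongly related to ki, that is ci,l ≈ 1, then μi(dj) ≈ 1
  – If every term kl in dj is only loosely related to ki, that is ci,l ≈ 0, then μi(dj) ≈ 0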
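The correlation factor and membership function can be computed directly from document term sets. A minimal sketch assuming each document is a set of index terms; the toy collection borrows the "Taipei"/"Taiwan" terms from the later LSI slides purely as illustrative data. Note that c_{i,i} = 1, so μi(dj) = 1 whenever dj contains ki itself.

```python
from typing import List, Set

def correlation(docs: List[Set[str]], k_i: str, k_l: str) -> float:
    """c_{i,l} = df_{i,l} / (df_i + df_l - df_{i,l})."""
    df_i = sum(1 for d in docs if k_i in d)
    df_l = sum(1 for d in docs if k_l in d)
    df_il = sum(1 for d in docs if k_i in d and k_l in d)
    denom = df_i + df_l - df_il
    return df_il / denom if denom else 0.0   # both terms absent from the collection

def membership(docs: List[Set[str]], k_i: str, doc: Set[str]) -> float:
    """mu_i(d_j) = 1 - product over k_l in d_j of (1 - c_{i,l})."""
    remaining = 1.0
    for k_l in doc:
        remaining *= 1.0 - correlation(docs, k_i, k_l)
    return 1.0 - remaining

docs = [{"taipei", "taiwan"}, {"taipei", "taiwan", "city"},   # hypothetical documents
        {"taiwan", "island"}, {"city", "traffic"}]
print(correlation(docs, "taipei", "taiwan"))   # 2 / (2 + 3 - 2)
print(membership(docs, "taipei", docs[2]))     # lacks "taipei" but contains "taiwan"
print(membership(docs, "taipei", docs[3]))     # only a weak link through "city"
```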

Page 38

Example

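• Disjunctive Normal Form

$$q = k_a \wedge (k_b \vee \neg k_c)$$

$$\vec{q}_{dnf} = (k_a \wedge k_b \wedge k_c) \vee (k_a \wedge k_b \wedge \neg k_c) \vee (k_a \wedge \neg k_b \wedge \neg k_c) = cc_1 \vee cc_2 \vee cc_3$$

• Membership of dj in each conjunctive component (algebraic product):

$$\mu_{a \wedge b \wedge c}(d_j) = \mu_a(d_j)\,\mu_b(d_j)\,\mu_c(d_j) = \mu_{a,j}\,\mu_{b,j}\,\mu_{c,j}$$

$$\mu_{a \wedge b \wedge \neg c}(d_j) = \mu_{a,j}\,\mu_{b,j}\,(1 - \mu_{c,j})$$

$$\mu_{a \wedge \neg b \wedge \neg c}(d_j) = \mu_{a,j}\,(1 - \mu_{b,j})\,(1 - \mu_{c,j})$$

• Membership of dj in the query (algebraic sum):

$$\mu_{q,j} = \mu_{cc_1 \vee cc_2 \vee cc_3,\, j} = 1 - \prod_{i=1}^{3} (1 - \mu_{cc_i, j}) = 1 - (1 - \mu_{a,j}\mu_{b,j}\mu_{c,j}) \times (1 - \mu_{a,j}\mu_{b,j}(1 - \mu_{c,j})) \times (1 - \mu_{a,j}(1 - \mu_{b,j})(1 - \mu_{c,j}))$$

[Figure: Venn diagram of Ka, Kb, Kc showing the conjunctive components cc1, cc2, cc3.]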

Page 39

Algebraic Sum and Product

• The degree of membership in a disjunctive fuzzy set is computed using an algebraic sum, $1 - \prod_i (1 - \mu_i)$, instead of the max function

• The degree of membership in a conjunctive fuzzy set is computed using an algebraic product, $\prod_i \mu_i$, instead of the min function

• Smoother than the max and min functions
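The effect of replacing max/min with algebraic sum/product can be seen on the three-component query of the previous example. The membership values μa,j = 0.8, μb,j = 0.6, and μc,j between 0.1 and 0.3 are hypothetical. Over that range the max/min version stays at 0.6, while the algebraic version changes with μc,j.

```python
from math import prod

def components(mu_a: float, mu_b: float, mu_c: float) -> list:
    """Memberships in cc1 = a.b.c, cc2 = a.b.(not c), cc3 = a.(not b).(not c) via algebraic product."""
    return [mu_a * mu_b * mu_c,
            mu_a * mu_b * (1 - mu_c),
            mu_a * (1 - mu_b) * (1 - mu_c)]

def mu_q_algebraic(mu_a: float, mu_b: float, mu_c: float) -> float:
    """mu_q = 1 - prod_i (1 - mu_cc_i): algebraic sum over the conjunctive components."""
    return 1 - prod(1 - m for m in components(mu_a, mu_b, mu_c))

def mu_q_maxmin(mu_a: float, mu_b: float, mu_c: float) -> float:
    """The same query evaluated with max for OR and min for AND."""
    return max(min(mu_a, mu_b, mu_c),
               min(mu_a, mu_b, 1 - mu_c),
               min(mu_a, 1 - mu_b, 1 - mu_c))

for mu_c in (0.1, 0.2, 0.3):                  # hypothetical memberships
    print(mu_c, round(mu_q_algebraic(0.8, 0.6, mu_c), 4), mu_q_maxmin(0.8, 0.6, mu_c))
```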

Page 40

Alternative Algebraic Models

• Generalized Vector Space Model
• Latent Semantic Model
• Neural Network Model

Page 41

Sparse Matrix Problem

• Consider a term-document matrix of dimensions 1M × 1M
  – Most of the entries will be 0 (a sparse matrix)
  – A waste of storage and computation
  – How can the dimensions be reduced?

Page 42

Latent Semantic Indexing (1/5)

• Let M = (Mij) be a term-document association matrix with t rows and N columns

• Latent semantic indexing decomposes M using singular value decomposition:

$$M = K S D^t$$

  – K is the matrix of eigenvectors derived from the term-to-term correlation matrix ($M M^t$)
  – D is the matrix of eigenvectors derived from the document-to-document matrix ($M^t M$), and $D^t$ is its transpose
  – S is an r × r diagonal matrix of singular values, where r ≤ min(t, N) is the rank of M
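NumPy's `np.linalg.svd` computes this decomposition; with `full_matrices=False` it returns K (t × min(t, N)), the singular values as a 1-D array in decreasing order, and D^t. The sketch below, on a hypothetical 4 × 3 matrix, checks the eigenvector relations stated above. For a rank-deficient M the trailing singular values come back as (numerical) zeros rather than being dropped.

```python
import numpy as np

# Hypothetical term-document matrix: t = 4 terms (rows), N = 3 documents (columns).
M = np.array([[1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])

K, S, Dt = np.linalg.svd(M, full_matrices=False)   # K: 4x3, S: (3,), Dt: 3x3

assert np.allclose(K @ np.diag(S) @ Dt, M)          # M = K S D^t
assert np.allclose(M @ M.T @ K, K * S**2)           # columns of K: eigenvectors of M M^t
assert np.allclose(M.T @ M @ Dt.T, Dt.T * S**2)     # columns of D: eigenvectors of M^t M
print(S)                                            # singular values, largest first
```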

Page 43

Latent Semantic Indexing (2/5)

• Consider now only the s largest singular values of S, and their corresponding columns in K and D (rows in $D^t$)
  – The remaining singular values of S are deleted

$$M_s = K_s S_s D_s^t$$

• Among all matrices of rank s, the resulting matrix Ms is the closest to the original matrix M in the least-squares sense

• s < r is the dimensionality of a reduced concept space
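Keeping the s largest singular values amounts to slicing the SVD factors. This sketch on a hypothetical 6 × 5 count matrix checks the least-squares claim numerically: the Frobenius error of M_s equals the square root of the sum of the squared discarded singular values (the Eckart-Young theorem).

```python
import numpy as np

def truncate(M: np.ndarray, s: int):
    """Keep the s largest singular values: returns K_s, the vector of S_s, and D_s^t."""
    K, S, Dt = np.linalg.svd(M, full_matrices=False)   # S is sorted in decreasing order
    return K[:, :s], S[:s], Dt[:s, :]

# Hypothetical 6-term x 5-document count matrix.
M = np.array([[2, 0, 1, 0, 1],
              [1, 1, 0, 0, 0],
              [0, 3, 1, 0, 2],
              [0, 0, 2, 1, 0],
              [1, 0, 0, 3, 1],
              [0, 1, 1, 1, 0]], dtype=float)

s = 2
K_s, S_s, Dt_s = truncate(M, s)
M_s = K_s @ np.diag(S_s) @ Dt_s                    # M_s = K_s S_s D_s^t

singular_values = np.linalg.svd(M, compute_uv=False)
error = np.linalg.norm(M - M_s, "fro")
# For the best rank-s approximation, the least-squares (Frobenius) error equals
# the root of the sum of the squared discarded singular values.
print(np.linalg.matrix_rank(M_s), error, np.sqrt(np.sum(singular_values[s:] ** 2)))
```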

Page 44

Latent Semantic Indexing (3/5)

• The selection of s attempts to balance two opposing effects
  – s should be large enough to allow fitting all the structure in the real data
  – s should be small enough to allow filtering out all the non-relevant representational details

Page 45

Latent Semantic Indexing (4/5)

• Consider the relationship between any two documents

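$$M_s^t M_s = (K_s S_s D_s^t)^t (K_s S_s D_s^t) = D_s S_s K_s^t K_s S_s D_s^t = D_s S_s S_s D_s^t = (D_s S_s)(D_s S_s)^t$$

  – The step $K_s^t K_s = I$ holds because the columns of $K_s$ are orthonormal
  – The (i, j) element of $M_s^t M_s$ quantifies the relationship between documents di and dj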

Page 46

Latent Semantic Indexing (5/5)

• To rank documents with regard to a given user query, we model the query as a pseudo-document in the original matrix M
  – Assume the query is modeled as the document with number k
  – Then the kth row in the matrix $M_s^t M_s$ provides the ranks of all documents with respect to this query
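The pseudo-document approach can be sketched by appending q as an extra column of M, truncating, and reading the query's row of M_s^t M_s. This reuses the Example III table as M (terms as rows) with q = (1, 2, 3); s = 2 is an arbitrary choice and `lsi_rank` is a hypothetical helper. With s equal to the full rank, the query row reduces to the plain inner products q • dj, which gives a simple check.

```python
import numpy as np

def lsi_rank(M: np.ndarray, q: np.ndarray, s: int):
    """Append q to M as document number k = N, then read row k of M_s^t M_s."""
    M_aug = np.column_stack([M, q])                 # t x (N + 1)
    K, S, Dt = np.linalg.svd(M_aug, full_matrices=False)
    Ds_Ss = Dt[:s, :].T * S[:s]                     # D_s S_s, shape (N + 1) x s
    doc_doc = Ds_Ss @ Ds_Ss.T                       # M_s^t M_s = (D_s S_s)(D_s S_s)^t
    M_s = K[:, :s] @ np.diag(S[:s]) @ Dt[:s, :]
    assert np.allclose(doc_doc, M_s.T @ M_s)        # sanity check of the identity for M_s^t M_s
    k = M.shape[1]
    scores = doc_doc[k, :k]                         # query row, without the query itself
    return np.argsort(-scores, kind="stable"), scores

# Term-document matrix of the Example III table (rows k1..k3, columns d1..d7) and q = (1, 2, 3).
M = np.array([[2, 1, 0, 2, 1, 1, 0],
              [0, 0, 1, 0, 2, 2, 5],
              [1, 0, 3, 0, 4, 0, 0]], dtype=float)
q = np.array([1, 2, 3], dtype=float)

order, scores = lsi_rank(M, q, s=2)                 # s = 2 is an arbitrary choice
print([f"d{i + 1}" for i in order])
```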

Page 47

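Computing an Example

• Let (Mij) be given by the matrix below, which lists the documents as rows; M itself is its transpose, with the terms k1, k2, k3 as rows (the q row and the q•dj column are carried over from Example III and are not part of M)
  – Compute the matrices K, S, and $D^t$

      k1  k2  k3  q•dj
d1     2   0   1     5
d2     1   0   0     1
d3     0   1   3    11
d4     2   0   0     2
d5     1   2   4    17
d6     1   2   0     5
d7     0   5   0    10
q      1   2   3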
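One way to carry out the exercise numerically, assuming NumPy. Singular vectors are only determined up to sign, so K and D^t may differ from a hand computation by a sign flip of matching columns and rows.

```python
import numpy as np

# The table above, transposed: M is t x N = 3 x 7 (rows k1..k3, columns d1..d7).
M = np.array([[2, 1, 0, 2, 1, 1, 0],
              [0, 0, 1, 0, 2, 2, 5],
              [1, 0, 3, 0, 4, 0, 0]], dtype=float)

K, S, Dt = np.linalg.svd(M, full_matrices=False)   # K: 3x3, S: 3 values, Dt: 3x7
np.set_printoptions(precision=3, suppress=True)
print("K =", K, sep="\n")
print("S =", np.diag(S), sep="\n")
print("D^t =", Dt, sep="\n")

# Sanity checks: the factors reproduce M, and K, D have orthonormal columns.
assert np.allclose(K @ np.diag(S) @ Dt, M)
assert np.allclose(K.T @ K, np.eye(3)) and np.allclose(Dt @ Dt.T, np.eye(3))
```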

Page 48

• Latent Semantic Indexing transforms the occurrence matrix into a relation between the terms and concepts, and a relation between the concepts and the documents
  – Indirect relation between terms and documents through some hidden (or latent) concepts

[Figure: the terms "Taipei", "Taiwan", … and a document (doc), with the relation between them marked "?".]

Page 49

[Figure: the terms "Taipei", "Taiwan", … linked to a document (doc) through (latent) concepts.]

Page 50

Alternative Probabilistic Models

• Bayesian Networks
• Inference Network Model
• Belief Network Model

Page 51

Bayesian Network

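• Let xi be a node in a Bayesian network G and $\Gamma_{x_i}$ be the set of parent nodes of xi

• The influence of $\Gamma_{x_i}$ on xi can be specified by any set of functions $F_i(x_i, \Gamma_{x_i})$ that satisfy:

$$\sum_{\forall x_i} F_i(x_i, \Gamma_{x_i}) = 1, \qquad 0 \le F_i(x_i, \Gamma_{x_i}) \le 1$$

• For example, in a network where x1 is the parent of x2 and x3, x2 and x3 are the parents of x4, and x3 is the parent of x5:

  P(x1,x2,x3,x4,x5) = P(x1)P(x2|x1)P(x3|x1)P(x4|x2,x3)P(x5|x3)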
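The factorization in the example can be evaluated directly. The conditional probability tables below are hypothetical (each F_i is a Bernoulli table, so it satisfies the two constraints above); summing the product over all 2^5 assignments gives 1, and summing out variables gives marginals such as P(x4 = 1).

```python
from itertools import product

# Hypothetical conditional probability tables, P(x_i = 1 | parents); all nodes binary.
P_X1 = 0.3
P_X2 = {0: 0.2, 1: 0.7}                                        # P(x2 = 1 | x1)
P_X3 = {0: 0.5, 1: 0.9}                                        # P(x3 = 1 | x1)
P_X4 = {(0, 0): 0.1, (0, 1): 0.6, (1, 0): 0.4, (1, 1): 0.8}    # P(x4 = 1 | x2, x3)
P_X5 = {0: 0.25, 1: 0.5}                                       # P(x5 = 1 | x3)

def bern(p_one: float, value: int) -> float:
    """F_i(x_i, parents): P(x_i = value) when P(x_i = 1 | parents) = p_one."""
    return p_one if value == 1 else 1.0 - p_one

def joint(x1: int, x2: int, x3: int, x4: int, x5: int) -> float:
    """P(x1,...,x5) = P(x1) P(x2|x1) P(x3|x1) P(x4|x2,x3) P(x5|x3)."""
    return (bern(P_X1, x1) * bern(P_X2[x1], x2) * bern(P_X3[x1], x3)
            * bern(P_X4[(x2, x3)], x4) * bern(P_X5[x3], x5))

assignments = list(product((0, 1), repeat=5))
total = sum(joint(*xs) for xs in assignments)
p_x4 = sum(joint(*xs) for xs in assignments if xs[3] == 1)    # marginal P(x4 = 1)
print(round(total, 10), round(p_x4, 4))                      # 1.0 0.4891
```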

Page 52

Belief Network Model (1/6)

• The probability space: the set K = {k1, k2, …, kt} of all index terms is the universe. To each subset u is associated a vector $\vec{k}$ such that $g_i(\vec{k}) = 1 \iff k_i \in u$

• Random variables
  – To each index term ki is associated a binary random variable

Page 53

Belief Network Model (2/6)

• Concept space
  – A document dj is represented as a concept composed of the terms used to index dj
  – A user query q is also represented as a concept composed of the terms used to index q
  – Both user query and document are modeled as subsets of index terms

• Probability distribution P over K (all 2^t subsets u are equally likely)

$$P(c) = \sum_{u} P(c \mid u) \times P(u), \qquad P(u) = \left(\frac{1}{2}\right)^t$$

  – P(c) is the degree of coverage of K by c

Page 54

Belief Network Model (3/6)

• A query q is modeled as a network node
  – This random variable is set to 1 whenever q completely covers the concept space K
  – P(q) computes the degree of coverage of the space K by q

• A document dj is modeled as a network node
  – This random variable is 1 to indicate that dj completely covers the concept space K
  – P(dj) computes the degree of coverage of the space K by dj

Page 55

Belief Network Model (4/6)

Page 56

Belief Network Model (5/6)

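• Assumption
  – P(dj | q) is adopted as the rank of the document dj with respect to the query q

$$P(d_j \mid q) = P(d_j \wedge q) / P(q)$$

$$P(d_j \wedge q) = \sum_{u} P(d_j \wedge q \mid u) \times P(u) = \sum_{u} P(d_j \mid u) \times P(q \mid u) \times P(u) = \sum_{\vec{k}} P(d_j \mid \vec{k}) \times P(q \mid \vec{k}) \times P(\vec{k})$$

  – The second step assumes that dj and q are independent given u; the last step rewrites the sum over subsets u as a sum over their vectors $\vec{k}$
  – Since P(q) is the same for all documents, ranking by $P(d_j \wedge q)$ gives the same order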

Page 57

Belief Network Model (6/6)

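• Specify the conditional probabilities as follows

$$P(d_j \mid \vec{k}) = \begin{cases} \dfrac{w_{i,j}}{\sqrt{\sum_{l=1}^{t} w_{l,j}^2}} & \text{if } \vec{k} = \vec{k}_i \wedge g_i(\vec{d}_j) = 1 \\ 0 & \text{otherwise} \end{cases}$$

$$P(q \mid \vec{k}) = \begin{cases} \dfrac{w_{i,q}}{\sqrt{\sum_{l=1}^{t} w_{l,q}^2}} & \text{if } \vec{k} = \vec{k}_i \wedge g_i(\vec{q}) = 1 \\ 0 & \text{otherwise} \end{cases}$$

  where $\vec{k}_i$ is the vector with $g_i(\vec{k}_i) = 1$ and $g_l(\vec{k}_i) = 0$ for all l ≠ i

• Since $P(\vec{k})$ is constant, substituting these into $P(d_j \wedge q)$ gives a value proportional to $\sum_i w_{i,j} w_{i,q} / (|\vec{d}_j| \times |\vec{q}|)$, the cosine of the vector model

• Thus, the belief network model can be tuned to subsume the vector model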
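A numeric check that these conditionals reproduce the cosine: summing P(dj|k) P(q|k) P(k) over all 2^t vectors k gives the cosine scaled by (1/2)^t. The sketch treats gi(x) = 1 as wi > 0 (an assumption about how weights map to the binary gi) and reuses vectors from Example III; the function names are hypothetical.

```python
import math
from itertools import product
from typing import Sequence, Tuple

def p_given_k(weights: Sequence[float], k: Tuple[int, ...]) -> float:
    """P(x | k): w_i / |x| if k = k_i (one active term) and g_i(x) = 1, else 0."""
    if sum(k) != 1:
        return 0.0                       # k is not one of the single-term vectors k_i
    i = k.index(1)
    if weights[i] <= 0:
        return 0.0                       # assumption: g_i(x) = 1 iff w_i > 0
    return weights[i] / math.sqrt(sum(w * w for w in weights))

def p_doc_and_query(d: Sequence[float], q: Sequence[float]) -> float:
    """P(dj AND q) = sum over all 2^t vectors k of P(dj|k) P(q|k) P(k), with P(k) = (1/2)^t."""
    t = len(d)
    p_k = 0.5 ** t
    return sum(p_given_k(d, k) * p_given_k(q, k) * p_k for k in product((0, 1), repeat=t))

def cosine(d: Sequence[float], q: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(d, q))
    return dot / (math.sqrt(sum(a * a for a in d)) * math.sqrt(sum(b * b for b in q)))

q = (1, 2, 3)
for name, d in {"d1": (2, 0, 1), "d3": (0, 1, 3), "d5": (1, 2, 4)}.items():   # from Example III
    print(name, p_doc_and_query(d, q), cosine(d, q) * 0.5 ** len(q))   # the two columns agree
```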

Page 58

Comparison

• Belief network model
  – Is based on a set-theoretic view
  – It provides a separation between the document and the query
  – It is able to reproduce any ranking strategy generated by the inference network model

• Inference network model
  – Takes a purely epistemological view, which is more difficult to grasp