Ghislain Fourny

Information Retrieval

9. Probabilistic Information Retrieval

Picture copyright: johan2011/123RF Stock Photo


Post on 08-May-2020



What we have seen so far

Boolean retrieval

Query: lawyer AND Penang AND NOT silver

Input: set of documents

Output: subset of documents

Ranked retrieval

Query: lawyer Penang silver

Input: set of documents

Output: ranked subset of documents (the figure labels the returned documents with their ranks)

Probability theory

Universe $\Omega$

Elementary event (outcome)

$$\omega \in \Omega$$

Probability distribution

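[Figure: the outcomes of $\Omega$, each labelled with its probability]

$$p : \Omega \to [0, 1], \quad \omega \mapsto p(\omega)$$

$$\sum_{\omega \in \Omega} p(\omega) = 1$$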

Event

[Figure: an event $E$, drawn as a set of outcomes]

$$E \subset \Omega$$

$$p(E) = \sum_{\omega \in E} p(\omega) \in [0, 1]$$

Complement

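[Figure: an event $E$ and its complement $\bar{E}$]

$$E \subset \Omega, \qquad \bar{E} = \Omega \setminus E \subset \Omega$$

$$p(\bar{E}) = 1 - p(E)$$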

Odds

[Figure: an event $E \subset \Omega$ and its complement $\bar{E}$]

$$O_p(E) = \frac{p(E)}{p(\bar{E})} = \frac{p(E)}{1 - p(E)}$$
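The definitions so far (distribution, event, complement, odds) can be checked on a toy example. This is a minimal sketch, assuming a hypothetical universe of five outcomes named `a` to `e` with made-up probabilities; `fractions.Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

# Hypothetical universe of five outcomes with made-up probabilities.
p = {
    "a": Fraction(3, 10),
    "b": Fraction(2, 10),
    "c": Fraction(1, 10),
    "d": Fraction(3, 10),
    "e": Fraction(1, 10),
}
assert sum(p.values()) == 1  # a probability distribution sums to 1

def prob(event):
    """p(E): the sum of p(omega) over the outcomes omega in E."""
    return sum((p[w] for w in event), Fraction(0))

def complement(event):
    """The complement of E: all outcomes of the universe that are not in E."""
    return set(p) - set(event)

def odds(event):
    """O_p(E) = p(E) / p(complement of E); raises ZeroDivisionError if p(E) = 1."""
    return prob(event) / prob(complement(event))

E = {"a", "d"}
print(prob(E))              # 3/5
print(prob(complement(E)))  # 2/5
print(odds(E))              # 3/2
```

Odds of 3/2 read as "3 to 2": the event is one and a half times as likely to happen as not.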

Intersection

[Figure: two events $E$ and $F$ and their intersection $E \cap F$]

Union

[Figure: two events $E$ and $F$ and their union $E \cup F$]

$$P(E \cup F) = P(E) + P(F) - P(E \cap F)$$

Disjoint events

[Figure: two disjoint events $E$ and $F$ and their union $E \cup F$]

$$P(E \cap F) = 0$$

$$P(E \cup F) = P(E) + P(F)$$

Partition rule

[Figure: an event $E$ cut into two parts by an event $F$]

$$p(E) = p(E \cap F) + p(E \cap \bar{F})$$

Conditional probability

[Figure: two events $E$ and $F$]

$$p(E \mid F) = \frac{p(E \cap F)}{p(F)}$$

Chain rule

[Figure: two events $E$ and $F$]

$$p(E \cap F) = p(E \mid F) \times p(F)$$

Independence

[Figure: two independent events $E$ and $F$]

$$p(E \cap F) = p(E) \times p(F)$$

Equivalently, when $p(F) > 0$:

$$P(E \mid F) = P(E)$$
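The rules from union to independence can all be verified on one small example. A minimal sketch, assuming a fair six-sided die as a hypothetical universe; the events `E` (even roll) and `F` (roll at most 2) are chosen for illustration and happen to be independent.

```python
from fractions import Fraction

# Hypothetical universe: a fair six-sided die, each outcome with probability 1/6.
omega = {1, 2, 3, 4, 5, 6}
p = {w: Fraction(1, 6) for w in omega}

def prob(event):
    """Probability of an event, i.e. of a subset of omega."""
    return sum((p[w] for w in event), Fraction(0))

def cond(e, f):
    """p(E | F) = p(E ∩ F) / p(F); only defined when p(F) > 0."""
    return prob(e & f) / prob(f)

E = {2, 4, 6}      # "the roll is even"
F = {1, 2}         # "the roll is at most 2"
F_bar = omega - F  # complement of F

# In Python, | is set union and & is set intersection.
assert prob(E | F) == prob(E) + prob(F) - prob(E & F)  # union
assert prob(E) == prob(E & F) + prob(E & F_bar)        # partition rule
assert prob(E & F) == cond(E, F) * prob(F)             # chain rule
assert prob(E & F) == prob(E) * prob(F)                # E and F are independent...
assert cond(E, F) == prob(E)                           # ...so conditioning on F changes nothing
print(prob(E | F), cond(E, F))  # 2/3 1/2
```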

Puzzle

When it rains, I forget my umbrella 5% of the time.

It rains every other day.

Last year, I had my umbrella on 271 days.

Today I have my umbrella. Is it raining?

Visual representation

Probability that I don't forget my umbrella if it rains: $1 - 0.05 = 0.95$

Probability that it is raining, given that I took my umbrella

Bayes Formula, using the chain rule...

$$P(\text{rain} \wedge \text{umbrella}) = P(\text{umbrella} \mid \text{rain}) \times P(\text{rain}) = P(\text{rain} \mid \text{umbrella}) \times P(\text{umbrella})$$

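Bayes Formula

$$P(\text{rain} \mid \text{umbrella}) = \frac{P(\text{umbrella} \mid \text{rain}) \times P(\text{rain})}{P(\text{umbrella})}$$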

Visual representation

Christos Georgiou / 123RF Stock Photo

$$P(\text{rain} \mid \text{umbrella}) = \frac{0.95 \times P(\text{rain})}{P(\text{umbrella})}$$

$$P(\text{rain} \mid \text{umbrella}) = \frac{0.95 \times 0.5}{P(\text{umbrella})}$$

$$P(\text{rain} \mid \text{umbrella}) = \frac{0.95 \times 0.5}{271/365}$$

$$P(\text{rain} \mid \text{umbrella}) \approx 64\%$$
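The puzzle's arithmetic in a few lines, as a sketch using only the numbers given in the puzzle (the variable names are illustrative):

```python
from fractions import Fraction

# Numbers from the umbrella puzzle.
p_umbrella_given_rain = 1 - Fraction(5, 100)  # I forget it on 5% of rainy days
p_rain = Fraction(1, 2)                       # it rains every other day
p_umbrella = Fraction(271, 365)               # umbrella on 271 of 365 days

# Bayes' formula: P(rain | umbrella) = P(umbrella | rain) * P(rain) / P(umbrella)
p_rain_given_umbrella = p_umbrella_given_rain * p_rain / p_umbrella

print(p_rain_given_umbrella)                  # 1387/2168
print(f"{float(p_rain_given_umbrella):.1%}")  # 64.0%
```

The exact value is 1387/2168, about 0.640, which the slides round to 64%.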

Visual representation (at scale)

I had my umbrella on 271 of 365 days.

I forget my umbrella on only 5% of the rainy days.

It rains 50% of the time.

But it rains 64% of the time on the days I have my umbrella.

Prior: $P(\text{rain})$, the probability of rain before we know whether I have my umbrella.

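Adding new information

$$P(\text{rain} \mid \text{umbrella}) = \frac{P(\text{umbrella} \mid \text{rain})}{P(\text{umbrella})} \times P(\text{rain})$$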

Posterior: $P(\text{rain} \mid \text{umbrella})$, the probability of rain once we know that I have my umbrella.

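The magic

$$\underbrace{P(\text{rain} \mid \text{umbrella})}_{\text{posterior}} = \frac{P(\text{umbrella} \mid \text{rain})}{P(\text{umbrella})} \times \underbrace{P(\text{rain})}_{\text{prior}}$$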

Bayes' rule

[Figure: two events $E$ and $F$]

$$p(E \mid F) = \frac{P(F \mid E) \times P(E)}{P(F)}$$

$$\underbrace{p(E \mid F)}_{\text{posterior}} = \frac{P(F \mid E)}{P(F)} \times \underbrace{P(E)}_{\text{prior}}$$

Random variable

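[Figure: a random variable $X$ maps each outcome of $\Omega$ to a value in a set $S$]

$$X : \Omega \to S, \quad \omega \mapsto X(\omega)$$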

$$p_X : S \to [0, 1], \quad x \mapsto \sum_{\omega \mid X(\omega) = x} p(\omega)$$

[Figure: the three elements of $S$, drawn as icons, with $p_X$ values 0.5, 0.1 and 0.5]

In practice...

[Figure: only the set $S$ is drawn, with $p_X$ values 0.5, 0.1 and 0.5 on its three elements]

Alternate notation

$P(X = x)$ instead of $p_X(x)$; in the figure, the three elements of $S$ (drawn as icons) get $P(X = \cdot)$ values 0.1, 0.5 and 0.5.

Don't do that!

$P(\cdot) = 0.5$, where the argument is a value of $X$ (drawn as an icon) and the random variable itself is not named: no go!

Joint probabilities

$$p_{XY}(x, y) = \sum_{\omega \mid X(\omega) = x \,\wedge\, Y(\omega) = y} p(\omega)$$

Conditional probabilities

$$p_{X \mid Y}(x, y) = \frac{p_{XY}(x, y)}{p_Y(y)}$$

Conditional probabilities

$$P(X = x \mid Y = y) = \frac{P(X = x \wedge Y = y)}{P(Y = y)}$$
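Distributions, joint distributions and conditional distributions of random variables are all obtained by summing outcome probabilities, as in the formulas above. A minimal sketch, assuming a hypothetical fair die and two illustrative random variables `X` (parity) and `Y` (roll above 3); the helper names `pmf`, `joint_pmf` and `conditional` are made up for this example.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical universe: a fair six-sided die, each outcome with probability 1/6.
p = {w: Fraction(1, 6) for w in range(1, 7)}

# Two random variables, i.e. functions from outcomes to values.
def X(w):
    return "even" if w % 2 == 0 else "odd"  # parity of the roll

def Y(w):
    return w > 3  # True for a roll of 4, 5 or 6

def pmf(var):
    """p_X(x): sum of p(omega) over the outcomes omega with X(omega) = x."""
    dist = defaultdict(Fraction)  # Fraction() is 0
    for w, pw in p.items():
        dist[var(w)] += pw
    return dict(dist)

def joint_pmf(var1, var2):
    """p_XY(x, y): sum of p(omega) over omega with X(omega) = x and Y(omega) = y."""
    dist = defaultdict(Fraction)
    for w, pw in p.items():
        dist[(var1(w), var2(w))] += pw
    return dict(dist)

def conditional(var1, var2, x, y):
    """P(X = x | Y = y) = p_XY(x, y) / p_Y(y); requires p_Y(y) > 0."""
    return joint_pmf(var1, var2).get((x, y), Fraction(0)) / pmf(var2)[y]

print(pmf(X)["even"])                   # 1/2
print(conditional(X, Y, "even", True))  # 2/3
```

The docstrings keep the random variable in every expression, in line with the notation advice on these slides.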

$$P(x \mid y) = \frac{P(x, y)}{P(y)}$$

Don't do that! Leaving out the random variables ($X = x$, $Y = y$): no go!

Probability model for document retrieval

Query as a random variable

[Figure: the random variable $Q$ maps an outcome $\omega \in \Omega$ to a query $q$]

Document as a random variable

[Figure: the random variable $D$ maps an outcome $\omega \in \Omega$ to a document $d$]

Relevance as a random variable

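[Figure: the random variable $R : \Omega \to \mathbb{B}$ maps an outcome $\omega \in \Omega$ to a relevance value $r = 0$ or $1$]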

Probability ranking principle

Probability that, for a query q and a document d, d is relevant for query q:

$$P(R = 1 \mid D = d \wedge Q = q)$$

[Figure: among the outcomes with $D = d$ and $Q = q$, the region where $R = 1$ and the region where $R = 0$]

Ideal world

$$P(R = 1 \mid D = d \wedge Q = q)$$

$$P(R = 1 \mid D = e \wedge Q = q)$$

$$P(R = 1 \mid D = f \wedge Q = q)$$

$$P(R = 1 \mid D = g \wedge Q = q)$$

Probability ranking principle

Sort by decreasing value:

$$P(R = 1 \mid D = d \wedge Q = q)$$

$$P(R = 1 \mid D = e \wedge Q = q)$$

$$P(R = 1 \mid D = f \wedge Q = q)$$

$$P(R = 1 \mid D = g \wedge Q = q)$$
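The probability ranking principle as a minimal sketch: the probabilities below are hypothetical placeholders, since estimating $P(R = 1 \mid D = d \wedge Q = q)$ is a separate problem. The function name `rank_by_prp` is made up.

```python
# Hypothetical values of P(R = 1 | D = doc, Q = q) for a fixed query q.
p_relevant = {"d": 0.20, "e": 0.85, "f": 0.55, "g": 0.05}

def rank_by_prp(probabilities):
    """Probability ranking principle: sort documents by decreasing P(R = 1 | D = doc, Q = q)."""
    return sorted(probabilities, key=probabilities.get, reverse=True)

print(rank_by_prp(p_relevant))  # ['e', 'f', 'd', 'g']
```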

"Boolean retrieval"

$$P(R = 1 \mid D = d \wedge Q = q) > P(R = 0 \mid D = d \wedge Q = q)$$

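[Figure: two cases comparing, among the outcomes with $D = d$ and $Q = q$, the region where $R = 1$ and the region where $R = 0$]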

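Since $P(R = 0 \mid D = d \wedge Q = q) = 1 - P(R = 1 \mid D = d \wedge Q = q)$, this is equivalent to

$$P(R = 1 \mid D = d \wedge Q = q) > \frac{1}{2}$$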
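The threshold as a filter, in a minimal sketch over hypothetical relevance probabilities (the function name `boolean_retrieval` is made up; results are listed alphabetically because the output is an unranked set):

```python
# Hypothetical values of P(R = 1 | D = doc, Q = q) for a fixed query q.
p_relevant = {"d": 0.20, "e": 0.85, "f": 0.55, "g": 0.05}

def boolean_retrieval(probabilities):
    """Keep the documents more likely relevant than not: P(R = 1 | D = doc, Q = q) > 1/2."""
    return sorted(doc for doc, p in probabilities.items() if p > 0.5)

print(boolean_retrieval(p_relevant))  # ['e', 'f']
```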

Retrieval costs

Sort by increasing

$$C_0 \times P(R = 0 \mid D = d \wedge Q = q) - C_1 \times P(R = 1 \mid D = d \wedge Q = q)$$

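Sort by increasing

$$\begin{array}{l|cc} & \text{Return} & \text{Not return} \\ \hline \text{Relevant} & 0 & -C_1 \\ \text{Not relevant} & -C_0 & 0 \end{array}$$

A document is relevant with probability $P(R = 1 \mid D = d \wedge Q = q)$ and not relevant with probability $P(R = 0 \mid D = d \wedge Q = q)$. The difference (diff) between the expected value of not returning d and that of returning it is

$$C_0 \times P(R = 0 \mid D = d \wedge Q = q) - C_1 \times P(R = 1 \mid D = d \wedge Q = q)$$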

$C_0$: cost of returning an irrelevant document.

$C_1$: cost of not returning a relevant document.

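With identical costs...

Sort by increasing, with $C_0 = C_1 = C$ and $P(R = 0 \mid D = d \wedge Q = q) = 1 - P(R = 1 \mid D = d \wedge Q = q)$:

$$C \times \bigl(1 - P(R = 1 \mid D = d \wedge Q = q)\bigr) - C \times P(R = 1 \mid D = d \wedge Q = q)$$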

Dividing by the constant $C > 0$ does not change the order, so sort by increasing

$$1 - 2 \times P(R = 1 \mid D = d \wedge Q = q)$$

Sort by decreasing

$$P(R = 1 \mid D = d \wedge Q = q)$$

We "fall back" to the previous method.
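A sketch of cost-based ranking over hypothetical relevance probabilities. The function names are made up, and the rule in `worth_returning` (return a document only when its diff is negative, i.e. when returning it is expected to be better than not returning it) is an assumption that goes beyond the slides; with identical costs it reduces to the $P(R = 1 \mid D = d \wedge Q = q) > 1/2$ cut-off of "Boolean retrieval".

```python
# Hypothetical values of P(R = 1 | D = doc, Q = q) for a fixed query q.
p_relevant = {"d": 0.20, "e": 0.85, "f": 0.55, "g": 0.05}

def diff(p1, c0, c1):
    """C0 * P(R = 0 | d, q) - C1 * P(R = 1 | d, q), using P(R = 0 | d, q) = 1 - P(R = 1 | d, q)."""
    return c0 * (1 - p1) - c1 * p1

def rank_by_cost(probabilities, c0, c1):
    """Sort documents by increasing diff."""
    return sorted(probabilities, key=lambda doc: diff(probabilities[doc], c0, c1))

def worth_returning(probabilities, c0, c1):
    """Assumption: return a document only if its diff is negative (listed alphabetically)."""
    return sorted(doc for doc, p1 in probabilities.items() if diff(p1, c0, c1) < 0)

# Identical costs: same order as sorting by decreasing P(R = 1 | d, q).
print(rank_by_cost(p_relevant, c0=1.0, c1=1.0))     # ['e', 'f', 'd', 'g']
# Identical costs: the cut-off is P(R = 1 | d, q) > 1/2.
print(worth_returning(p_relevant, c0=1.0, c1=1.0))  # ['e', 'f']
# Missing a relevant document is five times as costly: the cut-off drops to 1/6.
print(worth_returning(p_relevant, c0=1.0, c1=5.0))  # ['d', 'e', 'f']
```

In general the diff is negative exactly when $P(R = 1 \mid D = d \wedge Q = q) > C_0 / (C_0 + C_1)$, so raising $C_1$ makes the system return more documents.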

Binary independence model

Model and abstraction

Document as a list of words (with duplicates)

Simplification:

Document as a set of words

Document as a vector of booleans:

(0 1 0 1 0 1 1 1 0 0 1 1 0 1 0 1 0 1 0 1 0)

Documents and queries as vectors

$$d = \begin{pmatrix} d_1 \\ d_2 \\ d_3 \\ \vdots \end{pmatrix} \qquad q = \begin{pmatrix} q_1 \\ q_2 \\ q_3 \\ \vdots \end{pmatrix}$$
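The simplification from a list of words to a boolean vector, as a minimal sketch. The vocabulary, the example texts and whitespace tokenization are assumptions for illustration; term $k$ is simply position $k$ in the hypothetical `vocabulary` list.

```python
# Hypothetical vocabulary: term k is vocabulary[k].
vocabulary = ["ETH", "information", "lawyer", "Penang", "retrieval", "silver"]

def to_boolean_vector(text, vocabulary):
    """List of words -> set of words -> vector of booleans (1 if term k occurs)."""
    words = text.split()   # list of words, duplicates kept
    word_set = set(words)  # set of words, duplicates dropped
    return [1 if term in word_set else 0 for term in vocabulary]

d = to_boolean_vector("information retrieval at ETH information", vocabulary)
q = to_boolean_vector("information retrieval", vocabulary)
print(d)  # [1, 1, 0, 0, 1, 0]
print(q)  # [0, 1, 0, 0, 1, 0]
```

The repeated word "information" and the out-of-vocabulary word "at" leave no trace in the vector: only presence or absence of each vocabulary term survives.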

One "document" random variable per term

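[Figure: the random variable $D_{\text{ETH}} : \Omega \to \mathbb{B}$ maps an outcome $\omega \in \Omega$ to $d_{\text{ETH}}$, i.e. whether the document contains the term "ETH"]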

One "query" random variable per term

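[Figure: the random variable $Q_{\text{information}} : \Omega \to \mathbb{B}$ maps an outcome $\omega \in \Omega$ to $q_{\text{information}}$, i.e. whether the query contains the term "information"]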

So we can now write things like...

$$P(D_k = d_k \mid R = 1 \wedge Q = q)$$

For $d_k = 1$: the probability that the document contains term k, knowing that the query is q and that the document is relevant.

Naive Bayes Assumption

Terms occur "independently" in documents:

$$P(D = d) = \prod_{k=1}^{M} P(D_k = d_k)$$

where M is the number of terms in the vocabulary.
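Under this assumption, the probability of a whole boolean vector is just a product of per-term factors. A minimal sketch, assuming hypothetical values of $P(D_k = 1)$ for a four-term vocabulary (the name `prob_document` is made up):

```python
import math
from itertools import product

# Hypothetical values of P(D_k = 1) for a vocabulary of M = 4 terms.
p_term = [0.9, 0.1, 0.5, 0.2]

def prob_document(d, p_term):
    """Naive Bayes assumption: P(D = d) is the product over k of P(D_k = d_k)."""
    return math.prod(pk if dk == 1 else 1 - pk for dk, pk in zip(d, p_term))

print(prob_document([1, 0, 1, 0], p_term))  # about 0.9 * 0.9 * 0.5 * 0.8 = 0.324

# Sanity check: summed over all 2^M boolean vectors, the probabilities add up to 1.
total = sum(prob_document(v, p_term) for v in product([0, 1], repeat=len(p_term)))
print(round(total, 10))  # 1.0
```

Only M numbers are needed instead of one probability per possible vector ($2^M$ of them); that saving is the point of the assumption.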

Going back to what we want to rank...

Sort by decreasing value:

$$P(R = 1 \mid D = d \wedge Q = q)$$

$$P(R = 1 \mid D = e \wedge Q = q)$$

$$P(R = 1 \mid D = f \wedge Q = q)$$

$$P(R = 1 \mid D = g \wedge Q = q)$$

Bayes' formula

$$P(R = 1 \mid D = d) = \frac{P(D = d \mid R = 1) \times P(R = 1)}{P(D = d)}$$

Condition on a query q...

$$P(R = 1 \mid D = d \wedge Q = q) = \frac{P(D = d \mid R = 1 \wedge Q = q) \times P(R = 1 \mid Q = q)}{P(D = d \mid Q = q)}$$

That's a lot to evaluate!

Can we get rid of some of it?

$$P(R = 0 \mid D = d \wedge Q = q) = \frac{P(D = d \mid R = 0 \wedge Q = q) \times P(R = 0 \mid Q = q)}{P(D = d \mid Q = q)}$$

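$$O(R = 1 \mid D = d \wedge Q = q) = \frac{P(R = 1 \mid D = d \wedge Q = q)}{P(R = 0 \mid D = d \wedge Q = q)}$$

Since the odds $p/(1-p)$ increase with $p$, ranking documents by these odds gives the same order as ranking them by $P(R = 1 \mid D = d \wedge Q = q)$. In the ratio of the two Bayes expressions, the common denominator $P(D = d \mid Q = q)$ cancels.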

$$O(R = 1 \mid D = d \wedge Q = q) = \frac{P(D = d \mid R = 1 \wedge Q = q) \times P(R = 1 \mid Q = q)}{P(D = d \mid R = 0 \wedge Q = q) \times P(R = 0 \mid Q = q)}$$

$$O(R = 1 \mid D = d \wedge Q = q) = \frac{P(D = d \mid R = 1 \wedge Q = q)}{P(D = d \mid R = 0 \wedge Q = q)} \times \frac{P(R = 1 \mid Q = q)}{P(R = 0 \mid Q = q)}$$

These are odds!

$$O(R = 1 \mid D = d \wedge Q = q) = \frac{P(D = d \mid R = 1 \wedge Q = q)}{P(D = d \mid R = 0 \wedge Q = q)} \times O(R = 1 \mid Q = q)$$

We can use our independence assumption here.

Naive Bayes Assumption

Terms occur "independently" in documents, even conditioned on relevance and a given query:

$$P(D = d \mid R = 1 \wedge Q = q) = \prod_{k=1}^{M} P(D_k = d_k \mid R = 1 \wedge Q = q)$$

and likewise for $R = 0$.

$$O(R = 1 \mid D = d \wedge Q = q) = \frac{\prod_{k=1}^{M} P(D_k = d_k \mid R = 1 \wedge Q = q)}{\prod_{k=1}^{M} P(D_k = d_k \mid R = 0 \wedge Q = q)} \times O(R = 1 \mid Q = q)$$

$$O(R = 1 \mid D = d \wedge Q = q) = \prod_{k=1}^{M} \frac{P(D_k = d_k \mid R = 1 \wedge Q = q)}{P(D_k = d_k \mid R = 0 \wedge Q = q)} \times O(R = 1 \mid Q = q)$$

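This is boolean! Each $d_k$ is either 0 or 1, so the product splits into the terms absent from d ($d_k = 0$) and the terms present in d ($d_k = 1$):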

O(R = 1|D = d ^Q = q) =Y

k|dk=0

P (Dk = dk|R = 1 ^Q = q)

P (Dk = dk|R = 0 ^Q = q)⇥

Y

k|dk=1

P (Dk = dk|R = 1 ^Q = q)

P (Dk = dk|R = 0 ^Q = q)⇥O(R = 1|Q = q)

Page 111: Ghislain Fourny Information Retrieval - Systems Group...Information Retrieval 9. Probabilistic Information Retrieval Picture copyright: johan2011/123RF Stock Photo 2 What we have seen

111

Condition on a query q...

O(R = 1|D = d ^Q = q) =Y

k|dk=0

P (Dk = dk|R = 1 ^Q = q)

P (Dk = dk|R = 0 ^Q = q)⇥

Y

k|dk=1

P (Dk = dk|R = 1 ^Q = q)

P (Dk = dk|R = 0 ^Q = q)⇥O(R = 1|Q = q)

We call this pk

Page 112: Ghislain Fourny Information Retrieval - Systems Group...Information Retrieval 9. Probabilistic Information Retrieval Picture copyright: johan2011/123RF Stock Photo 2 What we have seen

112

Condition on a query q...

O(R = 1|D = d ^Q = q) =Y

k|dk=0

P (Dk = dk|R = 1 ^Q = q)

P (Dk = dk|R = 0 ^Q = q)⇥

Y

k|dk=1

P (Dk = dk|R = 1 ^Q = q)

P (Dk = dk|R = 0 ^Q = q)⇥O(R = 1|Q = q)

We call this pk

We call this uk

113

Condition on a query q...

$$O(R = 1 \mid D = d \wedge Q = q) = \prod_{k \mid d_k = 0} \frac{P(D_k = d_k \mid R = 1 \wedge Q = q)}{P(D_k = d_k \mid R = 0 \wedge Q = q)} \times \prod_{k \mid d_k = 1} \frac{P(D_k = d_k \mid R = 1 \wedge Q = q)}{P(D_k = d_k \mid R = 0 \wedge Q = q)} \times O(R = 1 \mid Q = q)$$

We call $p_k = P(D_k = 1 \mid R = 1 \wedge Q = q)$ and $u_k = P(D_k = 1 \mid R = 0 \wedge Q = q)$:

                kth term present    kth term absent
Relevant        p_k                 1 - p_k
Not relevant    u_k                 1 - u_k

114

Condition on a query q...

$$O(R = 1 \mid D = d \wedge Q = q) = \prod_{k \mid d_k = 1} \frac{p_k}{u_k} \times \prod_{k \mid d_k = 0} \frac{1 - p_k}{1 - u_k} \times O(R = 1 \mid Q = q)$$

116

Condition on a query q...

We can limit the product to terms in q if terms absent from the query are equally likely to occur in relevant and non-relevant documents:

$$\forall k \in [1, M],\ q_k = 0 \implies p_k = u_k$$

For such terms, both factors p_k / u_k and (1 - p_k) / (1 - u_k) equal 1.

118

Condition on a query q...

$$O(R = 1 \mid D = d \wedge Q = q) = \prod_{k \mid d_k = 1 \wedge q_k = 1} \frac{p_k}{u_k} \times \prod_{k \mid d_k = 0 \wedge q_k = 1} \frac{1 - p_k}{1 - u_k} \times O(R = 1 \mid Q = q)$$

The first product runs over the query terms found in d; the second over the query terms absent from d.

119

Condition on a query q...

Divide the first product by, and multiply the second product by, the same quantity (this leaves the odds unchanged):

$$\prod_{k \mid d_k = 1 \wedge q_k = 1} \frac{1 - p_k}{1 - u_k}$$

120

Condition on a query q...

$$O(R = 1 \mid D = d \wedge Q = q) = \prod_{k \mid d_k = 1 \wedge q_k = 1} \frac{p_k (1 - u_k)}{u_k (1 - p_k)} \times \prod_{k \mid q_k = 1} \frac{1 - p_k}{1 - u_k} \times O(R = 1 \mid Q = q)$$

123

Condition on a query q...

The second product runs over all query terms, so it does not depend on d! Neither does O(R = 1 | Q = q). Only the first product, over the query terms found in d, is what we care about!
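The rewriting of the odds above can be checked numerically. This is a minimal sketch that assumes made-up values for p_k, u_k and the prior odds O(R = 1 | Q = q) (none of them come from the slides) and applies the assumption p_k = u_k for terms outside the query; `odds_full` and `odds_split` are hypothetical helper names.

```python
import math


def odds_full(p, u, d, prior_odds):
    """O(R=1 | D=d, Q=q) as a product over all M terms (term independence)."""
    result = prior_odds
    for pk, uk, dk in zip(p, u, d):
        result *= pk / uk if dk == 1 else (1 - pk) / (1 - uk)
    return result


def odds_split(p, u, d, q, prior_odds):
    """Same odds as (document-dependent product, part that does not depend on d)."""
    doc_part = 1.0
    const_part = prior_odds
    for pk, uk, dk, qk in zip(p, u, d, q):
        if qk == 1:
            # Product over all query terms: identical for every document.
            const_part *= (1 - pk) / (1 - uk)
            if dk == 1:
                # Product over the query terms found in d.
                doc_part *= (pk * (1 - uk)) / (uk * (1 - pk))
    return doc_part, const_part


# Hypothetical example with M = 4 terms; terms 0 and 2 form the query.
q = [1, 0, 1, 0]
p = [0.6, 0.3, 0.5, 0.2]  # p_k
u = [0.1, 0.3, 0.4, 0.2]  # u_k, equal to p_k wherever q_k = 0
d = [1, 1, 0, 0]          # terms 0 and 1 occur in the document
prior = 0.05              # O(R=1 | Q=q), made up

full = odds_full(p, u, d, prior)
doc_part, const_part = odds_split(p, u, d, q, prior)
print(math.isclose(full, doc_part * const_part))  # True
```

Only `doc_part` changes from one document to the next, which is why ranking needs nothing but the first product.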

125

Condition on a query q...

Take the log of the product that depends on d: the log is monotonic, so ranking by it gives the same order as ranking by the odds.

126

Odds ratio

$$RSV_d = \log \prod_{k \mid d_k = 1 \wedge q_k = 1} \frac{p_k / (1 - p_k)}{u_k / (1 - u_k)}$$

129

Retrieval Status Value

$$RSV_d = \sum_{k \mid d_k = 1 \wedge q_k = 1} \left( \log \frac{p_k}{1 - p_k} - \log \frac{u_k}{1 - u_k} \right)$$

p_k / (1 - p_k): odds of containing term k in relevant documents.
u_k / (1 - u_k): odds of containing term k in non-relevant documents.

131

RSV as a sum of weights

$$RSV_d = \sum_{k \mid d_k = 1 \wedge q_k = 1} c_k, \qquad c_k = \log \frac{p_k}{1 - p_k} - \log \frac{u_k}{1 - u_k}$$

132

RSV as a scalar product

$$RSV_d = \vec{c} \cdot \vec{s}$$

(if we overwrite c with zeros outside of the query support, and s_k = d_k is 1 if term k occurs in d, 0 otherwise)

133

Retrieval Status Value

In RSV_d = c · s, the components c_k of c are weights!
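The sum-of-weights and scalar-product forms of RSV_d can be compared side by side. A minimal sketch, assuming hypothetical probabilities for three terms and the 0/1 vector s_k = d_k; the function names are illustrative only.

```python
import math


def weight(pk, uk):
    """c_k = log(p_k / (1 - p_k)) - log(u_k / (1 - u_k))."""
    return math.log(pk / (1 - pk)) - math.log(uk / (1 - uk))


def rsv_sum(p, u, d, q):
    """RSV_d as a sum of weights over the query terms present in d."""
    return sum(weight(pk, uk)
               for pk, uk, dk, qk in zip(p, u, d, q)
               if dk == 1 and qk == 1)


def rsv_dot(p, u, d, q):
    """RSV_d as the scalar product c . s, with c set to 0 outside the query."""
    c = [weight(pk, uk) if qk == 1 else 0.0 for pk, uk, qk in zip(p, u, q)]
    s = d  # s_k = d_k: 1 if term k occurs in d, else 0
    return sum(ck * sk for ck, sk in zip(c, s))


# Hypothetical probabilities for M = 3 terms; terms 0 and 1 form the query.
p = [0.6, 0.5, 0.5]
u = [0.1, 0.4, 0.2]
q = [1, 1, 0]
d = [1, 1, 1]

print(rsv_sum(p, u, d, q), rsv_dot(p, u, d, q))
```

Zeroing c outside the query support is what lets the restricted sum become a plain dot product over all M terms.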

134

Reminder: Zone queries

information (3): 1.body | 4.title, 4.body | 5.body, 5.abstract

Score of a document: $\vec{g} \cdot \vec{s}$
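As a reminder of how g · s scores the postings above, here is a minimal sketch. The zone weights g (title 0.5, abstract 0.3, body 0.2) are hypothetical values chosen to sum to 1; the slides do not give them.

```python
# Hypothetical zone weights g (they sum to 1); the slides do not give values.
ZONES = ["title", "abstract", "body"]
G = [0.5, 0.3, 0.2]

# Postings of "information" from the reminder: document -> zones containing it.
POSTINGS = {1: {"body"}, 4: {"title", "body"}, 5: {"body", "abstract"}}


def zone_score(doc_zones):
    """Score = g . s, where s_i = 1 if the term occurs in zone i, else 0."""
    s = [1 if zone in doc_zones else 0 for zone in ZONES]
    return sum(g_i * s_i for g_i, s_i in zip(G, s))


scores = {doc: zone_score(zones) for doc, zones in POSTINGS.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # [4, 5, 1]
```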

135

Estimating the weights (in theory)

145

Contingency table

                kth term present    kth term absent       Total
Relevant        s                   S - s                 S
Not relevant    df_t - s            N - df_t - S + s      N - S
Total           df_t                N - df_t              N

$$c_k = \log \frac{p_k}{1 - p_k} + \log \frac{1 - u_k}{u_k}$$

Odds for p_k: s / (S - s) (relevant row).
Odds for u_k: (df_t - s) / (N - df_t - S + s) (non-relevant row).

146

Contingency table

$$c_k = \log \frac{s / (S - s)}{(\mathrm{df}_t - s) / (N - \mathrm{df}_t - S + s)}$$

147

With smoothing

Add 1/2 to each of the four cells of the contingency table (term present or absent, relevant or not relevant):

$$c_k = \log \frac{(s + \frac{1}{2}) / (S - s + \frac{1}{2})}{(\mathrm{df}_t - s + \frac{1}{2}) / (N - \mathrm{df}_t - S + s + \frac{1}{2})}$$
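The smoothed weight can be computed directly from the four counts. A minimal sketch: the `smoothing` parameter and the example counts (N = 1000, df_t = 100, S = 10, s = 5) are hypothetical, chosen only to show that the +1/2 keeps the log finite when s = 0.

```python
import math


def term_weight(s, S, df_t, N, smoothing=0.5):
    """c_k from the contingency table, adding `smoothing` to each of the four cells.

    s    : relevant documents that contain term k
    S    : relevant documents
    df_t : documents that contain term k
    N    : documents in the collection
    """
    rel_present = s + smoothing
    rel_absent = S - s + smoothing
    nonrel_present = df_t - s + smoothing
    nonrel_absent = N - df_t - S + s + smoothing
    return math.log((rel_present / rel_absent) / (nonrel_present / nonrel_absent))


# Hypothetical counts: N = 1000 documents, df_t = 100, S = 10 relevant, s = 5.
print(term_weight(5, 10, 100, 1000))

# With s = 0 and no smoothing, log(0) would be undefined; with +1/2 it is finite.
print(term_weight(0, 10, 100, 1000))
```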

148

Estimating the weights (in practice)

151

Retrieval Status Value

Croft and Harper suggest that the odds of containing term k in relevant documents are 1, i.e. p_k = 1/2:

$$RSV_d = \sum_{k \mid d_k = 1 \wedge q_k = 1} \left( \log \frac{p_k}{1 - p_k} - \log \frac{u_k}{1 - u_k} \right)$$

153

Retrieval Status Value

Statistics-based estimate: assuming relevant documents are a small fraction of the collection, the odds of containing term k in non-relevant documents can be estimated over the whole collection:

$$\frac{u_k}{1 - u_k} \approx \frac{\mathrm{df}_t}{N - \mathrm{df}_t}$$

154

Retrieval Status Value

$$RSV_d = \sum_{k \mid d_k = 1 \wedge q_k = 1} \left( \log 1 - \log \frac{\mathrm{df}_t}{N - \mathrm{df}_t} \right)$$

155

Retrieval Status Value

$$RSV_d = \sum_{k \mid d_k = 1 \wedge q_k = 1} \left( \log 1 + \log \frac{N - \mathrm{df}_t}{\mathrm{df}_t} \right)$$

156

Retrieval Status Value

$$RSV_d \approx \sum_{k \mid d_k = 1 \wedge q_k = 1} \log \frac{N}{\mathrm{df}_t}$$

(approximation: log 1 = 0, and N - df_t ≈ N when df_t is small compared to N)

157

Retrieval Status Value

This is the inverse document frequency (idf)!

158

Retrieval Status Value

This justifies idf weighting in the Vector-Space Model!
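The chain of estimates (odds of 1 for relevant documents, collection statistics for non-relevant ones) can be sketched as a scoring function. The collection size and document frequencies below are hypothetical; the `exact` flag is an illustrative switch between log((N - df_t) / df_t) and its idf approximation log(N / df_t).

```python
import math


def rsv_idf(query_terms, doc_terms, df, N, exact=False):
    """RSV_d with p_k = 1/2 (odds of 1) and u_k / (1 - u_k) = df_t / (N - df_t).

    exact=True  sums log((N - df_t) / df_t)
    exact=False sums the approximation log(N / df_t), i.e. idf
    """
    score = 0.0
    for t in set(query_terms) & set(doc_terms):
        if exact:
            score += math.log((N - df[t]) / df[t])
        else:
            score += math.log(N / df[t])
    return score


# Hypothetical collection statistics.
N = 1_000_000
df = {"information": 50_000, "retrieval": 2_000, "ETH": 500}

query = ["information", "retrieval"]
doc = ["retrieval", "ETH", "information"]

print(rsv_idf(query, doc, df, N, exact=True))
print(rsv_idf(query, doc, df, N))  # slightly larger, since N > N - df_t
```

The gap between the two scores shrinks as df_t becomes small relative to N, which is when the approximation holds.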

159

Relevance feedback

165

Relevance feedback

Input query → Execute query → Display results → Mark relevant documents → Update (posteriors) → Execute query again → ...
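One plausible way to implement the "Update" step is to re-estimate p_k and u_k from the documents the user marked, using the +1/2 smoothed contingency table: p_k ≈ (s + 1/2) / (S + 1) and u_k ≈ (df_t - s + 1/2) / (N - S + 1). This is a sketch under that assumption; the helper names and the toy collection are hypothetical.

```python
import math


def initial_estimates(query, df, N):
    """Before feedback: p_k = 1/2 (Croft and Harper), u_k = df_t / N."""
    return {t: (0.5, df[t] / N) for t in query}


def feedback_estimates(query, df, N, relevant_docs):
    """After feedback: re-estimate p_k and u_k from the documents marked relevant,
    adding 1/2 to each cell of the contingency table."""
    S = len(relevant_docs)
    estimates = {}
    for t in query:
        s = sum(1 for doc in relevant_docs if t in doc)
        p = (s + 0.5) / (S + 1)
        u = (df[t] - s + 0.5) / (N - S + 1)
        estimates[t] = (p, u)
    return estimates


def rsv(doc, estimates):
    """Sum of c_k over the query terms present in the document."""
    return sum(math.log(p / (1 - p)) - math.log(u / (1 - u))
               for t, (p, u) in estimates.items() if t in doc)


# Hypothetical toy collection: each document is a set of terms.
docs = [
    {"eth", "zurich", "university"},
    {"eth", "cryptocurrency", "kansas"},
    {"zurich", "university", "students"},
    {"eth", "trading", "hours"},
    {"zurich", "lake"},
    {"bern", "university"},
    {"geneva", "lake"},
    {"basel", "trading"},
]
N = len(docs)
query = ["eth", "zurich"]
df = {t: sum(1 for doc in docs if t in doc) for t in query}

before = initial_estimates(query, df, N)
# The user marks the two university documents as relevant.
after = feedback_estimates(query, df, N, [docs[0], docs[2]])

print([round(rsv(doc, before), 2) for doc in docs])
print([round(rsv(doc, after), 2) for doc in docs])
```

Before feedback, "eth" and "zurich" get the same weight; after feedback, "zurich" (present in both marked documents) weighs more, so documents about Zurich move ahead of those only mentioning ETH.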

169

Information need vs. query

Query: ETH Zurich

Alice is searching for a higher-education institution.

Bob is searching for Ethereum cryptocurrency in Zurich, Kansas.

Carlos is searching for the extended trading hours on insurance stocks.

170

Language Models

171

Finite State Automaton (FSA)

[Figure: finite state automaton over the alphabet {a, b}, with a start state]

172

Finite State Automaton (FSA) – Transition table

State   a   b
1       5   2
2       3   5
3       4   5
4       4   5
5       5   5

179

Finite State Automaton (FSA)

Reading baaaa one symbol at a time (baaaa|), the automaton ends in an accepting state: accept.

185

Finite State Automaton (FSA)

Reading baba one symbol at a time (baba|), the automaton ends in a non-accepting state: reject.

186

Finite State Automaton - formally

An FSA is a tuple (Q, Σ, δ, q_0, F):
• Q = {q_0, q_1, q_2, q_3, q_4}: the set of states
• Σ = {a, b}: the alphabet
• δ: the transition function, e.g. δ(q_1, a) = q_2
• q_0: the start state
• F = {q_3}: the set of accepting (final) states

[Figure: b leads from q_0 to q_1, a from q_1 to q_2, a from q_2 to q_3, which loops on a; every other transition leads to q_4, which loops on a, b]
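The transition table can be run directly to reproduce the accept and reject traces. A minimal sketch, assuming states 1 to 5 of the table correspond to q_0 to q_4 and that state 4 (q_3) is the only accepting state, as inferred from the baaaa and baba examples; `accepts` is a hypothetical name.

```python
# Transition table from the slides: state -> {symbol: next state}.
# Assumption: state 1 is the start state (q_0), state 4 (q_3) is the only
# accepting state, and state 5 (q_4) is a trap state.
DELTA = {
    1: {"a": 5, "b": 2},
    2: {"a": 3, "b": 5},
    3: {"a": 4, "b": 5},
    4: {"a": 4, "b": 5},
    5: {"a": 5, "b": 5},
}
START = 1
ACCEPTING = {4}


def accepts(word):
    """Run the automaton on `word` and report whether it ends in an accepting state."""
    state = START
    for symbol in word:
        state = DELTA[state][symbol]
    return state in ACCEPTING


print(accepts("baaaa"))  # True
print(accepts("baba"))   # False
```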

187

Language model

Language space L (the strings accepted by the automaton): baa, baaa, baaaa, ...

189

Language model

A probability distribution over a sample space Ω:

$$p : \Omega \to [0, 1],\ \omega \mapsto p(\omega), \qquad \sum_{\omega \in \Omega} p(\omega) = 1$$

190

Language model

A language model is a probability distribution over the language L:

$$p : L \to [0, 1],\ s \mapsto p(s), \qquad \sum_{s \in L} p(s) = 1$$

191

Finite State Automaton (FSA)

[Figure: the same automaton over the alphabet {a, b}]

How do we turn this into a generator?

192

Finite State Automaton (FSA)

[Figure: the same automaton turned into a generator, with probabilities (1/2 or 1/4) on its transitions and a stop transition]

199

Finite State Automaton (FSA)

[Figure: the probabilistic automaton generating b, then ba, then baa, then baaa]

The generated document baaa has probability 1/64, the product of the probabilities of the transitions taken.

201

Generating a document at random

Random document:

$$L \in \Omega \to T^*$$

List of words model!

202

Generating a document at random

Random length:

$$\|L\| \in \Omega \to \mathbb{N}$$

204

Chain rule (generating a document at random)

$$P(L = (t_1, t_2, t_3)) = P(L_1 = t_1) \cdot P(L_2 = t_2 \mid L_1 = t_1) \cdot P(L_3 = t_3 \mid L_2 = t_2 \wedge L_1 = t_1) \cdot P(\|L\| = 3 \mid L_3 = t_3 \wedge L_2 = t_2 \wedge L_1 = t_1)$$

205

Unigram language model (term independence)

$$P(L = (t_1, t_2, t_3)) = P(L_1 = t_1) \cdot P(L_2 = t_2) \cdot P(L_3 = t_3) \cdot p_{\text{stop}}$$

207

Corresponding automaton (unigram model)

Term          Probability
ETH           0.1
Zürich        0.3
information   0.5
retrieval     0.1

We can build such a language model from any document! Use term frequencies!
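Building a unigram model from term frequencies takes only a counter. A minimal sketch: the 10-token document is hypothetical (chosen so its frequencies reproduce the table above), the maximum-likelihood estimate tf / |d| is assumed, and p_stop = 0.2 is an assumed value.

```python
from collections import Counter


def unigram_model(tokens):
    """Maximum-likelihood unigram model: P(t) = tf(t, d) / |d|."""
    counts = Counter(tokens)
    return {t: c / len(tokens) for t, c in counts.items()}


def sequence_probability(model, terms, p_stop):
    """P(L = (t_1, ..., t_n)) = P(t_1) ... P(t_n) * p_stop (unigram model)."""
    prob = p_stop
    for t in terms:
        prob *= model.get(t, 0.0)  # a term absent from the document gets 0
    return prob


# Hypothetical 10-token document whose term frequencies give the table above.
doc = ["information"] * 5 + ["Zürich"] * 3 + ["ETH", "retrieval"]
model = unigram_model(doc)
print(model)  # {'information': 0.5, 'Zürich': 0.3, 'ETH': 0.1, 'retrieval': 0.1}

# p_stop = 0.2 is an assumed value, not from the slides.
print(sequence_probability(model, ["information", "retrieval"], p_stop=0.2))
```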

Page 212: Ghislain Fourny Information Retrieval - Systems Group...Information Retrieval 9. Probabilistic Information Retrieval Picture copyright: johan2011/123RF Stock Photo 2 What we have seen

212

How do we get rid of the order?

Random document (list-of-words model): $L \in \Omega \to T^*$

Random document (bag-of-words model): $D \in \Omega \to \mathbb{N}^W$

$$P(D = d) = \sum_{l \text{ matching bag } d} P(L = l)$$

(This is actually a multinomial distribution: see combinatorics.)
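A small Python check of why the sum over orderings is a multinomial. It assumes a unigram model (each position drawn independently) and conditions on a fixed document length n, so that P(L = l) is the product of the term probabilities in l. Summing over every distinct list matching the bag should then equal the closed form n! / ∏ d_t! · ∏ p_t^{d_t}. All helper names are illustrative.

```python
from collections import Counter
from itertools import permutations
from math import factorial, prod


def list_probability(terms, model):
    """P(L = l) under a unigram model, for a list of fixed length."""
    return prod(model[t] for t in terms)


def bag_probability_by_enumeration(bag, model):
    """P(D = d): sum of P(L = l) over every distinct ordering l of the bag d."""
    tokens = [t for t, count in bag.items() for _ in range(count)]
    orderings = set(permutations(tokens))  # distinct lists matching the bag
    return sum(list_probability(l, model) for l in orderings)


def bag_probability_multinomial(bag, model):
    """Closed form: n! / prod(d_t!) * prod(p_t ** d_t)."""
    n = sum(bag.values())
    coefficient = factorial(n) // prod(factorial(c) for c in bag.values())
    return coefficient * prod(model[t] ** c for t, c in bag.items())


model = {"ETH": 0.1, "Zürich": 0.3, "information": 0.5, "retrieval": 0.1}
bag = Counter({"information": 2, "retrieval": 1})

by_enumeration = bag_probability_by_enumeration(bag, model)
closed_form = bag_probability_multinomial(bag, model)
print(by_enumeration, closed_form)  # both 0.075, up to floating-point rounding
```

Here the bag has 3 distinct orderings, each with probability 0.5 · 0.5 · 0.1 = 0.025, and the multinomial coefficient 3! / (2! · 1!) = 3 counts exactly those orderings.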

We generate a model for every document

Now, let's get back to information retrieval

A user enters a query q.

Thought experiment: imagine that
• we picked a random document and built its model,
• we used this model to generate a new document,
• that document turns out to be q.

Which document is the most likely to have been picked and to have generated q?
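The thought experiment can be simulated directly. This Python sketch assumes a toy two-document collection (illustrative data), a uniform choice of document, the maximum-likelihood unigram model of each document, and generated documents of the same length as q compared as bags of words. Counting which document produced the matches estimates which document most likely generated q.

```python
import random
from collections import Counter


def unigram_model(document):
    """Maximum-likelihood unigram model: term frequency / document length."""
    terms = document.split()
    return {t: c / len(terms) for t, c in Counter(terms).items()}


def simulate(collection, query, trials=10_000, seed=0):
    """Run the thought experiment `trials` times and count, per document,
    how often picking it and sampling from its model reproduces the query."""
    rng = random.Random(seed)  # seeded for reproducibility
    models = {name: unigram_model(text) for name, text in collection.items()}
    query_terms = query.split()
    query_bag = Counter(query_terms)
    hits = Counter({name: 0 for name in collection})
    names = list(collection)
    for _ in range(trials):
        name = rng.choice(names)  # 1. pick a random document (uniformly)
        model = models[name]      #    and take its model
        generated = rng.choices(list(model), weights=list(model.values()),
                                k=len(query_terms))  # 2. generate a new document
        if Counter(generated) == query_bag:  # 3. it turns out to be q (as a bag)
            hits[name] += 1
    return hits


collection = {
    "d1": "information retrieval information retrieval",
    "d2": "ETH Zürich ETH information",
}
hits = simulate(collection, "information retrieval")
print(hits)  # d2 never reproduces q: its model gives "retrieval" probability 0
```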

Bayesian model

$$P(D = d \mid Q = q) = \frac{P(Q = q \mid D = d)\, P(D = d)}{P(Q = q)}$$

We need to sort the documents by the posterior $P(D = d \mid Q = q)$.
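A small Python sketch of this posterior, assuming a toy collection (illustrative data), a given prior P(D = d) and the multinomial bag-of-words likelihood P(Q = q | D = d) under each document's maximum-likelihood unigram model. The denominator P(Q = q) is computed by the law of total probability, summing P(Q = q | D = d) P(D = d) over all documents. Function names are hypothetical.

```python
from collections import Counter
from math import factorial, prod


def unigram_model(document):
    """Maximum-likelihood unigram model: term frequency / document length."""
    terms = document.split()
    return {t: c / len(terms) for t, c in Counter(terms).items()}


def query_likelihood(query, model):
    """P(Q = q | D = d): multinomial probability of the query's bag of words.
    Terms absent from the model get probability 0."""
    bag = Counter(query.split())
    n = sum(bag.values())
    coefficient = factorial(n) // prod(factorial(c) for c in bag.values())
    return coefficient * prod(model.get(t, 0.0) ** c for t, c in bag.items())


def posterior(collection, query, prior):
    """P(D = d | Q = q) for every document d, via Bayes' rule."""
    joint = {d: query_likelihood(query, unigram_model(text)) * prior[d]
             for d, text in collection.items()}
    evidence = sum(joint.values())  # P(Q = q), by total probability
    if evidence == 0:
        raise ValueError("no document can generate this query")
    return {d: p / evidence for d, p in joint.items()}


collection = {
    "d1": "information retrieval information retrieval",
    "d2": "ETH Zürich ETH information",
    "d3": "information retrieval at ETH Zürich",
}
prior = {d: 1 / len(collection) for d in collection}  # uniform prior (assumption)
post = posterior(collection, "information retrieval", prior)
print(post)  # d1 ≈ 0.862, d2 = 0, d3 ≈ 0.138
```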

We can ignore the denominator $P(Q = q)$: it is the same constant for every document.

We can also ignore the prior $P(D = d)$, assuming every document is equally likely to be picked (uniform).

What remains, $P(Q = q \mid D = d)$, is just the probability of q under the model built from d!
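Ranking therefore reduces to sorting documents by the probability of q under each document's model. A minimal Python sketch, assuming whitespace tokenization, illustrative data and the unsmoothed maximum-likelihood model. It scores each document with the product of term probabilities over the query's tokens, dropping the multinomial coefficient: that coefficient depends only on q, so it does not change the order. Under this unsmoothed model, any document missing a query term scores 0.

```python
from collections import Counter
from math import prod


def unigram_model(document):
    """Maximum-likelihood unigram model: term frequency / document length."""
    terms = document.split()
    return {t: c / len(terms) for t, c in Counter(terms).items()}


def rank(collection, query):
    """Sort documents by the probability of the query under each document's model."""
    models = {d: unigram_model(text) for d, text in collection.items()}
    scores = {
        d: prod(model.get(t, 0.0) for t in query.split())  # unseen term -> 0
        for d, model in models.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


collection = {
    "d1": "information retrieval information retrieval",
    "d2": "ETH Zürich ETH information",
    "d3": "information retrieval at ETH Zürich",
}
ranking = rank(collection, "information retrieval")
for doc, score in ranking:
    print(doc, score)  # d1 first, then d3, then d2 (score 0)
```

With a uniform prior, this order is the same as the order by the posterior P(D = d | Q = q).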