Evaluation Metrics

Jaime Arguello
INLS 509: Information Retrieval
jarguell@email.unc.edu

March 25, 2013

Batch Evaluation: evaluation metrics

• At this point, we have a set of queries, with identified relevant and non-relevant documents

• The goal of an evaluation metric is to measure the quality of a particular ranking of known relevant/non-relevant documents

Set Retrieval: precision and recall

• So far, we’ve defined precision and recall assuming boolean retrieval: a set of relevant documents (REL) and a set of retrieved documents (RET)

[Venn diagram: the set of relevant documents (REL) and the set of retrieved documents (RET) within the collection]

Set Retrieval: precision and recall

• Precision (P): the proportion of retrieved documents that are relevant

[Venn diagram: REL and RET within the collection]

P = \frac{|RET \cap REL|}{|RET|}

Set Retrieval: precision and recall

• Recall (R): the proportion of relevant documents that are retrieved

[Venn diagram: REL and RET within the collection]

R = \frac{|RET \cap REL|}{|REL|}
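To make the two set definitions concrete, here is a minimal Python sketch; the function name and the set-of-ids representation are illustrative assumptions, not something from the course materials.

```python
def set_precision_recall(retrieved, relevant):
    """Set-retrieval precision and recall for one query.

    retrieved: set of document ids returned by the system (RET)
    relevant:  set of document ids judged relevant (REL)
    """
    overlap = len(retrieved & relevant)                    # |RET ∩ REL|
    precision = overlap / len(retrieved) if retrieved else 0.0
    recall = overlap / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: 3 of the 4 retrieved documents are relevant,
# and 3 of the 6 relevant documents were retrieved.
p, r = set_precision_recall({"d1", "d2", "d3", "d9"},
                            {"d1", "d2", "d3", "d4", "d5", "d6"})
print(p, r)  # 0.75 0.5
```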

Set Retrieval: precision and recall

• Recall measures the system’s ability to find all the relevant documents

• Precision measures the system’s ability to reject any non-relevant documents in the retrieved set

Set Retrieval: precision and recall

• A system can make two types of errors:

• a false positive error: the system retrieves a document that is non-relevant (should not have been retrieved)

• a false negative error: the system fails to retrieve a document that is relevant (should have been retrieved)

• How do these types of errors affect precision and recall?

Set Retrieval: precision and recall

• A system can make two types of errors:

• a false positive error: the system retrieves a document that is non-relevant (should not have been retrieved)

• a false negative error: the system fails to retrieve a document that is relevant (should have been retrieved)

• How do these types of errors affect precision and recall?

• Precision is affected by the number of false positive errors

• Recall is affected by the number of false negative errors

Set Retrieval: combining precision and recall

• Oftentimes, we want a system that has high precision and high recall

• We want a metric that measures the balance between precision and recall

• One possibility would be to use the arithmetic mean:

\text{arithmetic mean}(P, R) = \frac{P + R}{2}

• What is problematic with this way of summarizing precision and recall?

Set Retrieval: combining precision and recall

• It’s easy for a system to “game” the arithmetic mean of precision and recall

• Bad: a system that obtains 1.0 precision and near 0.0 recall would get a mean value of about 0.50

• Bad: a system that obtains 1.0 recall and near 0.0 precision would get a mean value of about 0.50

• Better: a system that obtains 0.50 precision and near 0.50 recall would get a mean value of about 0.50

Set Retrieval: F-measure (also known as F1)

• A system that retrieves a single relevant document would get 1.0 precision and near 0.0 recall

• A system that retrieves the entire collection would get 1.0 recall and near 0.0 precision

• Solution: use the harmonic mean rather than the arithmetic mean

• F-measure:

F = \frac{1}{\frac{1}{2}\left(\frac{1}{P} + \frac{1}{R}\right)} = \frac{2 \cdot P \cdot R}{P + R}
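A minimal Python sketch of the harmonic mean (illustrative, not from the slides); the second part shows how it punishes the lopsided precision/recall case that fools the arithmetic mean.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# A system with perfect precision but almost no recall:
p, r = 1.0, 0.01
print((p + r) / 2)      # arithmetic mean ~= 0.505 (looks respectable)
print(f_measure(p, r))  # F1 ~= 0.0198 (the small value dominates)
```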

Set Retrieval: F-measure (also known as F1)

• The harmonic mean punishes small values

[Figure: the F-measure compared with the arithmetic mean of P and R over varying precision and recall (slide courtesy of Ben Carterette)]

Ranked Retrieval: precision and recall

• In most situations, the system outputs a ranked list of documents rather than an unordered set

• User-behavior assumption:

‣ The user examines the output ranking from top-to-bottom until he/she is satisfied or gives up

• Precision and recall can also be used to evaluate a ranking

• Precision/Recall @ rank K

Ranked Retrieval: precision and recall

• Precision: proportion of retrieved documents that are relevant

• Recall: proportion of relevant documents that are retrieved

Ranked Retrieval: precision and recall

• P@K: proportion of retrieved top-K documents that are relevant

• R@K: proportion of relevant documents that are retrieved in the top-K

• Assumption: the user will only examine the top-K results

[Figure: a ranked list of results with a cutoff at K = 10]
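A minimal Python sketch of P@K and R@K, assuming the ranking is represented as a list of 0/1 relevance labels in rank order (the representation and function names are illustrative assumptions, not the slides').

```python
def precision_at_k(rels, k):
    """rels: 0/1 relevance labels in rank order."""
    return sum(rels[:k]) / k

def recall_at_k(rels, k, num_relevant):
    """num_relevant: total number of relevant documents for the query."""
    return sum(rels[:k]) / num_relevant

# Top 10 of the ranking used in the exercise below (relevant at ranks 1, 3, 4, 5, 6, 7, 9),
# with 20 relevant documents assumed for the query.
rels = [1, 0, 1, 1, 1, 1, 1, 0, 1, 0]
print(precision_at_k(rels, 10))   # 0.7
print(recall_at_k(rels, 10, 20))  # 0.35
```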

Ranked Retrieval: precision and recall: exercise

• Assume 20 relevant documents in the collection
• Going down the ranking one rank at a time, compute P@K and R@K for K = 1 through 10:

K     P@K             R@K
1     (1/1) = 1.00    (1/20) = 0.05
2     (1/2) = 0.50    (1/20) = 0.05
3     (2/3) = 0.67    (2/20) = 0.10
4     (3/4) = 0.75    (3/20) = 0.15
5     (4/5) = 0.80    (4/20) = 0.20
6     (5/6) = 0.83    (5/20) = 0.25
7     (6/7) = 0.86    (6/20) = 0.30
8     (6/8) = 0.75    (6/20) = 0.30
9     (7/9) = 0.78    (7/20) = 0.35
10    (7/10) = 0.70   (7/20) = 0.35

Ranked Retrieval: precision and recall

• Problem: what value of K should we use to evaluate?

• Which is better in terms of P@10 and R@10?

[Figure: two different rankings, each evaluated with a cutoff at K = 10]

Ranked Retrieval: precision and recall

• The ranking of documents within the top K is inconsequential

• If we don’t know what value of K to choose, we can compute and report several: P/R@{1,5,10,20}

• There are evaluation metrics that do not require choosing K (as we will see)

• One advantage of P/R@K, however, is that they are easy to interpret

Ranked Retrieval: what do these statements mean?

• As with most metrics, experimenters report average values (averaged across evaluation queries)

• System A obtains an average P@10 of 0.50

• System A obtains an average P@10 of 0.10

• System A obtains an average P@1 of 0.50

• System A obtains an average P@20 of 0.20

Ranked Retrieval: comparing systems

• Good practice: always ask yourself “Are users likely to notice?”

• System A obtains an average P@1 of 0.10

• System B obtains an average P@1 of 0.20

• This is a 100% improvement.

• Are users likely to notice?

Ranked Retrieval: comparing systems

• Good practice: always ask yourself “Are users likely to notice?”

• System A obtains an average P@1 of 0.05

• System B obtains an average P@1 of 0.10

• This is a 100% improvement.

• Are users likely to notice?

Ranked Retrieval: P/R@K

• Advantages:

‣ easy to compute

‣ easy to interpret

• Disadvantages:

‣ the value of K has a huge impact on the metric

‣ the ranking above K is inconsequential

‣ how do we pick K?

Ranked Retrieval: motivation for average precision

• Ideally, we want the system to achieve high precision for varying values of K

• The metric average precision accounts for precision and recall without having to set K

Ranked Retrieval: average precision

1. Go down the ranking one-rank-at-a-time

2. If the document at rank K is relevant, measure P@K

‣ proportion of top-K documents that are relevant

3. Finally, take the average of all P@K values

‣ the number of P@K values will equal the number of relevant documents
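A minimal Python sketch of this procedure (illustrative code, not from the course); applied to the example ranking used on the following slides, it reproduces the 0.76 value.

```python
def average_precision(rels, num_relevant):
    """rels: 0/1 relevance labels in rank order; num_relevant: total relevant documents."""
    hits, precisions = 0, []
    for k, rel in enumerate(rels, start=1):
        if rel:                              # recall goes up at this rank
            hits += 1
            precisions.append(hits / k)      # P@K at this rank
    # Dividing by num_relevant means relevant documents that were
    # never retrieved contribute a precision of 0.
    return sum(precisions) / num_relevant

# Relevant documents at ranks 1, 3, 4, 5, 6, 7, 9, 11, 14, 20 (10 relevant in total)
rels = [1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1]
print(round(average_precision(rels, 10), 2))  # 0.76
```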

Ranked Retrieval: average precision

• Number of relevant documents for this query: 10
• The ranking retrieves relevant documents at ranks 1, 3, 4, 5, 6, 7, 9, 11, 14, and 20

1. Go down the ranking one rank at a time
2. If recall goes up (the document at rank K is relevant), calculate P@K at that rank K
3. When recall = 1.0, average the P@K values

rank (K)   relevant?   R@K    P@K
1          yes         0.10   1.00
2          no          0.10   0.50
3          yes         0.20   0.67
4          yes         0.30   0.75
5          yes         0.40   0.80
6          yes         0.50   0.83
7          yes         0.60   0.86
8          no          0.60   0.75
9          yes         0.70   0.78
10         no          0.70   0.70
11         yes         0.80   0.73
12         no          0.80   0.67
13         no          0.80   0.62
14         yes         0.90   0.64
15         no          0.90   0.60
16         no          0.90   0.56
17         no          0.90   0.53
18         no          0.90   0.50
19         no          0.90   0.47
20         yes         1.00   0.50

Averaging the P@K values at the ten relevant ranks (1.00, 0.67, 0.75, 0.80, 0.83, 0.86, 0.78, 0.73, 0.64, 0.50): average precision = 0.76

Ranked Retrieval: average precision

• A perfect ranking: all 10 relevant documents at ranks 1 through 10, so P@K = 1.00 at every relevant rank

average precision = 1.00

Ranked Retrieval: average precision

• The worst case for this query: the 10 relevant documents appear at ranks 11 through 20
• P@K at the relevant ranks: 0.09, 0.17, 0.23, 0.29, 0.33, 0.38, 0.41, 0.44, 0.47, 0.50

average precision = 0.33

Ranked Retrieval: average precision

• The example ranking again (relevant documents at ranks 1, 3, 4, 5, 6, 7, 9, 11, 14, 20): average precision = 0.76

Ranked Retrieval: average precision

• Swapping ranks 2 and 3, so the relevant document moves up to rank 2: P@K at the relevant ranks becomes 1.00, 1.00, 0.75, 0.80, 0.83, 0.86, 0.78, 0.73, 0.64, 0.50
• average precision = 0.79 (up from 0.76)

Ranked Retrieval: average precision

• The original example ranking once more: average precision = 0.76

Ranked Retrieval: average precision

• Swapping ranks 8 and 9, so the relevant document moves up to rank 8: P@8 = 0.88 replaces P@9 = 0.78 in the average
• average precision = 0.77 (up from 0.76)

Ranked Retrieval: average precision

• Advantages:

‣ no need to choose K

‣ accounts for both precision and recall

‣ ranking mistakes at the top of the ranking are more influential

‣ ranking mistakes at the bottom of the ranking are still accounted for

• Disadvantages

‣ not quite as easy to interpret as P/R@K

Ranked Retrieval: MAP (mean average precision)

• So far, we’ve talked about average precision for a single query

• Mean Average Precision (MAP): average precision averaged across a set of queries

‣ yes, it’s confusing, but better than calling it “average average precision”!

‣ one of the most common metrics in IR evaluation
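A minimal sketch of the averaging step (illustrative): MAP is just the arithmetic mean of the per-query average-precision values.

```python
def mean_average_precision(ap_per_query):
    """ap_per_query: one average-precision value per evaluation query."""
    return sum(ap_per_query) / len(ap_per_query)

# Hypothetical per-query AP values (the three rankings evaluated earlier)
print(round(mean_average_precision([0.76, 1.00, 0.33]), 2))  # 0.7
```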

Ranked Retrieval: precision-recall curves

• In some situations, we want to understand the trade-off between precision and recall

• A precision-recall (PR) curve expresses precision as a function of recall

Ranked Retrieval: precision-recall curves: general idea

• Different tasks require different levels of recall

• Sometimes, the user wants a few relevant documents

• Other times, the user wants most of them

• Suppose a user wants some level of recall R

• The goal for the system is to minimize the number of false positives (non-relevant documents) the user must look at in order to achieve a level of recall R

Ranked Retrieval: precision-recall curves: general idea

• False negative error: not retrieving a relevant document

‣ false negative errors affect recall

• False positive errors: retrieving a non-relevant document

‣ false positive errors affect precision

• If a user wants to avoid a certain level of false-negatives, what is the level of false-positives he/she must filter through?

Ranked Retrieval: precision-recall curves

[The ranking from the average-precision example above: relevant documents at ranks 1, 3, 4, 5, 6, 7, 9, 11, 14, and 20]

• Assume 10 relevant documents for this query

• Suppose the user wants R = (1/10)

• What level of precision will the user observe?

Ranked Retrieval: precision-recall curves

[The same ranking as above]

• Assume 10 relevant documents for this query

• Suppose the user wants R = (2/10)

• What level of precision will the user observe?

Ranked Retrieval: precision-recall curves

[The same ranking as above]

• Assume 10 relevant documents for this query

• Suppose the user wants R = (10/10)

• What level of precision will the user observe?

Ranked Retrieval: precision-recall curves

• The R@K and P@K values at each rank for this ranking (the table from the average-precision example above)

Ranked Retrieval: precision-recall curves

[Figure: the P@K values plotted against the R@K values for this ranking, with precision on the y-axis and recall on the x-axis]

Ranked Retrieval: precision-recall curves

[Figure: the precision-recall points connected into a curve]

Wrong! When plotting a PR curve, we use the best precision for a level of recall R or greater!
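A minimal Python sketch of that interpolation rule (illustrative, not the course's code), assuming the ranking is a list of 0/1 labels; for the example ranking it yields the interpolated precision at recall levels 0.1 through 1.0.

```python
def interpolated_precision(rels, num_relevant, recall_levels):
    """Best precision achieved at each requested recall level or greater."""
    hits, points = 0, []
    for k, rel in enumerate(rels, start=1):
        hits += rel
        points.append((hits / num_relevant, hits / k))    # (R@K, P@K)
    interpolated = []
    for level in recall_levels:
        candidates = [p for r, p in points if r >= level]
        interpolated.append(max(candidates) if candidates else 0.0)
    return interpolated

rels = [1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1]
levels = [i / 10 for i in range(1, 11)]                   # 0.1, 0.2, ..., 1.0
print([round(p, 2) for p in interpolated_precision(rels, 10, levels)])
# [1.0, 0.86, 0.86, 0.86, 0.86, 0.86, 0.78, 0.73, 0.64, 0.5]
```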

Ranked Retrieval: precision-recall curves

[Figure: the resulting precision-recall curve for the example ranking]

Ranked Retrieval: precision-recall curves

• For a single query, a PR curve looks like a step-function

• For multiple queries, we can average these curves

‣ Average the precision values for different values of recall (e.g., from 0.01 to 1.0 in increments of 0.01)

• This forms a smoother function
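A minimal sketch of that averaging step (illustrative), assuming each query's interpolated precision has already been sampled at the same fixed recall levels.

```python
def average_pr_curve(per_query_precisions):
    """per_query_precisions: one list of interpolated precision values per query,
    all sampled at the same recall levels (e.g., 0.01, 0.02, ..., 1.0)."""
    num_queries = len(per_query_precisions)
    num_levels = len(per_query_precisions[0])
    return [sum(query[i] for query in per_query_precisions) / num_queries
            for i in range(num_levels)]

# Two hypothetical queries sampled at recall levels 0.25, 0.50, 0.75, 1.0
averaged = average_pr_curve([[1.0, 0.8, 0.6, 0.5],
                             [0.9, 0.7, 0.4, 0.2]])
print([round(v, 2) for v in averaged])  # [0.95, 0.75, 0.5, 0.35]
```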

Ranked Retrieval: precision-recall curves

[Figure: a precision-recall curve averaged across queries (precision vs. recall)]

• PR curves can be averaged across multiple queries

Ranked Retrieval: precision-recall curves

[Figure: an averaged precision-recall curve]

Ranked Retrieval: precision-recall curves

[Figure: averaged precision-recall curves for two systems]

which system is better?

Ranked Retrieval: precision-recall curves

[Figure: the same two precision-recall curves, one system shown in blue]

for what type of task would the blue system be better?

Ranked Retrieval: precision-recall curves

[Figure: the same two precision-recall curves]

for what type of task would the blue system be better?

Ranked Retrieval: discounted cumulative gain

• In some retrieval tasks, we really want to focus on precision at the top of the ranking

• A classic example is web-search!

‣ users rarely care about recall

‣ users rarely navigate beyond the first page of results

‣ users may not even look at results below the “fold”

• Are any of the metrics we’ve seen so far appropriate for web-search?

Ranked Retrieval: discounted cumulative gain

• We could potentially evaluate using P@K with several small values of K

• But, this has some limitations

• What are they?

Ranked Retrieval: discounted cumulative gain

• Which retrieval is better?

Ranked Retrieval: discounted cumulative gain

• Evaluation based on P@K can be too coarse

[Figure: two rankings compared at cutoffs K = 1, K = 5, and K = 10]

Ranked Retrieval: discounted cumulative gain

• P@K (and all the metrics we’ve seen so far) assume binary relevance

[Figure: the same two rankings at cutoffs K = 1, K = 5, and K = 10]

Ranked Retrieval: discounted cumulative gain

• DCG: discounted cumulative gain

• Assumptions:

‣ There are more than two levels of relevance (e.g., perfect, excellent, good, fair, bad)

‣ A relevant document’s usefulness to a user decreases rapidly with rank (more rapidly than linearly)

Ranked Retrieval: discounted cumulative gain

• Let RELi be the relevance associated with the document at rank i

‣ perfect ➔ 4

‣ excellent ➔ 3

‣ good ➔ 2

‣ fair ➔ 1

‣ bad ➔ 0

Ranked Retrieval: discounted cumulative gain

• DCG: discounted cumulative gain

DCG@K = \sum_{i=1}^{K} \frac{REL_i}{\log_2(\max(i, 2))}

Ranked Retrieval: discounted cumulative gain

[Figure: the discount factor 1 / log2(max(i, 2)) plotted against rank, for ranks 2 through 20]

• A relevant document's usefulness decreases with rank according to this discount
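A tiny illustrative snippet that prints the discount factors used in the table on the next slide.

```python
import math

# Discount factor 1 / log2(max(i, 2)); rank 1 is capped so we never divide by log2(1) = 0
for i in range(1, 11):
    print(i, round(1 / math.log2(max(i, 2)), 2))
# prints 1.0, 1.0, 0.63, 0.5, 0.43, 0.39, 0.36, 0.33, 0.32, 0.3 for ranks 1 through 10
```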

Ranked Retrieval: discounted cumulative gain

rank (i)   REL_i   discount factor   gain    DCG_i
1          4       1.00              4.00     4.00
2          3       1.00              3.00     7.00
3          4       0.63              2.52     9.52
4          2       0.50              1.00    10.52
5          0       0.43              0.00    10.52
6          0       0.39              0.00    10.52
7          0       0.36              0.00    10.52
8          1       0.33              0.33    10.86
9          1       0.32              0.32    11.17
10         0       0.30              0.00    11.17

The REL_i column is given: the result at rank 1 is perfect, the result at rank 2 is excellent, the result at rank 3 is perfect, ..., the result at rank 10 is bad.

DCG@K = \sum_{i=1}^{K} \frac{REL_i}{\log_2(\max(i, 2))}
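A minimal Python sketch of DCG@K as defined above (illustrative code); with the graded judgments from the table above it reproduces the 11.17 value.

```python
import math

def dcg_at_k(rels, k):
    """rels: graded relevance labels (e.g., 0-4) in rank order."""
    return sum(rel / math.log2(max(i, 2))
               for i, rel in enumerate(rels[:k], start=1))

# Judgments from the table: perfect, excellent, perfect, good, bad, bad, bad, fair, fair, bad
rels = [4, 3, 4, 2, 0, 0, 0, 1, 1, 0]
print(round(dcg_at_k(rels, 10), 2))  # 11.17
```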

Ranked Retrieval: discounted cumulative gain

• Each rank is associated with a discount factor: 1 / log2(max(i, 2))
• rank 1 is a special case! (the max(i, 2) keeps its discount at 1.00 instead of dividing by log2(1) = 0)

Ranked Retrieval: discounted cumulative gain

• The gain at rank i: multiply REL_i by the discount factor associated with that rank

Ranked Retrieval: discounted cumulative gain

• Summing the gains down the ranking gives the cumulative DCG_i column
• DCG10 = 11.17 for this ranking

Ranked Retrieval: discounted cumulative gain

• Changing the top result from perfect (REL = 4) to excellent (REL = 3): DCG10 drops from 11.17 to 10.17

Ranked Retrieval: discounted cumulative gain

• Changing the 10th result from bad (REL = 0) to excellent (REL = 3) adds only 3 × 0.30 = 0.90 of gain: DCG10 rises from 11.17 to 12.08
• Because of the discount, the same change in relevance matters far less at rank 10 than at rank 1

Ranked Retrieval: normalized discounted cumulative gain

• DCG is not ‘bounded’

• In other words, it ranges from zero to .....

• Makes it problematic to average across queries

• NDCG: normalized discounted-cumulative gain

• “Normalized” is a fancy way of saying, we change it so that it ranges from 0 to 1

Ranked Retrieval: normalized discounted cumulative gain

• NDCGi: normalized discounted-cumulative gain

• For a given query, measure DCGi

• Then, divide this DCGi value by the best possible DCGi for that query

• Measure DCGi for the best possible ranking for a given value i

Ranked Retrieval: normalized discounted cumulative gain

• Given: a query has two 4’s, one 3, and the rest are 0’s

• Question: What is the best possible ranking for i = 1

• All these are equally good:

‣ 4, 4, 3, ....

‣ 4, 3, 4, ....

‣ 4, 0, 0, ....

‣ ... anything with a 4 as the top-ranked result

Ranked Retrieval: normalized discounted cumulative gain

• Given: the query has two 4’s, one 3, and the rest are 0’s

• Question: What is the best possible ranking for i = 2

• All these are equally good:

‣ 4, 4, 3, ....

‣ 4, 4, 0, ....

Ranked Retrieval: normalized discounted cumulative gain

• Given: the query has two 4’s, one 3, and the rest are 0’s

• Question: What is the best possible ranking for i = 3

• All these are equally good:

‣ 4, 4, 3, ....

Ranked Retrieval: normalized discounted cumulative gain

• NDCGi: normalized discounted-cumulative gain

• For a given query, measure DCGi

• Then, divide this DCGi value by the best possible DCGi for that query

• Measure DCGi for the best possible ranking for a given value i
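A minimal Python sketch of this normalization (illustrative): the best possible DCG comes from sorting the query's relevance judgments in descending order. The system ranking below is hypothetical, but the judgments (two 4's, one 3, the rest 0's) match the slides above.

```python
import math

def dcg_at_k(rels, k):
    return sum(rel / math.log2(max(i, 2))
               for i, rel in enumerate(rels[:k], start=1))

def ndcg_at_k(system_rels, all_judgments, k):
    """system_rels: graded labels of the system's ranking, in rank order.
    all_judgments: every graded judgment for the query (used for the ideal ranking)."""
    ideal = sorted(all_judgments, reverse=True)       # best possible ranking
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(system_rels, k) / ideal_dcg if ideal_dcg > 0 else 0.0

judgments = [4, 4, 3, 0, 0, 0, 0, 0, 0, 0]            # two 4's, one 3, rest 0's
system_ranking = [4, 0, 4, 3, 0, 0, 0, 0, 0, 0]       # hypothetical system output
print(round(ndcg_at_k(system_ranking, judgments, 3), 2))  # 0.66
```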

Metric Review

• set-retrieval evaluation: we want to evaluate the set of documents retrieved by the system, without considering the ranking

• ranked-retrieval evaluation: we want to evaluate the ranking of documents returned by the system

Metric Review: set-retrieval evaluation

• precision: the proportion of retrieved documents that are relevant

• recall: the proportion of relevant documents that are retrieved

• f-measure: harmonic-mean of precision and recall

‣ a difficult metric to “cheat” by getting very high precision and abysmal recall (and vice-versa)

Metric Review: ranked-retrieval evaluation

• P@K: precision under the assumption that the top-K results are the ‘set’ retrieved

• R@K: recall under the assumption that the top-K results are the ‘set’ retrieved

• average-precision: considers precision and recall and focuses mostly on the top results

• DCG: ignores recall, considers multiple levels of relevance, and focuses very much on the top ranks

• NDCG: a trick to make DCG range between 0 and 1


Which Metric Would You Use?
