Information Retrieval Review
LBSC 796/INFM 718R
Structure of IR Systems
• IR process model
• System architecture
• Information needs
– Visceral, conscious, formalized, compromised
• Utility vs. relevance
• Known item vs. ad hoc search
Supporting the Search Process
SourceSelection
Search
Query
Selection
Ranked List
Examination
Document
Delivery
Document
QueryFormulation
IR System
Indexing Index
Acquisition Collection
Relevance
• Relevance relates a topic and a document
– Duplicates are equally relevant, by definition
– Constant over time and across users
• Pertinence relates a task and a document
– Accounts for quality, complexity, language, …
• Utility relates a user and a document
– Accounts for prior knowledge
Taylor’s Model of Question Formation
Q1 Visceral Need
Q2 Conscious Need
Q3 Formalized Need
Q4 Compromised Need (Query)
[Diagram labels: End-user Search; Intermediated Search]
Evidence from Content and Ranked Retrieval
• Inverted indexing
– Postings, postings file
• Bag of terms
– Segmentation, phrases, stemming, stopwords
• Boolean retrieval
• Vector space ranked retrieval
– TF, IDF, length normalization, BM25 (see the sketch below)
• Blind relevance feedback
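The slide names BM25 but does not give its form. As a reminder, here is a minimal Python sketch of Okapi BM25 scoring; the idf variant (with the +1 inside the log) and the defaults k1 = 1.2, b = 0.75 are common choices, not something these slides specify.

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, docs, k1=1.2, b=0.75):
    """Score one document against a query with Okapi BM25.

    query_terms: list of query tokens
    doc_terms:   token list of the document being scored
    docs:        token lists for the whole collection
    """
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N               # average document length
    tf = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in docs if term in d)          # document frequency
        if df == 0:
            continue
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1)   # kept non-negative
        norm = k1 * (1 - b + b * len(doc_terms) / avgdl)  # length normalization
        score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
    return score

docs = ["the quick brown fox".split(), "the lazy dog".split()]
print(bm25_score("quick fox".split(), docs[0], docs))
```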
An “Inverted Index”

[Figure: a vocabulary (quick, brown, fox, over, lazy, dog, back, now, time, all, good, men, come, jump, aid, their, party) shown first as term-document incidence vectors for Docs 1–8, then as an inverted index: a letter-organized term index whose entries point to postings lists of the documents containing each term, e.g. “quick” → 4, 8.]
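A minimal Python sketch of how such an index is built (the toy documents are placeholders, not the figure's):

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to a sorted postings list of document ids."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs, start=1):
        for term in text.lower().split():            # naive tokenization
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

docs = ["the quick brown fox", "now is the time", "the quick red fox"]
print(build_inverted_index(docs)["quick"])           # -> [1, 3]
```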
A Partial Solution: TF*IDF
• High TF is evidence of meaning
• Low DF is evidence of term importance
– Equivalently, high “IDF”
• Multiply them to get a “term weight”
• Add up the weights for each query term

Let $N$ be the total number of documents
Let $DF_i$ be the number of documents that contain term $i$
Let $TF_{i,j}$ be the number of times term $i$ appears in document $j$

Then: $w_{i,j} = TF_{i,j} \cdot \log \frac{N}{DF_i}$
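A direct Python transcription of that weighting (a sketch; base-10 log is assumed, to match the worked example on the next slide):

```python
import math
from collections import Counter

def tfidf_weights(docs):
    """Compute w_ij = TF_ij * log10(N / DF_i) for token-list documents."""
    N = len(docs)
    df = Counter()                       # DF_i: documents containing term i
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)                # TF_ij: occurrences of term i in doc j
        weights.append({t: tf[t] * math.log10(N / df[t]) for t in tf})
    return weights
```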
Cosine Normalization Example

[Worked example, tables not reproduced: tf, idf, the weights $w_{i,j}$, and the cosine-normalized weights for the terms nuclear, fallout, siberia, contaminated, interesting, complicated, information, and retrieval across four documents; the document vector lengths are 1.70, 0.97, 2.67, and 0.87.]

Query: contaminated retrieval
Result: 2, 4, 1, 3 (compare to 2, 3, 1, 4 without length normalization)
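The normalization step itself is simple; a Python sketch over hypothetical weight vectors (not the figure's data): divide each document's weight vector by its Euclidean length, then rank by the sum of the normalized weights of the query terms.

```python
import math

def cosine_normalize(weights):
    """Divide a term-weight vector (dict) by its Euclidean length."""
    length = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / length for t, w in weights.items()} if length else weights

def score(query_terms, normalized_doc):
    """Sum the normalized weights of the query terms."""
    return sum(normalized_doc.get(t, 0.0) for t in query_terms)

doc = cosine_normalize({"contaminated": 0.50, "nuclear": 0.90})
print(score(["contaminated", "retrieval"], doc))
```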
Interaction
• Query formulation vs. query by example
• Summarization
– Indicative vs. informative
• Clustering
• Visualization
– Projection, starfield, contour maps
Evaluation
• Criteria
– Effectiveness, efficiency, usability
• Measures of effectiveness
– Recall
– Precision
– F-measure
– Mean Average Precision
• User studies
Set-Based Effectiveness Measures
• Precision: how much of what was found is relevant?
– Often of interest, particularly for interactive searching
• Recall: how much of what is relevant was found?
– Particularly important for law, patents, and medicine

$\text{Recall} = \frac{|\text{Relevant} \cap \text{Retrieved}|}{|\text{Relevant}|}$ (exhaustiveness)

$\text{Precision} = \frac{|\text{Relevant} \cap \text{Retrieved}|}{|\text{Retrieved}|}$ (accuracy)

[Venn diagram: within the space of all documents, the Relevant and Retrieved sets overlap in Relevant ∩ Retrieved; everything outside both is neither relevant nor retrieved.]
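Those set definitions translate directly into code. A minimal Python sketch, with the F-measure from the Evaluation slide included as the balanced harmonic mean of the two:

```python
def precision_recall_f(relevant, retrieved):
    """Set-based effectiveness; relevant and retrieved are sets of doc ids."""
    hits = len(relevant & retrieved)               # |Relevant ∩ Retrieved|
    p = hits / len(retrieved) if retrieved else 0.0
    r = hits / len(relevant) if relevant else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0      # balanced F-measure (F1)
    return p, r, f

print(precision_recall_f({1, 2, 3, 4}, {2, 4, 6}))  # (0.667, 0.5, 0.571)
```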
Mean Average Precision
• Average of precision at each retrieved relevant document
• Relevant documents not retrieved contribute zero to the score

Worked example, with relevant documents retrieved at ranks 1, 5, 6, 8, 11, and 16:

Precision at hits 1–10:  1/1  1/2  1/3  1/4  2/5  3/6  3/7  4/8  4/9  4/10
Precision at hits 11–20: 5/11 5/12 5/13 5/14 5/15 6/16 6/17 6/18 6/19 6/20

Assume a total of 14 relevant documents: the 8 relevant documents not retrieved contribute eight zeros.

MAP = (1/1 + 2/5 + 3/6 + 4/8 + 5/11 + 6/16) / 14 = .2307
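The same computation as a Python sketch; run on the example's judgments, it reproduces .2307:

```python
def average_precision(ranking, relevant, total_relevant):
    """Average precision at each retrieved relevant document; relevant
    documents never retrieved contribute zero (hence total_relevant)."""
    hits, precision_sum = 0, 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / total_relevant

ranking = list(range(1, 21))        # doc ids standing in for ranks 1..20
relevant = {1, 5, 6, 8, 11, 16}     # the slide's relevant ranks
print(round(average_precision(ranking, relevant, 14), 4))  # 0.2307
```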
Blair and Maron (1985)
• A classic study of retrieval effectiveness
– Earlier studies used unrealistically small collections
• Studied an archive of documents for a lawsuit
– 40,000 documents, ~350,000 pages of text
– 40 different queries
– Used IBM’s STAIRS full-text system
• Approach:
– Lawyers wanted at least 75% of all relevant documents
– Precision and recall were evaluated only after the lawyers were satisfied with the results

David C. Blair and M. E. Maron (1985). An Evaluation of Retrieval Effectiveness for a Full-Text Document-Retrieval System. Communications of the ACM, 28(3), 289–299.
Blair and Maron’s Results
• Mean precision: 79%
• Mean recall: 20% (!!)
• Why was recall so low?
– Users can’t anticipate the terms used in relevant documents: an “accident” might be referred to as an “event”, “incident”, “situation”, “problem”, …
– Differing technical terminology
– Slang, misspellings
• Other findings:
– Searches by both lawyers had similar performance
– The lawyers’ recall was not much different from the paralegals’
Evidence from Metadata
• Standards
– e.g., Dublin Core
• Controlled vocabulary
• Text classification
• Information extraction
Filtering
• Retrieval
– Information needs differ over a stable collection
• Filtering
– The collection changes while information needs stay stable