Information Retrieval Review
LBSC 796/INFM 718R
Structure of IR Systems
• IR process model
• System architecture
• Information needs
– Visceral, conscious, formalized, compromised
• Utility vs. relevance
• Known item vs. ad hoc search
Supporting the Search Process
SourceSelection
Search
Query
Selection
Ranked List
Examination
Document
Delivery
Document
QueryFormulation
IR System
Indexing Index
Acquisition Collection
Relevance
• Relevance relates a topic and a document
– Duplicates are equally relevant, by definition
– Constant over time and across users
• Pertinence relates a task and a document
– Accounts for quality, complexity, language, …
• Utility relates a user and a document
– Accounts for prior knowledge
Taylor’s Model of Question Formation
Q1 Visceral Need
Q2 Conscious Need
Q3 Formalized Need
Q4 Compromised Need (Query)
[Diagram labels: End-user Search; Intermediated Search]
Evidence from Content and Ranked Retrieval
• Inverted indexing
– Postings, postings file
• Bag of terms
– Segmentation, phrases, stemming, stopwords
• Boolean retrieval
• Vector space ranked retrieval
– TF, IDF, length normalization, BM25 (see the sketch below)
• Blind relevance feedback
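The slide names BM25 but does not give its form. As a reminder, here is a minimal Python sketch of Okapi BM25 scoring; the idf variant (with the +1 inside the log) and the defaults k1 = 1.2, b = 0.75 are common choices, not something these slides specify.

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, docs, k1=1.2, b=0.75):
    """Score one document against a query with Okapi BM25.

    query_terms: list of query tokens
    doc_terms:   token list of the document being scored
    docs:        token lists for the whole collection
    """
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N               # average document length
    tf = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in docs if term in d)          # document frequency
        if df == 0:
            continue
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1)   # kept non-negative
        norm = k1 * (1 - b + b * len(doc_terms) / avgdl)  # length normalization
        score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
    return score

docs = ["the quick brown fox".split(), "the lazy dog".split()]
print(bm25_score("quick fox".split(), docs[0], docs))
```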
An “Inverted Index”

[Figure: a vocabulary (quick, brown, fox, over, lazy, dog, back, now, time, all, good, men, come, jump, aid, their, party) shown first as term-document incidence vectors for Docs 1–8, then as an inverted index: a letter-organized term index whose entries point to postings lists of the documents containing each term, e.g. “quick” → 4, 8.]
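A minimal Python sketch of how such an index is built (the toy documents are placeholders, not the figure's):

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to a sorted postings list of document ids."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs, start=1):
        for term in text.lower().split():            # naive tokenization
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

docs = ["the quick brown fox", "now is the time", "the quick red fox"]
print(build_inverted_index(docs)["quick"])           # -> [1, 3]
```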
A Partial Solution: TF*IDF
• High TF is evidence of meaning
• Low DF is evidence of term importance
– Equivalently, high “IDF”
• Multiply them to get a “term weight”
• Add up the weights for each query term

Let $N$ be the total number of documents
Let $DF_i$ be the number of documents that contain term $i$
Let $TF_{i,j}$ be the number of times term $i$ appears in document $j$

Then: $w_{i,j} = TF_{i,j} \cdot \log \frac{N}{DF_i}$
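A direct Python transcription of that weighting (a sketch; base-10 log is assumed, to match the worked example on the next slide):

```python
import math
from collections import Counter

def tfidf_weights(docs):
    """Compute w_ij = TF_ij * log10(N / DF_i) for token-list documents."""
    N = len(docs)
    df = Counter()                       # DF_i: documents containing term i
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)                # TF_ij: occurrences of term i in doc j
        weights.append({t: tf[t] * math.log10(N / df[t]) for t in tf})
    return weights
```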
Cosine Normalization Example

[Worked example, tables not reproduced: tf, idf, the weights $w_{i,j}$, and the cosine-normalized weights for the terms nuclear, fallout, siberia, contaminated, interesting, complicated, information, and retrieval across four documents; the document vector lengths are 1.70, 0.97, 2.67, and 0.87.]

Query: contaminated retrieval
Result: 2, 4, 1, 3 (compare to 2, 3, 1, 4 without length normalization)
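The normalization step itself is simple; a Python sketch over hypothetical weight vectors (not the figure's data): divide each document's weight vector by its Euclidean length, then rank by the sum of the normalized weights of the query terms.

```python
import math

def cosine_normalize(weights):
    """Divide a term-weight vector (dict) by its Euclidean length."""
    length = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / length for t, w in weights.items()} if length else weights

def score(query_terms, normalized_doc):
    """Sum the normalized weights of the query terms."""
    return sum(normalized_doc.get(t, 0.0) for t in query_terms)

doc = cosine_normalize({"contaminated": 0.50, "nuclear": 0.90})
print(score(["contaminated", "retrieval"], doc))
```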
Interaction
• Query formulation vs. query by example
• Summarization
– Indicative vs. informative
• Clustering
• Visualization
– Projection, starfield, contour maps
Evaluation
• Criteria
– Effectiveness, efficiency, usability
• Measures of effectiveness
– Recall
– Precision
– F-measure
– Mean Average Precision
• User studies
Set-Based Effectiveness Measures
• Precision: how much of what was found is relevant?
– Often of interest, particularly for interactive searching
• Recall: how much of what is relevant was found?
– Particularly important for law, patents, and medicine

$\text{Recall} = \frac{|\text{Relevant} \cap \text{Retrieved}|}{|\text{Relevant}|}$ (exhaustiveness)

$\text{Precision} = \frac{|\text{Relevant} \cap \text{Retrieved}|}{|\text{Retrieved}|}$ (accuracy)

[Venn diagram: within the space of all documents, the Relevant and Retrieved sets overlap in Relevant ∩ Retrieved; everything outside both is neither relevant nor retrieved.]
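Those set definitions translate directly into code. A minimal Python sketch, with the F-measure from the Evaluation slide included as the balanced harmonic mean of the two:

```python
def precision_recall_f(relevant, retrieved):
    """Set-based effectiveness; relevant and retrieved are sets of doc ids."""
    hits = len(relevant & retrieved)               # |Relevant ∩ Retrieved|
    p = hits / len(retrieved) if retrieved else 0.0
    r = hits / len(relevant) if relevant else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0      # balanced F-measure (F1)
    return p, r, f

print(precision_recall_f({1, 2, 3, 4}, {2, 4, 6}))  # (0.667, 0.5, 0.571)
```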
Mean Average Precision
• Average of precision at each retrieved relevant document
• Relevant documents not retrieved contribute zero to the score

Worked example, with relevant documents retrieved at ranks 1, 5, 6, 8, 11, and 16:

Precision at hits 1–10:  1/1  1/2  1/3  1/4  2/5  3/6  3/7  4/8  4/9  4/10
Precision at hits 11–20: 5/11 5/12 5/13 5/14 5/15 6/16 6/17 6/18 6/19 6/20

Assume a total of 14 relevant documents: the 8 relevant documents not retrieved contribute eight zeros.

MAP = (1/1 + 2/5 + 3/6 + 4/8 + 5/11 + 6/16) / 14 = .2307
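The same computation as a Python sketch; run on the example's judgments, it reproduces .2307:

```python
def average_precision(ranking, relevant, total_relevant):
    """Average precision at each retrieved relevant document; relevant
    documents never retrieved contribute zero (hence total_relevant)."""
    hits, precision_sum = 0, 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / total_relevant

ranking = list(range(1, 21))        # doc ids standing in for ranks 1..20
relevant = {1, 5, 6, 8, 11, 16}     # the slide's relevant ranks
print(round(average_precision(ranking, relevant, 14), 4))  # 0.2307
```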
Blair and Maron (1985)
• A classic study of retrieval effectiveness
– Earlier studies used unrealistically small collections
• Studied an archive of documents for a lawsuit
– 40,000 documents, ~350,000 pages of text
– 40 different queries
– Used IBM’s STAIRS full-text system
• Approach:
– Lawyers wanted at least 75% of all relevant documents
– Precision and recall were evaluated only after the lawyers were satisfied with the results

David C. Blair and M. E. Maron (1985). An Evaluation of Retrieval Effectiveness for a Full-Text Document-Retrieval System. Communications of the ACM, 28(3), 289–299.
Blair and Maron’s Results
• Mean precision: 79%
• Mean recall: 20% (!!)
• Why was recall so low?
– Users can’t anticipate the terms used in relevant documents: an “accident” might be referred to as an “event”, “incident”, “situation”, “problem”, …
– Differing technical terminology
– Slang, misspellings
• Other findings:
– Searches by both lawyers had similar performance
– The lawyers’ recall was not much different from the paralegals’
Evidence from Metadata
• Standards
– e.g., Dublin Core
• Controlled vocabulary
• Text classification
• Information extraction
Filtering
• Retrieval
– Information needs differ over a stable collection
• Filtering
– The collection changes while information needs stay stable