Chapter 3: Retrieval Evaluation
Chapter 3, Retrieval Evaluation. Modern Information Retrieval, Ricardo Baeza-Yates and Berthier Ribeiro-Neto. Hsu Yi-Chen, NCU MIS 88423043.
![Page 1: Chapter 3 Retrieval Evaluation](https://reader033.vdocuments.site/reader033/viewer/2022061520/56813ba6550346895da4d7a8/html5/thumbnails/1.jpg)
Chapter 3: Retrieval Evaluation
Modern Information Retrieval
Ricardo Baeza-Yates
Berthier Ribeiro-Neto
Hsu Yi-Chen, NCU MIS 88423043
![Page 2: Chapter 3 Retrieval Evaluation](https://reader033.vdocuments.site/reader033/viewer/2022061520/56813ba6550346895da4d7a8/html5/thumbnails/2.jpg)
Outline
• Introduction
• Retrieval Performance Evaluation
  • Recall and precision
  • Alternative measures
• Reference Collections
  • TREC Collection
  • CACM & ISI Collections
  • CF Collection
• Trends and Research Issues
![Page 3: Chapter 3 Retrieval Evaluation](https://reader033.vdocuments.site/reader033/viewer/2022061520/56813ba6550346895da4d7a8/html5/thumbnails/3.jpg)
Introduction
• Types of evaluation
  • Functional analysis phase and error analysis phase
  • Performance evaluation
• Performance of the IR system
  • Performance evaluation: response time and space required
  • Retrieval performance evaluation: how precise the answer set is
![Page 4: Chapter 3 Retrieval Evaluation](https://reader033.vdocuments.site/reader033/viewer/2022061520/56813ba6550346895da4d7a8/html5/thumbnails/4.jpg)
Retrieval performance evaluation for IR systems
Goodness of retrieval strategy S = the similarity between
• the set of documents retrieved by S
• the set of relevant documents provided by specialists
quantified by an evaluation measure
![Page 5: Chapter 3 Retrieval Evaluation](https://reader033.vdocuments.site/reader033/viewer/2022061520/56813ba6550346895da4d7a8/html5/thumbnails/5.jpg)
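Retrieval Performance Evaluation (cont.)
Evaluating IR systems that mainly process batch queries
Figure: within the collection, the relevant documents R and the answer set A (sorted by relevance) overlap in Ra, the relevant documents in the answer set.
• Relevant docs in answer set: |Ra|
• Relevant docs: |R|
• Answer set: |A|
• Recall = |Ra| / |R|
• Precision = |Ra| / |A|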
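The two ratios above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `recall_precision` and the example document ids are hypothetical.

```python
def recall_precision(relevant, answer):
    """Return (recall, precision) for relevant set R and answer set A."""
    relevant, answer = set(relevant), set(answer)
    ra = relevant & answer  # Ra: relevant documents in the answer set
    recall = len(ra) / len(relevant) if relevant else 0.0
    precision = len(ra) / len(answer) if answer else 0.0
    return recall, precision


# Hypothetical example: 4 relevant documents, 5 retrieved, 2 in common.
R = {"d1", "d2", "d3", "d4"}
A = {"d2", "d4", "d7", "d8", "d9"}
recall, precision = recall_precision(R, A)
print(f"recall={recall:.2f} precision={precision:.2f}")
```

Here |Ra| = 2, so recall is 2/4 and precision is 2/5.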
![Page 6: Chapter 3 Retrieval Evaluation](https://reader033.vdocuments.site/reader033/viewer/2022061520/56813ba6550346895da4d7a8/html5/thumbnails/6.jpg)
Precision versus recall curve
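Rq = {d3, d5, d9, d25, d39, d44, d56, d71, d89, d123}
Ranking for query q (* marks a relevant document):
1. d123*   2. d84    3. d56*    4. d6    5. d8
6. d9*     7. d511   8. d129    9. d187  10. d25*
11. d38    12. d48   13. d250   14. d11  15. d3*
• Precision is 100% at 10% recall
• Precision is 66% at 20% recall
• Precision is 50% at 30% recall
• Usually based on 11 standard recall levels: 0%, 10%, ..., 100%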
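The precision figures quoted on this slide can be reproduced by walking down the ranking and recording a (recall, precision) pair at each relevant document. The sketch below assumes |Rq| = 10, which the slide's recall percentages imply; taking d71 as the tenth relevant document is an assumption, and it does not affect the result because d71 is not in the ranking. The function name is hypothetical.

```python
RELEVANT = {"d3", "d5", "d9", "d25", "d39", "d44", "d56", "d71", "d89", "d123"}
RANKING = ["d123", "d84", "d56", "d6", "d8",
           "d9", "d511", "d129", "d187", "d25",
           "d38", "d48", "d250", "d11", "d3"]


def precision_recall_points(ranking, relevant):
    """Return (recall, precision) observed at each relevant document."""
    points = []
    hits = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))
    return points


points = precision_recall_points(RANKING, RELEVANT)
for recall, precision in points:
    print(f"recall {recall:.0%}: precision {precision:.1%}")
```

The second point is 2/3, which the slide writes as 66%.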
![Page 7: Chapter 3 Retrieval Evaluation](https://reader033.vdocuments.site/reader033/viewer/2022061520/56813ba6550346895da4d7a8/html5/thumbnails/7.jpg)
Precision versus recall curve
For a single query (Fig. 3.2)
![Page 8: Chapter 3 Retrieval Evaluation](https://reader033.vdocuments.site/reader033/viewer/2022061520/56813ba6550346895da4d7a8/html5/thumbnails/8.jpg)
Computing the average performance over multiple queries
$$P(r) = \sum_{i=1}^{N_q} \frac{P_i(r)}{N_q}$$
• $P(r)$ = average precision at recall level $r$
• $N_q$ = number of queries used
• $P_i(r)$ = the precision at recall level $r$ for the i-th query
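As a small illustration of this average, the sketch below takes one precision list per query (one value per recall level) and averages level by level. The function name and the two example queries are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def average_precision_at_levels(per_query):
    """Average P_i(r) over queries, separately for each recall level r.

    per_query: list of lists; per_query[i][k] is the precision of query i
    at the k-th recall level. All inner lists must have the same length.
    """
    n_queries = len(per_query)
    n_levels = len(per_query[0])
    return [sum(q[k] for q in per_query) / n_queries for k in range(n_levels)]


# Hypothetical precision values for two queries at three recall levels.
query_1 = [1.0, 0.5, 0.25]
query_2 = [0.5, 0.5, 0.75]
averaged = average_precision_at_levels([query_1, query_2])
print(averaged)
```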
![Page 9: Chapter 3 Retrieval Evaluation](https://reader033.vdocuments.site/reader033/viewer/2022061520/56813ba6550346895da4d7a8/html5/thumbnails/9.jpg)
Interpolated precision
Rq = {d3, d56, d129}
Let $r_j$, $j \in \{0, 1, 2, \ldots, 10\}$, be a reference to the j-th standard recall level.
$$P(r_j) = \max_{r_j \le r \le r_{j+1}} P(r)$$
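A sketch of interpolation at the 11 standard recall levels follows. It uses the common convention of taking the maximum observed precision at any recall greater than or equal to the level, which also fills levels where no observed point falls between two consecutive levels; that convention is an assumption here, not something stated on the slide. The ranking is the one from the earlier single-query example, and the function name is hypothetical.

```python
RELEVANT = {"d3", "d56", "d129"}
RANKING = ["d123", "d84", "d56", "d6", "d8",
           "d9", "d511", "d129", "d187", "d25",
           "d38", "d48", "d250", "d11", "d3"]


def interpolated_precision(ranking, relevant, levels=11):
    """Interpolated precision at `levels` evenly spaced recall levels."""
    # Observed (recall, precision) points, one per relevant document found.
    points = []
    hits = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))
    result = []
    for j in range(levels):
        r_j = j / (levels - 1)
        # Highest precision observed at any recall >= r_j (0.0 if none).
        candidates = [p for r, p in points if r >= r_j]
        result.append(max(candidates) if candidates else 0.0)
    return result


interp = interpolated_precision(RANKING, RELEVANT)
for j, p in enumerate(interp):
    print(f"recall {j * 10:3d}%: precision {p:.1%}")
```

With this ranking the relevant documents appear at positions 3, 8 and 15, so the interpolated precision is 1/3 up to 30% recall, 0.25 from 40% to 60%, and 0.2 from 70% to 100%.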
![Page 10: Chapter 3 Retrieval Evaluation](https://reader033.vdocuments.site/reader033/viewer/2022061520/56813ba6550346895da4d7a8/html5/thumbnails/10.jpg)
Average recall versus precision figure for two different algorithms
![Page 11: Chapter 3 Retrieval Evaluation](https://reader033.vdocuments.site/reader033/viewer/2022061520/56813ba6550346895da4d7a8/html5/thumbnails/11.jpg)
Single Value Summaries
• The earlier average precision versus recall figures compare retrieval algorithms over a set of example queries.
• But the performance on individual queries also matters, because:
  • averaging precision over queries may hide anomalous behavior of an algorithm;
  • we may need to know how two algorithms perform on one particular query.
• Solution: consider a single precision value for each query.
  • The single value should be interpreted as a summary of the corresponding precision versus recall curve.
  • Usually, the single value summary is taken as the precision at some specified recall level.
![Page 12: Chapter 3 Retrieval Evaluation](https://reader033.vdocuments.site/reader033/viewer/2022061520/56813ba6550346895da4d7a8/html5/thumbnails/12.jpg)
Average Precision at Seen Relevant Documents
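• Averaging the precision figures obtained after each new relevant document is observed.
• Example (Fig. 3.2): (1 + 0.66 + 0.5 + 0.4 + 0.3) / 5 = 0.57
• This measure favors systems that retrieve relevant documents quickly (the earlier the relevant documents are ranked, the higher the precision).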
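The averaging step can be sketched as follows, reusing the ranking of the single-query example and assuming its ten-document relevant set (d71 as the tenth document is an assumption). With exact fractions the terms are 1, 2/3, 1/2, 2/5 and 1/3, so the code returns 0.58; the 0.57 on the slide comes from rounding the individual terms down first. The function name is hypothetical.

```python
RELEVANT = {"d3", "d5", "d9", "d25", "d39", "d44", "d56", "d71", "d89", "d123"}
RANKING = ["d123", "d84", "d56", "d6", "d8",
           "d9", "d511", "d129", "d187", "d25",
           "d38", "d48", "d250", "d11", "d3"]


def avg_precision_at_seen_relevant(ranking, relevant):
    """Mean of the precision values measured at each relevant document seen."""
    precisions = []
    hits = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


value = avg_precision_at_seen_relevant(RANKING, RELEVANT)
print(f"{value:.2f}")
```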
![Page 13: Chapter 3 Retrieval Evaluation](https://reader033.vdocuments.site/reader033/viewer/2022061520/56813ba6550346895da4d7a8/html5/thumbnails/13.jpg)
R-Precision
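• Computing the precision at the R-th position in the ranking (the fraction of the top R documents that are relevant)
• R: the total number of relevant documents for the current query (the number of documents in Rq)
• Fig. 3.2: R = 10, value = 0.4
• Fig. 3.3: R = 3, value = 0.33
• Makes it easy to observe an algorithm's performance on each individual query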
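Both R-precision values quoted here can be checked with a short sketch. It reuses the ranking of the running example; the ten-document relevant set includes d71 as an assumed tenth member, and the function name is hypothetical.

```python
RANKING = ["d123", "d84", "d56", "d6", "d8",
           "d9", "d511", "d129", "d187", "d25",
           "d38", "d48", "d250", "d11", "d3"]
RELEVANT_10 = {"d3", "d5", "d9", "d25", "d39", "d44", "d56", "d71", "d89", "d123"}
RELEVANT_3 = {"d3", "d56", "d129"}


def r_precision(ranking, relevant):
    """Precision over the top R documents, where R = number of relevant docs."""
    r = len(relevant)
    top_r = ranking[:r]
    return sum(1 for doc in top_r if doc in relevant) / r


rp_10 = r_precision(RANKING, RELEVANT_10)  # 4 relevant in the top 10
rp_3 = r_precision(RANKING, RELEVANT_3)    # 1 relevant in the top 3
print(f"{rp_10:.2f} {rp_3:.2f}")
```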
![Page 14: Chapter 3 Retrieval Evaluation](https://reader033.vdocuments.site/reader033/viewer/2022061520/56813ba6550346895da4d7a8/html5/thumbnails/14.jpg)
Precision Histograms
• Use a bar chart to compare the R-precision values of two algorithms, query by query
• $RP_{A/B}(i) = RP_A(i) - RP_B(i)$
• $RP_A(i)$, $RP_B(i)$: R-precision values of algorithms A and B for the i-th query
• Compare the retrieval performance history of two algorithms through visual inspection
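The histogram values are simply per-query differences. The sketch below computes them and prints a crude text bar per query; the R-precision numbers are hypothetical, and the function name is not from the slides.

```python
def r_precision_differences(rp_a, rp_b):
    """RP_A/B(i) = RP_A(i) - RP_B(i) for each query i."""
    return [a - b for a, b in zip(rp_a, rp_b)]


# Hypothetical R-precision values of algorithms A and B on three queries.
rp_a = [0.5, 0.25, 0.75]
rp_b = [0.25, 0.5, 0.75]
diffs = r_precision_differences(rp_a, rp_b)

for i, d in enumerate(diffs, start=1):
    # Positive bars favor A, negative bars favor B, zero is a tie.
    winner = "A" if d > 0 else "B" if d < 0 else "tie"
    bar = "#" * round(abs(d) * 20)
    print(f"query {i}: {d:+.2f} {bar} ({winner})")
```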
![Page 15: Chapter 3 Retrieval Evaluation](https://reader033.vdocuments.site/reader033/viewer/2022061520/56813ba6550346895da4d7a8/html5/thumbnails/15.jpg)
Precision Histograms(cont.)
![Page 16: Chapter 3 Retrieval Evaluation](https://reader033.vdocuments.site/reader033/viewer/2022061520/56813ba6550346895da4d7a8/html5/thumbnails/16.jpg)
Summary Table Statistics
Put the single value summaries for all queries in a table, for example:
• the number of queries
• the total number of documents retrieved by all queries
• the total number of relevant documents effectively retrieved when all queries are considered
• the total number of relevant documents retrieved by all queries…
![Page 17: Chapter 3 Retrieval Evaluation](https://reader033.vdocuments.site/reader033/viewer/2022061520/56813ba6550346895da4d7a8/html5/thumbnails/17.jpg)
Applicability of Precision and Recall
• Estimating maximum recall requires background knowledge of all the documents in the collection.
• Recall and precision are related measures; they are best used in combination.
• Measures which quantify the informativeness of the retrieval process might now be more appropriate.
• Recall and precision are easy to define when a linear ordering of the retrieved documents is enforced.
![Page 18: Chapter 3 Retrieval Evaluation](https://reader033.vdocuments.site/reader033/viewer/2022061520/56813ba6550346895da4d7a8/html5/thumbnails/18.jpg)
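Alternative Measures
Here $r(j)$ and $P(j)$ are the recall and precision at the j-th document in the ranking.
• The Harmonic Mean, with values between 0 and 1:
$$F(j) = \frac{2}{\frac{1}{r(j)} + \frac{1}{P(j)}}$$
• The E Measure, which adds a preference weight b:
$$E(j) = 1 - \frac{1 + b^2}{\frac{b^2}{r(j)} + \frac{1}{P(j)}}$$
  • b = 1: E(j) = 1 − F(j), the complement of the harmonic mean
  • b > 1: more weight on recall
  • b < 1: more weight on precision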
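A numerical sketch of the two measures, written from the formulas F(j) = 2 / (1/r(j) + 1/P(j)) and E(j) = 1 − (1 + b²) / (b²/r(j) + 1/P(j)). The function names and the example recall and precision values are hypothetical. Printing E for several values of b makes the effect of the weight easy to check: as b grows E approaches 1 − r(j), and as b shrinks toward 0 it approaches 1 − P(j).

```python
def harmonic_mean(recall, precision):
    """F = 2 / (1/r + 1/P); defined as 0 when either value is 0."""
    if recall == 0 or precision == 0:
        return 0.0
    return 2 / (1 / recall + 1 / precision)


def e_measure(recall, precision, b=1.0):
    """E = 1 - (1 + b^2) / (b^2/r + 1/P); defined as 1 when either value is 0."""
    if recall == 0 or precision == 0:
        return 1.0
    return 1 - (1 + b ** 2) / (b ** 2 / recall + 1 / precision)


# Hypothetical operating point: recall 0.5, precision 0.25.
r, p = 0.5, 0.25
f_value = harmonic_mean(r, p)
print(f"F = {f_value:.4f}")
for b in (0.5, 1.0, 2.0):
    print(f"b = {b}: E = {e_measure(r, p, b):.4f}")
```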
![Page 19: Chapter 3 Retrieval Evaluation](https://reader033.vdocuments.site/reader033/viewer/2022061520/56813ba6550346895da4d7a8/html5/thumbnails/19.jpg)
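User-Oriented Measure
• Assumption: relevance depends on the user; different users may have different relevant documents for the same query.
• U: the relevant documents known to the user; Rk: the retrieved relevant documents the user already knew; Ru: the retrieved relevant documents previously unknown to the user.
• Coverage = |Rk| / |U|
• Novelty = |Ru| / (|Ru| + |Rk|)
• The higher the coverage, the more of the documents the user expected to see the system finds.
• The higher the novelty, the more relevant documents previously unknown to the user the system finds.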
![Page 20: Chapter 3 Retrieval Evaluation](https://reader033.vdocuments.site/reader033/viewer/2022061520/56813ba6550346895da4d7a8/html5/thumbnails/20.jpg)
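User-Oriented Measure (cont.)
• Relative recall: the ratio of the number of relevant documents found by the system to the number of relevant documents the user expected to find: (|Ru| + |Rk|) / |U|
• Recall effort: the ratio of the number of relevant documents the user expected to find to the number of retrieved documents that match the user's expectation (the number of documents examined in an attempt to find the expected relevant documents): |U| / |Rk|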
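Coverage, novelty and relative recall can be sketched with plain set operations. The sketch assumes the usual reading of the symbols: A is the answer set, R the relevant documents, U the subset of R already known to the user, Rk = A ∩ U, and Ru the retrieved relevant documents outside U. The function name and document ids are hypothetical, and recall effort is left out.

```python
def user_oriented_measures(answer, relevant, known):
    """Return coverage, novelty and relative recall as a dict.

    answer:   A, the documents retrieved by the system
    relevant: R, all documents relevant to the query
    known:    U, the relevant documents the user already knows (subset of R)
    """
    answer, relevant, known = set(answer), set(relevant), set(known)
    rk = answer & known               # Rk: retrieved and already known
    ru = (answer & relevant) - known  # Ru: retrieved, relevant, newly revealed
    found = len(rk) + len(ru)
    return {
        "coverage": len(rk) / len(known) if known else 0.0,
        "novelty": len(ru) / found if found else 0.0,
        "relative_recall": found / len(known) if known else 0.0,
    }


# Hypothetical example.
R = {"d1", "d2", "d3", "d4", "d5", "d6"}
U = {"d1", "d2", "d3", "d4"}
A = {"d1", "d2", "d5", "d9", "d10"}
measures = user_oriented_measures(A, R, U)
print(measures)
```

In this example Rk = {d1, d2} and Ru = {d5}.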
![Page 21: Chapter 3 Retrieval Evaluation](https://reader033.vdocuments.site/reader033/viewer/2022061520/56813ba6550346895da4d7a8/html5/thumbnails/21.jpg)
Reference Collection
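Reference test collections used to evaluate IR systems:
• TIPSTER/TREC: large, used for experiments
• CACM, ISI: of historical significance
• Cystic Fibrosis: a small collection; the relevant documents were determined by experts after discussion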
![Page 22: Chapter 3 Retrieval Evaluation](https://reader033.vdocuments.site/reader033/viewer/2022061520/56813ba6550346895da4d7a8/html5/thumbnails/22.jpg)
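Criticisms of IR Systems
• Lacks a solid formal framework as a basic foundation
  • No solution! Whether a document is relevant to a query is highly subjective.
• Lacks robust and consistent testbeds and benchmarks
  • Early on: small-scale, experimental test collections were developed.
  • After 1990: TREC was established, collecting more than ten thousand documents and providing them to research groups for evaluating IR systems.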
![Page 23: Chapter 3 Retrieval Evaluation](https://reader033.vdocuments.site/reader033/viewer/2022061520/56813ba6550346895da4d7a8/html5/thumbnails/23.jpg)
TREC (Text REtrieval Conference)
• Initiated under the National Institute of Standards and Technology (NIST)
• Goals:
  • Providing a large test collection
  • Uniform scoring procedures
  • Forum
• 7th TREC conference in 1998:
  • Document collection: test collections, example information requests (topics), relevant docs
  • The benchmark tasks
![Page 24: Chapter 3 Retrieval Evaluation](https://reader033.vdocuments.site/reader033/viewer/2022061520/56813ba6550346895da4d7a8/html5/thumbnails/24.jpg)
The Documents Collection
Documents are marked up in SGML:
<doc>
<docno>WSJ880406-0090</docno>
<hl>AT&T Unveils Services to Upgrade Phone Networks Under Global Plan</hl>
<author>Janet Guyon (WSJ Staff)</author>
<dateline>New York</dateline>
<text>
American Telephone & Telegraph Co. introduced the first of a new generation of phone service with broad…
</text>
</doc>
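A document in this markup can be split into fields with a couple of regular expressions. The sketch below is one possible approach, not a TREC tool: a strict XML parser would reject the unescaped `&`, so regexes are used instead, and only flat, non-nested fields are handled. The function name and the shortened sample are assumptions.

```python
import re

# Shortened copy of the sample document; the text body stays elided.
SAMPLE = """<doc>
<docno>WSJ880406-0090</docno>
<hl>AT&T Unveils Services to Upgrade Phone Networks Under Global Plan</hl>
<dateline>New York</dateline>
<text>
American Telephone & Telegraph Co. introduced the first of a new generation of phone service with broad…
</text>
</doc>"""

DOC_RE = re.compile(r"<doc>(.*?)</doc>", re.DOTALL | re.IGNORECASE)
# A field is <tag>...</tag>; the backreference \1 matches the same tag name.
FIELD_RE = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def parse_trec_docs(sgml):
    """Return one dict per <doc> element, mapping tag name to stripped text."""
    docs = []
    for body in DOC_RE.findall(sgml):
        fields = {tag.lower(): value.strip() for tag, value in FIELD_RE.findall(body)}
        docs.append(fields)
    return docs


docs = parse_trec_docs(SAMPLE)
print(docs[0]["docno"], "|", docs[0]["hl"])
```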
![Page 25: Chapter 3 Retrieval Evaluation](https://reader033.vdocuments.site/reader033/viewer/2022061520/56813ba6550346895da4d7a8/html5/thumbnails/25.jpg)
The Example Information Requests(Topics)
• Information needs are described in natural language.
• Topic number: assigned to the different types of topics
<top>
<num> Number:168
<title>Topic:Financing AMTRAK
<desc>Description:
…..
<nar>Narrative:A …..
</top>
![Page 26: Chapter 3 Retrieval Evaluation](https://reader033.vdocuments.site/reader033/viewer/2022061520/56813ba6550346895da4d7a8/html5/thumbnails/26.jpg)
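The Relevant Documents for Each Example Information Request
• The set of relevant documents for each topic is obtained from a pool of possible relevant documents.
• Pool: the top K documents, ranked by relevance, retrieved by each of the participating IR systems. K is usually 100.
• The pooled documents are then judged by human assessors to decide whether each one is relevant (the pooling method).
• Relevant documents are obtained from the pool assembled from the several systems.
• Documents outside the pool are considered non-relevant.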
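The pooling method can be sketched as follows: take the top K documents from every participating run, merge them, and keep the pooled documents that an assessor marks relevant. The function names, run names, document ids and the `judge` callback (a stand-in for the human assessor) are all hypothetical.

```python
def build_pool(runs, k=100):
    """Union of the top-k documents of every run.

    runs: dict mapping a system name to its ranked list of document ids.
    """
    pool = set()
    for ranking in runs.values():
        pool.update(ranking[:k])
    return pool


def judged_relevant(pool, judge):
    """Keep the pooled documents the assessor marks relevant.

    Documents outside the pool are never judged, so they count as
    non-relevant by construction.
    """
    return {doc for doc in pool if judge(doc)}


# Hypothetical runs from two systems, pooled at depth k=2.
runs = {
    "sysA": ["d1", "d2", "d3"],
    "sysB": ["d2", "d4", "d5"],
}
pool = build_pool(runs, k=2)

# Stand-in for the assessor: d5 would be relevant but is never pooled.
truly_relevant = {"d2", "d4", "d5"}
relevant = judged_relevant(pool, lambda doc: doc in truly_relevant)
print(sorted(pool), sorted(relevant))
```

The example also shows the known limitation of the method: d5 is relevant but falls outside the pool, so it ends up treated as non-relevant.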
![Page 27: Chapter 3 Retrieval Evaluation](https://reader033.vdocuments.site/reader033/viewer/2022061520/56813ba6550346895da4d7a8/html5/thumbnails/27.jpg)
The (Benchmark) Tasks at the TREC Conferences
• Ad hoc task: receive new requests and execute them on a pre-specified document collection
• Routing task: receive test information requests and two document collections
  • First collection: training and tuning the retrieval algorithm
  • Second collection: testing the tuned retrieval algorithm
![Page 28: Chapter 3 Retrieval Evaluation](https://reader033.vdocuments.site/reader033/viewer/2022061520/56813ba6550346895da4d7a8/html5/thumbnails/28.jpg)
Other tasks:
• *Chinese
• Filtering
• Interactive
• *NLP (natural language processing)
• Cross languages
• High precision
• Spoken document retrieval
• Query task (TREC-7)
![Page 29: Chapter 3 Retrieval Evaluation](https://reader033.vdocuments.site/reader033/viewer/2022061520/56813ba6550346895da4d7a8/html5/thumbnails/29.jpg)
Evaluation Measures at the TREC Conferences
• Summary table statistics
• Recall-precision
• Document level averages
• *Average precision histogram
![Page 30: Chapter 3 Retrieval Evaluation](https://reader033.vdocuments.site/reader033/viewer/2022061520/56813ba6550346895da4d7a8/html5/thumbnails/30.jpg)
The CACM Collection
• A small collection of computer science literature
• Text of the documents
• Structured subfields:
  • word stems from the title and abstract sections
  • categories
  • direct references between articles: a list of pairs of documents [da, db]
  • bibliographic coupling connections: a list of triples [d1, d2, ncited]
  • number of co-citations for each pair of articles: [d1, d2, nciting]
• A unique environment for testing retrieval algorithms which are based on information derived from cross-citing patterns
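The two kinds of triples can be derived from the direct-reference pairs. The sketch below assumes that a pair [da, db] means "da cites db", that ncited counts the documents both d1 and d2 cite (bibliographic coupling), and that nciting counts the documents citing both d1 and d2 (co-citation). The function name and the tiny citation graph are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def coupling_and_cocitation(references):
    """Derive coupling and co-citation triples from (citing, cited) pairs."""
    cites = defaultdict(set)     # document -> documents it cites
    cited_by = defaultdict(set)  # document -> documents citing it
    for da, db in references:
        cites[da].add(db)
        cited_by[db].add(da)

    coupling = []  # [d1, d2, ncited]: d1 and d2 share ncited references
    for d1, d2 in combinations(sorted(cites), 2):
        ncited = len(cites[d1] & cites[d2])
        if ncited:
            coupling.append((d1, d2, ncited))

    cocitation = []  # [d1, d2, nciting]: nciting documents cite both
    for d1, d2 in combinations(sorted(cited_by), 2):
        nciting = len(cited_by[d1] & cited_by[d2])
        if nciting:
            cocitation.append((d1, d2, nciting))
    return coupling, cocitation


# Hypothetical citation pairs: a and b cite x and y, c cites only y.
refs = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y"), ("c", "y")]
coupling, cocitation = coupling_and_cocitation(refs)
print(coupling)
print(cocitation)
```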
![Page 31: Chapter 3 Retrieval Evaluation](https://reader033.vdocuments.site/reader033/viewer/2022061520/56813ba6550346895da4d7a8/html5/thumbnails/31.jpg)
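The ISI Collection
• The ISI test collection was assembled by Small, formerly at ISI (Institute of Scientific Information).
• Most of the documents were selected from Small's original cross-citation study.
• It supports research on similarities based on both terms and cross-citation patterns.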
![Page 32: Chapter 3 Retrieval Evaluation](https://reader033.vdocuments.site/reader033/viewer/2022061520/56813ba6550346895da4d7a8/html5/thumbnails/32.jpg)
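The Cystic Fibrosis Collection
• Documents about cystic fibrosis
• Topics and relevant documents were produced by experts with clinical or research experience in the field.
• Relevance scores:
  • 0: non-relevance
  • 1: marginal relevance
  • 2: high relevance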
![Page 33: Chapter 3 Retrieval Evaluation](https://reader033.vdocuments.site/reader033/viewer/2022061520/56813ba6550346895da4d7a8/html5/thumbnails/33.jpg)
Characteristics of CF collection
• Relevance scores are all assigned by experts.
• Good number of information requests (relative to the collection size)
• The respective query vectors present overlap among themselves.
• Previous queries can be used to improve retrieval efficiency.
![Page 34: Chapter 3 Retrieval Evaluation](https://reader033.vdocuments.site/reader033/viewer/2022061520/56813ba6550346895da4d7a8/html5/thumbnails/34.jpg)
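Trends and Research Issues
• Interactive user interfaces
  • It is generally believed that retrieval with feedback improves performance.
  • How should the evaluation measures be chosen in this setting?
• Research on evaluation measures other than precision and recall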