Enhancing Biomedical Text Rankers by Term Proximity Information
Rey-Long Liu (劉瑞瓏), Department of Medical Informatics, Tzu Chi University
2012/06/13
Outline
• Background
  – Text ranking
  – Biomedical information needs
• An approach to enhancing text rankers in the biomedical domain
• Evaluation
• Conclusion
Research Background
Text Ranking
• Goal
  – Given a query q and a set T of texts retrieved for q, rank the texts in T according to their degree of relevance to q
• Motivation
  – Reducing information overload, since T is often very large even when a smart search engine is used
  – Text ranking is a key issue in information retrieval, and often a “secret” component of search engines
An Example Ranker
Biomedical Information Need
• Biomedical research requires relevant evidence from the huge and ever-growing biomedical literature
• Retrieving that evidence requires a system that
  – Accepts a natural language query for a biomedical information need, and
  – Ranks relevant texts higher for access or processing
An Example
• Query: urinary tract infection, criteria for treatment and admission (from OHSUMED)
  – A disease as the target concept (i.e., urinary tract infection)
  – Two concepts about the scenario of the information need (i.e., treatment and admission)
    • Neither is specific or related to any particular disease
Contextual Completeness
• Biomedical queries need to be well-formed, and so call for a retrieval system that considers the contextual completeness of each query concept t in the text d
  – Contextual completeness of t in d is the extent to which the query concepts other than t appear in nearby areas in d
An Example
• In children with an acute febrile illness, what is the efficacy of single medication therapy with acetaminophen or ibuprofen in reducing fever?
[From Lin & Demner-Fushman, 2006]
[Figure labels: PICO, Task, Answer, Strength]
An Approach to Improving Rankers for Biomedical Info Needs
Goals
• An approach PRE (Proximity-based Ranker Enhancer) that
  – Measures contextual completeness of query concepts appearing in a nearby area in the text
  – Serves as a supplement to improve existing rankers
Contrast with Related Work
• Biomedical text ranking
  – Using synonyms and considering diversity of passages, without considering term proximity
• Text ranking
  – Individual text scoring techniques (e.g., BM25) and learning-to-rank techniques (e.g., Ranking SVM), without considering term proximity
• Improving ranking by term proximity
  – Term proximity is employed, but contextual completeness is not considered
System Overview
[Figure: system diagram. Labels: Training, Testing, Training Data, Text Ranker Development, Underlying Ranker, PRE, TF (Term Frequency) Assessment, TF in d, Text Ranking, User, Query (q), Text (d), Ranked Texts]
TF Assessment
• Three types of term proximity
  – Overall proximity (QTermTF)
  – Individual proximity (IndiP)
  – Collective proximity (CollP)
• A term t may get a large TF increment in d, if
  – Many query terms appear frequently in d
  – Query terms are individually near to t at some places, and
  – Query terms collectively appear at a place near to t
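• $\mathrm{RTF}(t,d,q) = \mathrm{TF}(t,d) + \mathrm{TFincrement}(t,d,q)$
• $\mathrm{TFincrement}(t,d,q) = \mathrm{QTermTF}(d,q) \times \mathrm{IndiP}(t,d,q) \times \mathrm{CollP}(t,d,q)$
• $\mathrm{QTermTF}(d,q)$ = total TF of query terms in $d$
• $\mathrm{IndiP}(t,d,q) = \dfrac{\sum_{m \in M - \{t\}} \mathrm{SigmoidWeight}(\mathrm{Mindist}(t,m))}{\mathrm{MaxIndiP}}$
• $\mathrm{Mindist}(x,y)$ = shortest distance between $x$ and $y$ in $d$
• $\mathrm{SigmoidWeight}(dt) = \dfrac{1}{1 + e^{-((|q|-1) - dt)}}$
• $\mathrm{CollP}(t,d,q) = \dfrac{\max_{k \in K} \left\{ \sum_{m \in M - \{t\}} \mathrm{SigmoidWeight}(\mathrm{dist}(t,k,m)) \right\}}{\mathrm{MaxCollP}}$, where $K$ is the set of positions at which $t$ appears in $d$
• $\mathrm{dist}(t,k,m)$ = distance between $t$ (at position $k$) and $m$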
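The formulas above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the author's implementation: the slides do not define M, MaxIndiP, or MaxCollP, so the sketch assumes M is the set of query terms that occur in d, and that both normalizers equal the score reached when every other query term sits at distance 1 from t. It also assumes that dist(t,k,m) is measured to the nearest occurrence of m, and that distances are counted in token positions. The function names are hypothetical.

```python
import math
from collections import defaultdict


def sigmoid_weight(dist, q_len):
    """SigmoidWeight(dt) = 1 / (1 + e^-((|q|-1) - dt)), computed stably."""
    x = (q_len - 1) - dist
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # x < 0, so this cannot overflow
    return e / (1.0 + e)


def term_positions(doc_tokens):
    """Map each token to the list of positions where it occurs."""
    pos = defaultdict(list)
    for i, tok in enumerate(doc_tokens):
        pos[tok].append(i)
    return pos


def revised_tf(t, doc_tokens, query_terms):
    """RTF(t,d,q) = TF(t,d) + QTermTF * IndiP * CollP (illustrative sketch)."""
    q = list(dict.fromkeys(query_terms))  # distinct query terms, order kept
    pos = term_positions(doc_tokens)
    tf = float(len(pos.get(t, [])))
    # Assumption: M is the set of query terms that occur in d.
    others = [m for m in q if m != t and m in pos]
    if tf == 0 or t not in q or not others:
        return tf  # no increment can be computed
    q_len = len(q)
    # Assumption: MaxIndiP = MaxCollP = score when all other query
    # terms are at distance 1 from t.
    max_p = (q_len - 1) * sigmoid_weight(1, q_len)
    # Overall proximity: total TF of query terms in d (t included).
    qterm_tf = sum(len(pos[m]) for m in q if m in pos)
    # Individual proximity: each other term at its closest approach to t.
    indi = sum(
        sigmoid_weight(min(abs(k - j) for k in pos[t] for j in pos[m]), q_len)
        for m in others
    ) / max_p
    # Collective proximity: the single position k of t with the best total.
    coll = max(
        sum(sigmoid_weight(min(abs(k - j) for j in pos[m]), q_len) for m in others)
        for k in pos[t]
    ) / max_p
    return tf + qterm_tf * indi * coll
```

With the query terms `urinary`, `tract`, `infection`, `treatment`, a text in which the four terms are adjacent gives `infection` a much larger revised TF than a text in which the same terms are ten tokens apart, even though the raw TF is 1 in both.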
Empirical Evaluation
Experimental Data
• OHSUMED
  – A popular database of biomedical queries and references
  – 106 queries
  – 348,566 references
  – 16,140 query-reference pairs
    • Definitely relevant
    • Possibly relevant
    • Not relevant
• TREC Genomics 2006
  – 28 queries (topics) and 27,999 query-passage pairs
    • Definitely relevant, possibly relevant, and not relevant
  – 13,993 query-reference pairs
• TREC Genomics 2007
  – 36 queries and 35,996 query-passage pairs
    • Relevant and not relevant
  – 22,913 query-reference pairs
Underlying Rankers
Baseline Ranker Enhancers
• Three state-of-the-art techniques that enhance text rankers by term proximity
  – The t-function: t() [Tao & Zhai, 2007]
  – The p-function: p() [Cummins & O’Riordan, 2009]
  – The proximity language model: PLM [Zhao & Yun, 2009]
Evaluation Criteria
• Evaluating how well relevant references are ranked higher for users to access
  – Mean average precision (MAP)
  – Normalized discounted cumulative gain at x (NDCG@x)
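For readers unfamiliar with the two criteria, a minimal Python sketch follows. The slides do not say which variants were used, so two assumptions are made here: average precision is normalized by the number of relevant items present in the ranked list (that is, all relevant items are assumed to be retrieved), and NDCG uses the common gain 2^g - 1 with a log2(rank + 1) discount. The function names are hypothetical.

```python
import math


def average_precision(rels):
    """rels: binary relevance labels (1/0) in ranked order."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            total += hits / rank  # precision at this relevant item
    return total / hits if hits else 0.0


def mean_average_precision(rels_per_query):
    """Mean of the per-query average precision values."""
    if not rels_per_query:
        return 0.0
    return sum(average_precision(r) for r in rels_per_query) / len(rels_per_query)


def dcg_at_k(gains, k):
    """Discounted cumulative gain over the top k graded labels."""
    return sum((2 ** g - 1) / math.log2(i + 1)
               for i, g in enumerate(gains[:k], start=1))


def ndcg_at_k(gains, k):
    """DCG@k divided by the DCG@k of the ideal (sorted) ordering."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0
```

Graded labels map naturally onto the three-level judgments in the test collections, for example 2 for definitely relevant, 1 for possibly relevant, and 0 for not relevant; that mapping is also an assumption of this sketch.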
Results
Conclusion
• Contextual completeness of query concepts in the texts is essential in ranking biomedical texts
• To measure contextual completeness, it is helpful to integrate three types of term proximity
  – Overall proximity
  – Individual proximity
  – Collective proximity
• Existing rankers may be comprehensively enhanced
Thank You!