TRANSCRIPT
Temporal Information Retrieval
Stefano Giovanni Rizzo1 Danilo Montesi2, Matteo Brucato3
1,2 Department of Computer Science and Engineering - University of Bologna 3 School of Computer Science - University of Massachusetts Amherst
Information Retrieval
§ Information Retrieval (IR) is defined as the activity of
§ finding information resources (usually documents)
§ of an unstructured nature (usually text)
§ relevant to an information need (the user query)
§ from within a large collection (e.g. the web).
§ But… is the text truly unstructured?
§ As we’ll see, structured information, like temporal expressions, can be extracted from text resources by means of NLP tools.
§ The temporal information in documents can be used to improve the IR effectiveness.
IR Effectiveness
§ To assess the effectiveness of an IR system (the quality of its search results), two measures are computed over the system’s returned results for a query:
§ Precision: What fraction of the returned results are relevant to the information need?
§ Recall: What fraction of the relevant documents in the collection were returned by the system?
IR Evaluation
§ To measure ad hoc IR effectiveness in the standard way, we need a test collection consisting of three things:
1. A document collection
2. A set of information needs, in the form of user queries
3. A set of relevance judgments, in the form of binary assessment of either relevant or nonrelevant for each query–document pair.
Precision and Recall (1/3)
[Venn diagram: the document collection, the documents relevant to a specific query, the retrieved documents, and their intersection, the relevant retrieved documents]
Precision and Recall (3/3)
§ Precision = Relevant Retrieved Documents / Retrieved Documents
§ Recall = Relevant Retrieved Documents / Relevant Documents
Precision @ K
▪ Set a rank threshold K
▪ Compute % relevant in top K
▪ Ignores documents ranked lower than K
▪ Ex:
▪ Precision@3 is 2/3
▪ Precision@4 is 2/4
▪ Precision@5 is 3/5
▪ In similar fashion we have Recall@K
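The cutoff measures above can be sketched in a few lines of Python. This is a minimal illustration with toy data: the document IDs and the relevance set are made up so that the ranking reproduces the slide’s example (relevant documents at ranks 1, 3 and 5).

```python
# Sketch of Precision@K and Recall@K, assuming relevance judgments are
# given as a set of relevant document IDs (toy data, not from the slides).

def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k returned documents that are relevant."""
    top_k = ranked[:k]
    return sum(1 for d in top_k if d in relevant) / k

def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant documents found in the top-k."""
    top_k = ranked[:k]
    return sum(1 for d in top_k if d in relevant) / len(relevant)

# Ranking matching the slide's example: relevant docs at ranks 1, 3 and 5.
ranked = ["d1", "d2", "d3", "d4", "d5"]
relevant = {"d1", "d3", "d5"}

print(precision_at_k(ranked, relevant, 3))  # 2/3
print(precision_at_k(ranked, relevant, 4))  # 2/4
print(precision_at_k(ranked, relevant, 5))  # 3/5
```

Note that both measures ignore the ordering inside the top k; rank-sensitive measures such as MAP (used in the evaluation later) address this.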
Temporal Relevance
§ The set of temporal properties of a document is one key aspect that determines its relevance for a query. Despite this, traditional IR still treats temporal information as simple textual keywords.
§ Temporal Information Retrieval aims at satisfying these temporal needs and at combining traditional notions of document relevance with the so-called temporal relevance.
Time and Temporal Scope
§ Time is a ubiquitous dimension of nearly every collection of documents
§ Digital libraries, news stories, tweets, the Web, …
§ Documents
§ Meta-level: creation, publication date, …
§ Content-level: periods of time mentioned in the text
⇒ The document temporal scope
§ Queries
§ Meta-level: issue date (like in Google Trends)
§ Content-level: periods of time mentioned in the query
⇒ The query temporal scope
Temporal Similarity vs Textual Similarity
§ Textual similarity
§ Similarity based on term statistics
§ Not adequate for temporal queries:
§ "results elections 2008"
§ "results elections last year"
§ "2008" and "last year" are considered terms and searched literally in the documents
§ We need to model Temporal Similarity
Temporal Intervals
§ Temporal intervals are semantically rich:
§ Synonymy:
§ "2014" = "last year" = "the year after 2013"
§ Polysemy:
§ "every Friday", "yearly", "Super Bowl"
§ Algebraic structure (to correlate temporal scopes): § Overlapping
§ Containment
§ Distance
§ We can exploit these features to improve IR models
Temporal Domain
§ CHRONON
§ The smallest discrete unit of time (e.g., a second, a day, a year)
§ TEMPORAL DOMAIN
§ Δ = { [tmin, tmin], …, [1990, 1991], [1990, 1992], …, [tmin, tmax] }
§ INTERPRETATION FUNCTION (NLP)
§ Ψ : TIMEX → ℘(Δ)
§ where TIMEX is the set of all possible time expressions
§ TEMPORAL SCOPE of a document D (or a query Q)
§ TD = { [1990, 1999], [1995, 1997], [2001, 2002] }
§ TQ = { [1991, 2001], [2002, 2003] }
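A toy sketch of the interpretation function Ψ, with a year-granularity chronon and intervals as `(start_year, end_year)` pairs. The handled expressions and the reference year are hypothetical examples for illustration; a real system would delegate this to a temporal tagger such as HeidelTime, mentioned later in the talk.

```python
# Toy sketch of the interpretation function Ψ: TIMEX → ℘(Δ), mapping a
# time expression to a set of intervals over a year-granularity chronon.
# A real system would use a temporal tagger (e.g. HeidelTime); the
# expressions handled below are hypothetical examples.

def interpret(timex, reference_year=2015):
    """Map a time expression to a set of (start_year, end_year) intervals."""
    if timex.isdigit():                      # e.g. "2014"
        y = int(timex)
        return {(y, y)}
    if timex == "last year":                 # resolved against the reference year
        y = reference_year - 1
        return {(y, y)}
    if timex == "the nineties":
        return {(1990, 1999)}
    return set()                             # unresolved expression

# Synonymy: distinct expressions denote the same interval.
print(interpret("2014") == interpret("last year"))  # True
```

The temporal scope of a document is then simply the union of `interpret(e)` over every expression `e` the tagger finds in its text.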
Generalized metrics
§ Given an interval from the query and one from the document
§ [aQ,bQ] ∈ TQ
§ [aD,bD] ∈ TD
we define:
§ Manhattan distance
§ δsym([aQ,bQ],[aD,bD]) = |aQ-aD| + |bQ-bD|
§ Query-biased hemimetric
§ δcov(Q)([aQ,bQ],[aD,bD]) = (bQ-aQ) - (min{bQ, bD} - max{aQ, aD})
§ Document-biased hemimetric
§ δcov(D)([aQ,bQ],[aD,bD]) = (bD-aD) - (min{bQ, bD} - max{aQ, aD})
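The three distances translate directly into Python, with intervals as `(start_year, end_year)` pairs. The example values below are illustrative: a query interval covering the nineties and a document interval contained in it.

```python
# Sketch of the three generalized distances between a query interval
# [aQ, bQ] and a document interval [aD, bD], with years as chronons.

def manhattan(q, d):
    """δsym: sum of endpoint displacements."""
    aQ, bQ = q; aD, bD = d
    return abs(aQ - aD) + abs(bQ - bD)

def cov_d(q, d):
    """δcov(D), document-biased hemimetric: how much of the document
    interval falls outside the query interval."""
    aQ, bQ = q; aD, bD = d
    return (bD - aD) - (min(bQ, bD) - max(aQ, aD))

def cov_q(q, d):
    """δcov(Q), query-biased hemimetric: how much of the query interval
    is not covered by the document interval."""
    aQ, bQ = q; aD, bD = d
    return (bQ - aQ) - (min(bQ, bD) - max(aQ, aD))

q = (1990, 1999)
d = (1995, 1997)
print(manhattan(q, d))  # |1990-1995| + |1999-1997| = 7
print(cov_d(q, d))      # 0: the document lies entirely inside the query
print(cov_q(q, d))      # 7: the query years not covered by the document
```

Note how the two hemimetrics disagree on the same pair of intervals; the next slides show when each behavior is the appropriate one.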
Manhattan Distance
§ δsym([aQ,bQ],[aD,bD]) = |aQ-aD| + |bQ-bD|
[Diagram: query interval [aQ, bQ] and document interval [aD, bD] coincide, so δ = 0. This looks intuitively correct.]
Manhattan Distance: Peculiarity
§ δsym([aQ,bQ],[aD,bD]) = |aQ-aD| + |bQ-bD|
The two documents would have the same distance 3+3 = 6 from the query…
[Diagram: two document intervals [aD1, bD1] and [aD2, bD2] at the same Manhattan distance from the query interval [aQ, bQ], one contained in it and one disjoint from it]
Document-biased Hemimetric
§ δcov(D)([aQ,bQ],[aD,bD]) = (bD-aD) - (min{bQ, bD} - max{aQ, aD})
§ δ1 = 0, δ2 = 6
• More appropriate for “broad” time queries:
• Query represents the broadest time interval the user is willing to accept
• Distance reflects document coverage
[Diagram: document interval [aD1, bD1] contained in the query interval [aQ, bQ] (δ1 = 0), and document interval [aD2, bD2] partly outside it (δ2 = 6)]
Query-biased Hemimetric
§ δcov(Q)([aQ,bQ],[aD,bD]) = (bQ-aQ) - (min{bQ, bD} - max{aQ, aD})
§ δ1 = 6, δ2 = 0
• More appropriate for “narrow” time queries:
• Query represents the narrowest time interval the user is willing to accept
• Distance reflects query coverage
[Diagram: document interval [aD1, bD1] covering only part of the query interval [aQ, bQ] (δ1 = 6), and document interval [aD2, bD2] covering it entirely (δ2 = 0)]
Generalized metrics - Summary
§ Metrics (e.g. Manhattan distance):
§ Non-negativity: δ(x,y) ≥ 0
§ Coincidence: δ(x,y) = 0 iff x = y
§ Symmetry: δ(x,y) = δ(y,x)
§ Triangle inequality: δ(x,z) ≤ δ(x,y) + δ(y,z)
§ The 2 new distances are hemimetrics:
§ No symmetry
§ Partial coincidence:
§ δ(x,x) = 0
§ but we allow y ≠ x such that δ(x,y) = 0
§ Interesting property:
§ δsym(x,y) = δcov(D)(x,y) + δcov(Q)(x,y)
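The additive property can be checked numerically. The sketch below (an exhaustive check over a small hypothetical domain, not a proof) enumerates every interval pair over the years 2000–2010 and confirms that the Manhattan distance always equals the sum of the two hemimetrics.

```python
# Numeric check (a sketch, not a proof) of δsym = δcov(D) + δcov(Q) over
# all interval pairs in a small temporal domain of years 2000..2010.
from itertools import combinations_with_replacement

def manhattan(q, d):
    return abs(q[0] - d[0]) + abs(q[1] - d[1])

def cov_d(q, d):
    return (d[1] - d[0]) - (min(q[1], d[1]) - max(q[0], d[0]))

def cov_q(q, d):
    return (q[1] - q[0]) - (min(q[1], d[1]) - max(q[0], d[0]))

years = range(2000, 2011)
# All intervals (a, b) with a <= b over the domain.
intervals = [(a, b) for a, b in combinations_with_replacement(years, 2)]

assert all(manhattan(q, d) == cov_d(q, d) + cov_q(q, d)
           for q in intervals for d in intervals)
print("property holds on all", len(intervals) ** 2, "pairs")
```

Intuitively, the query-biased term counts the query years the document misses and the document-biased term counts the document years outside the query; together they account for both endpoint displacements.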
Combining text and time scores
§ Temporal similarity:
§ simδ*(Q, D) = exp{ –δ*(TQ, TD) }
§ Two models of relevance:
§ Textual similarity: simtxt
§ Temporal similarity: simtime
§ Combining them:
§ sim(Q, Di) = (1 – α) · simtxt(Q, Di) + α · simtime(Q, Di)
§ where α is a combination parameter in [0,1]
Evaluation – TREC Novelty
§ Queries: 50 (22% with time)
§ Results over all 50 queries, for different sizes k of returned results:

Top-k | BM25 Baseline (Text Similarity, α = 0) | Text + Temporal Similarity (α = 0.34)
      | Precision | Recall                     | Precision | Recall
5     | 0.84      | 0.17                       | 0.87      | 0.18
10    | 0.82      | 0.34                       | 0.85      | 0.35
20    | 0.79      | 0.65                       | 0.81      | 0.67
Evaluation: MAP over α
§ Mean Average Precision (MAP) over different values of the α combination parameter:
[Plot: MAP (0.78–0.84) on the y-axis versus α (0–1) on the x-axis, one curve per distance: Doc Coverage, Query Coverage, Euclidean, Manhattan, Manhattan Cov., and the text baseline]
Conclusions
§ Model for temporal scopes of documents and queries
§ Novel metrics for temporal scope similarity
§ Ranking model combining textual and temporal scores
§ Experimental evaluation of the effectiveness improvements over a text-only ranking
Future Work
§ Time quantification in text
§ Temporal metrics access methods
§ Additional metric distances
§ Score distribution normalization for textual-temporal combination
Time Quantification
§ Using a state-of-the-art temporal tagger (HeidelTime), we extracted temporal expressions from large sets of documents:
§ English Wikipedia (4.4 million articles)
§ New York Times (1.8 million articles)
§ Twitter (1.2 million tweets from 23/01/2011 to 08/02/2011)
§ AOL Search Queries (20 million search queries)
§ IR Collections (TREC GOV2 has 25 million documents)
§ Patterns and insights are arising from the temporal distribution analysis.
Wikipedia: Zipf’s Law
Rank Interval
1 2010-01-01 : 2010-12-31 (1 Year)
2 2008-01-01 : 2008-12-31 (1 Year)
3 2009-01-01 : 2009-12-31 (1 Year)
4 2013-01-01 : 2013-12-31 (1 Year)
5 2007-01-01 : 2007-12-31 (1 Year)
Thank you.
Questions?